
Troubleshooting vSphere Cluster Services (vCLS) with Retreat Mode

If you’ve ever run into an issue with vSphere Cluster Services (aka vCLS), you’ll have noticed that when the cluster is reporting as degraded or unhealthy, there’s a cluster health message like this:

vSphere Cluster Service VM is required to maintain the health of vSphere Cluster Services. Power state and resource of this VM is managed by vSphere Cluster Services.

It is not advised to manually manage (power on/off or delete) those VMs…so to resolve the issue, you’ll need to put the cluster in what’s called “retreat mode”, which powers down and removes all the vCLS VMs. After that completes, taking the cluster out of retreat mode triggers vCLS to re-deploy the vCLS VMs and power them on…and all should be good again.

In previous versions of vSphere, putting a cluster in retreat mode was a manual process and a bit daunting. See this link for the process of putting a cluster in retreat mode.
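For context, the older manual process came down to adding a vCenter Server advanced setting named after the cluster's managed object ID (the domain-c number visible in the cluster's URL in the vSphere Client), setting it to false to enter retreat mode, and setting it back to true to leave it. The Python sketch below only builds that key/value pair so it can be pasted into Advanced Settings; it does not talk to vCenter. The helper name and the ID validation pattern are illustrative assumptions, not part of any VMware tool.

```python
import re
from typing import Tuple

# Assumption: cluster managed object IDs look like "domain-c8",
# as seen in the vSphere Client URL when the cluster is selected.
_CLUSTER_MOID = re.compile(r"^domain-c\d+$")


def retreat_mode_setting(cluster_moid: str, vcls_enabled: bool) -> Tuple[str, str]:
    """Build the advanced-setting key/value used by the manual retreat mode process.

    vcls_enabled=False puts the cluster in retreat mode;
    vcls_enabled=True takes it back out.
    """
    if not _CLUSTER_MOID.match(cluster_moid):
        raise ValueError(f"unexpected cluster ID: {cluster_moid!r}")
    key = f"config.vcls.clusters.{cluster_moid}.enabled"
    value = "true" if vcls_enabled else "false"
    return key, value


if __name__ == "__main__":
    # "domain-c8" is a made-up example ID.
    print(retreat_mode_setting("domain-c8", vcls_enabled=False))
```

Keeping the key construction in one place avoids the most common slip in the manual process, which is a typo in the setting name.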

Starting in vSphere 7.0 U3o and 8.0 U2, retreat mode is available as a cluster setting within the vCenter Server UI, which makes the process easier and cuts the time it takes to resolve cluster degradation issues. Read this for more info on the newer process for retreat mode.

As seen in the picture above…I seem to have a cluster health issue. Let’s try to resolve that with retreat mode.

To get there, I’m selecting my troubled cluster, going to Configure -> vSphere Cluster Services -> General, and then clicking the Edit vCLS mode button on the right.

A new window opens and we can select Retreat Mode.

When selecting retreat mode, you’ll be presented with a disclaimer that the mode should be used with extra caution…because DRS will not function, and HA is impacted, while the cluster is in retreat mode.

Makes sense, and it’s nice to see the information and be reminded.

A banner will also be displayed while the cluster is in retreat mode:

Within a few moments, the vCLS VMs will be shut down and removed from the cluster.

And they are gone…
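If you'd rather confirm this from an inventory listing than by eye, a small check like the following can help. It is a minimal Python sketch that takes a list of VM names (however you export them) and returns the ones that look like vCLS VMs. It assumes the VMs keep their default names, which start with "vCLS"; the function name is hypothetical.

```python
from typing import Iterable, List


def remaining_vcls_vms(vm_names: Iterable[str]) -> List[str]:
    """Return the VM names that look like vCLS VMs.

    Assumption: vCLS VMs keep their default naming, which begins with "vCLS".
    """
    return [name for name in vm_names if name.startswith("vCLS")]


if __name__ == "__main__":
    # Made-up inventory listing for illustration.
    inventory = ["web01", "vCLS-1234", "db01"]
    leftovers = remaining_vcls_vms(inventory)
    print("retreat mode complete" if not leftovers else f"still present: {leftovers}")
```

An empty result means the vCLS VMs have been removed and it's safe to switch the cluster back.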

Now we’ll simply do the reverse: set the vCLS mode back to System Managed, and vCLS will begin to deploy and power on new vCLS VMs.

The vCLS VM was recreated for the cluster – notice there is only one VM now.

And the health status of the cluster is now reporting as Healthy.

A nice view of the health history is shown here as well.

And that’s all for this one! If you learned something or I helped you out in some way, please make a comment and let me know. Thanks!
