How to gracefully remove a host from an NSX-T based Supervisor Cluster?

Recently I got this question a couple of times: how do you gracefully remove a host from an existing Supervisor cluster that is configured with NSX-T? I thought it was worth writing a quick post with detailed steps. A host may need to be removed from an existing Supervisor cluster for multiple reasons, such as maintenance or adding the host to another Supervisor cluster.

If you are not yet aware of what a Supervisor cluster is, I would highly recommend reading this blog post.

The steps below are written assuming you have an NSX-T based Supervisor cluster configured. In an NSX-T based Supervisor cluster, ESXi hosts also act as Kubernetes worker nodes, hence it is important that they are properly removed from the cluster. With the new 7.0 U1 capability, vSphere with Tanzu with vSphere (VDS) networking, ESXi hosts are no longer worker nodes because all Kubernetes workloads there run inside Tanzu Kubernetes clusters (aka Guest clusters), hence removing a host from a VDS-based Supervisor cluster is the same as it has always been for a regular vSphere cluster.
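To see this for yourself, you can list the nodes of the Supervisor cluster with kubectl. This is just an illustrative sketch; it assumes you have already logged in to the Supervisor cluster control plane with kubectl, and node names and statuses will of course differ in your environment.

    # Run from a machine where kubectl is logged in to the Supervisor cluster.
    # In an NSX-T based Supervisor cluster, the ESXi hosts show up as worker
    # nodes alongside the Supervisor control plane VM nodes.
    kubectl get nodes -o wide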

Steps:

  1. Identify the host you would like to remove.
  2. Put the host in maintenance mode.
  3. Since the host is going into maintenance mode, all the vSphere Pods running on this node/host will be re-created on the other available hosts (DRS will take care of recommending a suitable host).
  4. VMs running on this host, such as Supervisor control plane VMs, Guest Cluster VMs, or any other regular VMs, will be migrated to other available hosts as usual.
  5. Once the host goes into maintenance mode, the spherelet service on the host will be stopped, but the spherelet VIB pushed to the ESXi host will still be there. You can check the spherelet VIB and service status using the commands in the next step. SSH to the host to run them.
  6. “esxcli software vib list | grep spherelet” & “/etc/init.d/spherelet status” (a sample run is shown after this list).
  7. When the host goes into maintenance mode, in Kubernetes terms the node is drained, i.e. this host is no longer ready to take any k8s workloads. When you run “kubectl get nodes”, it should be shown as “NotReady” (see the kubectl check after this list).
  8. The NSX-T host transport node will still be in the configured state. You can confirm it from the NSX-T UI, or you can also check the “nsx” VIBs on the host in maintenance mode (covered in the same sample run after this list).
  9. Now, instead of removing this host directly from the inventory, move it out of the cluster and into the datacenter as a standalone host.
  10. As soon as you move it out as a standalone host, the spherelet VIB on the host is removed as well, i.e. there is no spherelet service on it anymore.
  11. Also, the NSX-T host transport node will be un-configured automatically, i.e. all NSX-T VIBs are removed at this stage. You can check it from the NSX-T UI; it should be shown as “Not Configured”. This happens because, while configuring NSX-T, we had applied the host transport node profile at the vSphere cluster level.
  12. “kubectl get nodes” or the H5C UI (Cluster >> Monitor >> Namespaces >> Overview) should no longer show this host as a worker node.
  13. The only remaining reference is that this host is still part of the Distributed Switch (DVS/VDS) you configured as part of the Supervisor cluster. You can remove it using the usual DVS UI workflow.
  14. At this stage, the host is completely free to go for maintenance or to be added into another Supervisor cluster. You can remove it from the vCenter Server inventory as needed.
  15. To add the host back into the same Supervisor cluster, it is better to keep the host in maintenance mode; make sure you add it back to the configured DVS/VDS, move it into the cluster, and exit maintenance mode. It will automatically install the spherelet and NSX-T VIBs, and after a few minutes it will be ready to take k8s workloads again (see the verification commands after this list).
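Sample run for steps 6 and 8, to be executed over SSH on the host that is in maintenance mode. This is only a sketch; VIB names and versions vary by release, so treat the exact output as illustrative.

    # Check whether the spherelet VIB is still installed on the host:
    esxcli software vib list | grep spherelet
    # Check the spherelet service; while the host is in maintenance mode
    # (but still in the cluster) the service is expected to be stopped:
    /etc/init.d/spherelet status
    # The NSX-T VIBs are also still present, since the transport node
    # configuration has not been removed yet:
    esxcli software vib list | grep nsx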
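For the kubectl check in step 7, run the following from wherever you normally use kubectl against the Supervisor cluster. Again, only a sketch; the exact status string can differ slightly depending on whether the node is merely cordoned or the spherelet has already stopped.

    # List the Supervisor cluster nodes:
    kubectl get nodes
    # The host in maintenance mode should no longer be schedulable; expect a
    # status such as "NotReady" (or "Ready,SchedulingDisabled" while draining),
    # while the remaining ESXi worker nodes stay "Ready".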
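Verification for step 15, once the host has been moved back into the cluster and taken out of maintenance mode. A minimal sketch, assuming SSH access to the host and a kubectl session against the Supervisor cluster:

    # On the ESXi host (over SSH): confirm the agents were pushed back.
    esxcli software vib list | grep spherelet
    esxcli software vib list | grep nsx
    /etc/init.d/spherelet status
    # From your kubectl session: the host should rejoin as a worker node and,
    # after a few minutes, report "Ready".
    kubectl get nodes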

Further reading:
1. vSphere with Kubernetes (now vSphere with Tanzu) 101 is here
2. Official documentation for vSphere with Kubernetes is here
3. Automation around Supervisor cluster
4. Read about the new 7.0 U1 capability, vSphere with Tanzu with vSphere (VDS) networking
5. Automating supervisor cluster workflows using Java