I was recently asked what good practice looks like for performing maintenance on the vSphere layer in a VDI environment. I thought the answer would make a useful post to share.
I look at this from two angles: the update strategy and the actual execution.
Strategically, I always start with a conservative approach, and from there decide what does and does not apply in the target environment.
Virtual desktops are mostly like any other virtual machines in a vSphere environment; you should be able to vMotion them between hosts. The exceptions are virtual desktops that are tied to physical hardware, such as those using vDGA.
Now let’s consider the most common situation, where the virtual desktops are free to move about. As far as vMotion is concerned, virtual desktops are no different from any other virtual machines; all the rules about what can be moved apply. This allows us to plan according to vSphere maintenance best practices.
Strategic Considerations
- Have a separate Test/UAT VDI environment. It should be much smaller in scale, but with hardware as similar to production as possible. That way, for any change you plan to make to the production environment, you can test both the update itself and the update procedure. It will not be pretty if a gung-ho update to production goes south.
- Avoid making multiple changes at the same time; for example, keep infrastructure changes separate from virtual desktop changes. The last thing you want is to make too many changes and end up with an issue whose actual cause you cannot pin down. Sensible change management practice is a must.
- Major patches or application changes to virtual desktops should be well documented and tested. Usually, only functional tests are performed. However, a virtual desktop environment typically has very high consolidation ratios (compared to virtual servers), so any change that increases the resource demand of each virtual desktop can very quickly result in performance issues. I recommend measuring resource utilisation as well.
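To illustrate why a small per-desktop increase matters at high consolidation ratios, here is a minimal sketch. The function names and all figures (100 desktops on a 128GB host, a 0.25GB increase per desktop) are hypothetical values chosen for illustration, not measurements from a real environment.

```python
def added_host_demand_gb(desktops_per_host, extra_gb_per_desktop):
    """Extra memory demand on one host when every desktop grows by the same amount."""
    return desktops_per_host * extra_gb_per_desktop


def headroom_after_change_gb(host_ram_gb, used_gb, desktops_per_host, extra_gb_per_desktop):
    """Memory left on the host after the change (negative means overcommitted)."""
    extra = added_host_demand_gb(desktops_per_host, extra_gb_per_desktop)
    return host_ram_gb - used_gb - extra


# Hypothetical VDI host: 100 desktops using 100GB of a 128GB host,
# and a patch that adds 0.25GB of demand per desktop.
vdi_headroom = headroom_after_change_gb(128, 100, 100, 0.25)

# Hypothetical server host for comparison: 10 VMs, same 0.25GB increase per VM.
server_extra = added_host_demand_gb(10, 0.25)

print(f"VDI host headroom after change: {vdi_headroom} GB")
print(f"Server host extra demand: {server_extra} GB")
```

The same per-VM increase costs ten times more on the densely packed host, which is why a utilisation measurement belongs alongside the functional tests.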
Tactical Considerations
- Definitely have at least one host of spare capacity (N+1) in the cluster so that one host can be taken out for maintenance. Clusters hosting mission-critical desktops should be at least N+2, so that even during maintenance of a single host there is still one host's worth of capacity available to handle an unplanned failure among the remaining hosts.
- Leverage multi-NIC vMotion. ESXi hosts for desktops typically have at least 128GB of RAM; transferring that much data takes time, and the sooner it finishes the better. Also consider the time needed to rebalance the cluster once the host is back from maintenance. For larger hosts, consider 10Gbps NICs.
Total time needed per host = pre-maintenance evacuation time + maintenance time + post-maintenance rebalancing
- Test to ensure that the applications running in the virtual desktops are not sensitive to vMotion. This is just as with virtual servers: some applications are very network sensitive and cannot tolerate the network switchover when a VM moves across hosts during vMotion. Applications that have high network transfer rates or are latency sensitive are the prime candidates for such tests.
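The N+1/N+2 sizing and the per-host time formula above can be turned into a rough back-of-envelope calculation. This is only a sketch under stated assumptions: the function names are hypothetical, vMotion throughput is assumed to scale linearly with the number of NICs, the 0.8 efficiency factor is a guess, memory is assumed to be copied once (no re-copying of dirtied pages), and rebalancing is assumed to take as long as evacuation.

```python
import math


def spare_hosts(host_count, desktops, desktops_per_host):
    """Hosts left over once every desktop is placed at the planned density."""
    hosts_required = math.ceil(desktops / desktops_per_host)
    return host_count - hosts_required


def vmotion_minutes(memory_gb, nic_gbps, nic_count=1, efficiency=0.8):
    """Rough time to copy memory_gb of VM memory over the vMotion links.

    Assumes linear scaling across NICs and a single pass over memory.
    """
    usable_gbps = nic_gbps * nic_count * efficiency
    return (memory_gb * 8) / usable_gbps / 60


def total_minutes_per_host(memory_gb, maintenance_minutes, nic_gbps,
                           nic_count=1, efficiency=0.8):
    """Evacuation time + maintenance time + post-maintenance rebalancing."""
    evacuation = vmotion_minutes(memory_gb, nic_gbps, nic_count, efficiency)
    rebalancing = evacuation  # assumption: a similar amount of memory moves back
    return evacuation + maintenance_minutes + rebalancing


# Hypothetical cluster: 8 hosts, 700 desktops, 100 desktops per host.
spare = spare_hosts(8, 700, 100)
print(f"Spare hosts: {spare} (N+1 ok: {spare >= 1}, N+2 ok: {spare >= 2})")

# 128GB host with 30 minutes of maintenance: one 1Gbps NIC vs two 10Gbps NICs.
print(f"1 x 1Gbps:  {total_minutes_per_host(128, 30, 1):.1f} min")
print(f"2 x 10Gbps: {total_minutes_per_host(128, 30, 10, nic_count=2):.1f} min")
```

Multiplying the per-host figure by the number of hosts gives a first estimate of the maintenance window for a rolling update; real evacuation times depend on how busy the desktops are and should be measured in the Test/UAT environment.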
Take Note
- These apply to all types of virtual desktops, whether persistent or floating, full clones or linked clones. The type of virtual desktop does not change the fact that they are virtual machines.
- Again, these do not apply to virtual desktops that are tied to specific hosts, e.g. when vDGA is in use.
- Storage vMotion is not relevant to vSphere host maintenance and is a whole different topic. I should quickly mention, though, that Storage vMotion is not supported with linked clones.