Friday, January 22, 2010

Update an ESX server with VUM

VMware Update Manager (VUM)

I recently worked with VUM for one of my clients and thought I'd share my experience here, as it might be useful to anyone who will be working with it in the near future.

Let me describe the setup on which I tried VUM. My client runs 36 ESX servers (ESX 3.5 Update 4) across 4 clusters, managed by one VirtualCenter 2.5 server, with licences for HA and DRS on all 4 clusters. VUM 1.0 Update 2 is installed on the VC server, and the VUM plugin is installed in the VI Client to access it. This should give you an idea of our setup.

Now let me tell a small story about why we wanted to update our ESX servers to 3.5 Update 5. Some time back our ESX servers started reporting HA errors frequently, and we also found process threads stuck on the ESX servers that were eating all of the service console memory. To resolve this we increased the service console memory, but that didn't help either :( .
So we contacted VMware support, and their advice was to upgrade to Update 5, which should resolve this type of HA error. (End of story ;)

VUM:

VUM is a nice tool for admins like me who are too lazy to sit through all the manual steps: putting the ESX host in maintenance mode, installing updates, rebooting, and so on. The GUI makes VUM easy to use. Once VUM is installed, an additional tab is added for each ESX server in the VI Client, and you also get options like "Scan for Updates" and "Remediate" when you right-click on any ESX server. Update Manager tells you whether the ESX server is compliant or not.
What does VUM do while remediating an ESX host?
It migrates the VMs off the ESX host, puts the host in maintenance mode, installs the updates, reboots the host, brings it out of maintenance mode, and re-checks it for compliance. All of these steps are performed during a successful remediation.
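
To make the ordering concrete, here is a minimal Python sketch of that sequence. It is only a model for reasoning about where a remediation can stall: the step names come from the description above, while `remediate` and the `run_step` callback are hypothetical names of my own and not part of any VUM or VI Client API.

```python
from typing import Callable, List, Tuple

# Step names follow the sequence described in the post. Nothing here talks
# to a real host; run_step is a hypothetical stand-in for the actual work.
REMEDIATION_STEPS = [
    "migrate VMs off the host",
    "enter maintenance mode",
    "install updates",
    "reboot host",
    "exit maintenance mode",
    "re-check compliance",
]


def remediate(run_step: Callable[[str], bool]) -> Tuple[bool, List[str]]:
    """Run each step in order and stop at the first one that fails.

    Returns (success, list of steps that completed).
    """
    completed: List[str] = []
    for step in REMEDIATION_STEPS:
        if not run_step(step):
            return False, completed
        completed.append(step)
    return True, completed


# Simulate a host where the very first step (VM migration) fails.
ok, done = remediate(lambda step: step != "migrate VMs off the host")
print("success:", ok, "completed steps:", done)
```

The point of the model is that every later step depends on the VMs having left the host first, which is why a single VM that refuses to migrate holds up the whole task.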

We created a baseline named ESX_upd5 and added Update 5, which VUM had downloaded from the VMware site, to it. Next, we attached the ESX_upd5 baseline to the cluster.
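
One way to picture what "compliant" means against a baseline is a simple set comparison: a host is compliant when everything in the baseline is already installed on it. The Python sketch below shows only that idea. It is not how VUM implements its scan, and the update names in it are made-up placeholders.

```python
from typing import Iterable, List


def missing_updates(baseline: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return the baseline updates not yet installed on the host, sorted."""
    return sorted(set(baseline) - set(installed))


def is_compliant(baseline: Iterable[str], installed: Iterable[str]) -> bool:
    """A host is compliant when no baseline update is missing."""
    return not missing_updates(baseline, installed)


# Placeholder update names, purely for illustration.
esx_upd5_baseline = ["update-a", "update-b", "update-c"]
host_installed = ["update-a"]

print(is_compliant(esx_upd5_baseline, host_installed))
print(missing_updates(esx_upd5_baseline, host_installed))
```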


We then proceeded to remediate the hosts in this prod cluster that were non-compliant with the ESX_upd5 baseline. It's simple: right-click on the ESX server and select the Remediate task. A window shows the attached baseline and its updates. Click Finish and remediation of the host begins.
There are a few things I noticed while using VUM that, if you keep them in mind, make it a much more pleasant experience.
  • Make sure the baseline contains exactly the updates you want to install. This is the most critical task when configuring baselines.
  • For VUM to migrate VMs before putting the host in maintenance mode, DRS on the cluster must be set to "fully automated"; otherwise the VMs will not be migrated.
  • While VUM is migrating a VM, most actions on that VM are disabled, such as "Edit Settings" and "Install VMware Tools". The only things you can still do are power off, migrate, and storage migration.
  • If VUM fails to migrate a VM, you can try to migrate it manually after resolving whatever is causing the vMotion failure. If that is not possible either, you need to cancel the remediation task, fix the issue, and then restart the remediation task.
  • Don't forget to check P2V'ed VMs where CD-ROMs or serial ports were left enabled by mistake. These are the VMs that can waste your time...
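
The checks in this list can be written down as a small pre-flight routine to run before scheduling remediation. The Python sketch below works on a hand-built inventory. The `VM` class, its field names, and the `preflight` function are hypothetical and do not query VirtualCenter; in practice you would fill in the data from whatever inventory export you have.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    """Hypothetical inventory record; field names are illustrative only."""
    name: str
    cdrom_enabled: bool = False
    serial_port_enabled: bool = False


def preflight(drs_mode: str, vms: List[VM]) -> List[str]:
    """Return a list of problems likely to stall VM migration."""
    problems: List[str] = []
    if drs_mode != "fully automated":
        problems.append(f"DRS is set to '{drs_mode}', so VMs will not be migrated")
    for vm in vms:
        if vm.cdrom_enabled:
            problems.append(f"{vm.name}: CD-ROM left enabled")
        if vm.serial_port_enabled:
            problems.append(f"{vm.name}: serial port left enabled")
    return problems


inventory = [
    VM("web01"),
    VM("p2v-legacy01", cdrom_enabled=True, serial_port_enabled=True),
]
for problem in preflight("partially automated", inventory):
    print(problem)
```

An empty list from `preflight` does not guarantee that every vMotion will succeed; it only rules out the causes mentioned above.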
These issues may look silly, but they can seriously extend the maintenance window (which I hate). I mean, imagine this: you schedule a remediation task on a host, and when you come in the next morning the task is still in progress just because one VM didn't migrate...
Don't worry, there is a way around this problem. In the VUM Configuration tab there is a setting called ESX host settings -> Failure response.
It has 4 options:

Fail task: if VM migration fails, VUM fails the task and reports an error in the VC logs.
Retry: if VM migration fails, VUM retries after the configured delay.
Power off and retry: if VM migration fails, VUM powers off the VM and retries.
Suspend and retry: if VM migration fails, VUM suspends the VM and retries.

The 3rd option can solve the problem, but I wouldn't use it on a production cluster. The 4th option is a safer bet, provided you have approval for the downtime.
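
To compare the four options side by side, here is a small Python model of what each one means for the stuck VM and for the remediation task. It is a sketch of the behaviour described above and not VUM code; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Tuple


class FailureResponse(Enum):
    FAIL_TASK = "fail task"
    RETRY = "retry"
    POWER_OFF_AND_RETRY = "power off and retry"
    SUSPEND_AND_RETRY = "suspend and retry"


def on_migration_failure(response: FailureResponse,
                         vm_state: str = "powered on") -> Tuple[str, bool]:
    """Return (VM state afterwards, whether remediation keeps going)."""
    if response is FailureResponse.FAIL_TASK:
        return vm_state, False       # task stops, VM is left untouched
    if response is FailureResponse.RETRY:
        return vm_state, True        # VM untouched, VUM tries again later
    if response is FailureResponse.POWER_OFF_AND_RETRY:
        return "powered off", True   # guest goes down hard
    return "suspended", True         # guest is paused and can be resumed


for option in FailureResponse:
    state, continues = on_migration_failure(option)
    print(f"{option.value}: VM is {state}, task continues: {continues}")
```

Only the last two options change the state of the VM, which is why they need approved downtime.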

That concludes the story of my date with VUM. So far I have updated 2 of the 4 clusters and will be doing the remaining ones soon. If any new issues come up, I will update this post.
Happy virtualizing...