I sure have done a lot of blogging about how to power off a VM lately, don’t you think?
One more time, but this time it’s the best version of them all.
I’m Eventing SO HARD right now
I figured this out while investigating something related to VSA and connecting the dots. Here’s how it went down.
In my setup, we have the VSA Clustering Service running on a separate VM at the remote location to handle quorum. One problem that pops up about once a month is the ongoing struggle of Windows patching. The VMs patch, and reboot as you would expect. This throws an alarm on the vCenter Datacenter object that the cluster service is offline. The other issue is that it doesn’t clear itself once the box comes back up. I wanted to find a way to clear this automatically, so that my monitoring guys didn’t lose their minds when hundreds of these appeared at 2AM.
If you look at the settings for the Alarm in question, you’ll see that it uses Events rather than Conditions for these alerts. I hadn’t dug too hard into how these worked before, so time to get dirty.
I realized then that the alarms in question were bubbling up from VSA Manager to vCenter using AMQP, and based on prior experience I knew that WSCLI would show events when I used startListener – so I started to do some more testing.
Running wscli [cluster IP] startListener from the command-line, I verified it was listening, and then rebooted the VSA cluster service machine in my lab. And then something neat happened – the listener showed an event called MemberOfflineEvent fire. I then waited for the node to come back online and sure enough, I saw MemberOnlineEvent fire.
After adding the trigger of MemberOnlineEvent to the alarm, and having it set the status back to normal I performed another test. Sure enough, the alarm came up, and a couple of minutes later, it disappeared when the service came back online. Problem solved!
It begs the question – why wasn’t this built in to the standard alarm? I wish I knew. But at the very least, I can correct this one.
I started thinking about this newfound trick and wondered – could it apply elsewhere with the VSA solution? I know that custom alarms are created for each NFS datastore used by the VSA, and custom alarms for the VSA machines too. What could we do with those?
This may look familiar! You can make similar changes to this alarm and it will clear things up automatically there too. So what about datastores?
OK, the event is a little different, but otherwise pretty straightforward. We just need to find the corresponding event that indicates all is well again.
Below is the list of alarms created by the VSA Manager during an installation, by object context and name.
Datacenter – VSA Cluster Service Offline
Datacenter – VSA Storage Cluster Offline
Virtual Machine – VSA Member Offline
Datastore – VSA Storage Entity Offline
Datastore – VSA Storage Entity Degraded
All of these only have alarms for triggering, but none are set up by default to clear themselves. After using WSCLI and lots of testing, here are the settings I feel make the most sense to give the best idea of what is actually happening. This list only shows what would be added to the existing alarm to automatically clear it out.
Alarm: VSA Cluster Service Offline, VSA Member Offline
This clears up most of the alarms very easily. The datastore alarms are a little more interesting – you can actually have both the Offline and Degraded alarms fired during an outage, which isn’t necessarily helpful. The below changes will ensure only one is showing at a time.
Alarm: VSA Storage Entity Offline
Alarm: VSA Storage Entity Degraded
With the above changes, the datastore will go from green, to yellow, to red, and back up the list in the proper order.
This is fun and all, but this entry is actually about how to Power Off a VSA VM isn’t it?
Once I had figured out the connection between the events sent to vCenter from VSA Manager, I remembered something from my many adventures in WSCLI regarding the shutdownSvaServer command:
This operation sends an SvaEnterMaintenanceModeEvent management event
when the node is marked for entering maintenance mode, and a
SvaShutdownEvent when the VSA service is ready to shut down.
I hope this is helping you catch on to where this is going.
Let’s do an experiment in our VSA Cluster – create a custom alarm on one of the VSA VMs like so:
This alarm object will Power Off the VSA VM automatically when the shutdownSvaServer argument is sent through WSCLI. It’s truly a beautiful thing!
And for those wanting to see what it looks like in the standard client:
There it is! It seems really simple in hindsight, but then again we did start out just trying to power off a VM.
NOTE: Disable HA before shutting the VSA down. If you don’t, your VSA will get restarted automatically. Obviously, once you are done patching and all of that, re-enable it!
This process is pretty easy to update for a single datacenter manually.
Next time, we’ll make a workflow that will go through every datacenter object and update all of the alarms.
Thanks for reading! The VSA onion continues to peel…