VMware VSA Deepdive – Part 6 – Eventing So Hard Right Now

I sure have done a lot of blogging about how to power off a VM lately, don’t you think?

One more time, but this time it’s the best version of them all.

I’m Eventing SO HARD right now

I figured this out while investigating something related to VSA and connecting the dots. Here’s how it went down.

In my setup, we have the VSA Clustering Service running on a separate VM at the remote location to handle quorum. One problem that pops up about once a month is the ongoing struggle of Windows patching. The VMs patch, and reboot as you would expect. This throws an alarm on the vCenter Datacenter object that the cluster service is offline. The other issue is that it doesn’t clear itself once the box comes back up. I wanted to find a way to clear this automatically, so that my monitoring guys didn’t lose their minds when hundreds of these appeared at 2AM.

If you look at the settings for the Alarm in question, you’ll see that it uses Events rather than Conditions for these alerts. I hadn’t dug too hard into how these worked before, so time to get dirty.

The default VSA Cluster Service Alarm.
The default VSA Cluster Service Alarm.

I realized then that the alarms in question were bubbling up from VSA Manager to vCenter using AMQP, and based on prior experience I knew that WSCLI would show events when I used startListener – so I started to do some more testing.

Running wscli [cluster IP] startListener from the command-line, I verified it was listening, and then rebooted the VSA cluster service machine in my lab. And then something neat happened – the listener showed an event called MemberOfflineEvent fire. I then waited for the node to come back online and sure enough, I saw MemberOnlineEvent fire.

After adding the trigger of MemberOnlineEvent to the alarm, and having it set the status back to normal I performed another test. Sure enough, the alarm came up, and a couple of minutes later, it disappeared when the service came back online. Problem solved!

It begs the question – why wasn’t this built in to the standard alarm? I wish I knew. But at the very least, I can correct this one.

I started thinking about this newfound trick and wondered – could it apply elsewhere with the VSA solution? I know that custom alarms are created for each NFS datastore used by the VSA, and custom alarms for the VSA machines too. What could we do with those?

The default VSA VM offline alarm settings.
The default VSA VM offline alarm settings.

This may look familiar! You can make similar changes to this alarm and it will clear things up automatically there too. So what about datastores?

The default VSA Datastore Alarm.
The default VSA Datastore Offline Alarm.

OK, the event is a little different, but otherwise pretty straightforward. We just need to find the corresponding event that indicates all is well again.

Below is the list of alarms created by the VSA Manager during an installation, by object context and name.
Datacenter – VSA Cluster Service Offline
Datacenter – VSA Storage Cluster Offline
Virtual Machine – VSA Member Offline
Datastore – VSA Storage Entity Offline
Datastore – VSA Storage Entity Degraded

All of these only have alarms for triggering, but none are set up by default to clear themselves. After using WSCLI and lots of testing, here are the settings I feel make the most sense to give the best idea of what is actually happening. This list only shows what would be added to the existing alarm to automatically clear it out.

Alarm: VSA Cluster Service Offline, VSA Member Offline
Event: MemberOnlineEvent
Status: Green

This clears up most of the alarms very easily. The datastore alarms are a little more interesting – you can actually have both the Offline and Degraded alarms fired during an outage, which isn’t necessarily helpful. The below changes will ensure only one is showing at a time.

Alarm: VSA Storage Entity Offline
Event: StorageEntityDegradedEvent
Status: Green

Alarm: VSA Storage Entity Degraded
Event: StorageEntityOnlineEvent
Status: Green

With the above changes, the datastore will go from green, to yellow, to red, and back up the list in the proper order.

Eureka Moment

This is fun and all, but this entry is actually about how to Power Off a VSA VM isn’t it?
Once I had figured out the connection between the events sent to vCenter from VSA Manager, I remembered something from my many adventures in WSCLI regarding the shutdownSvaServer command:

This operation sends an SvaEnterMaintenanceModeEvent management event
when the node is marked for entering maintenance mode, and a
SvaShutdownEvent when the VSA service is ready to shut down.

I hope this is helping you catch on to where this is going.
Let’s do an experiment in our VSA Cluster – create a custom alarm on one of the VSA VMs like so:

Creating a custom Maintenance Mode Alarm.
Creating a custom Maintenance Mode Alarm.
Setting custom Maintenance Mode Alarm event values.
Setting custom Maintenance Mode Alarm event values.
Setting custom Maintenance Mode Alarm action values.
Setting custom Maintenance Mode Alarm action values.

This alarm object will Power Off the VSA VM automatically when the shutdownSvaServer argument is sent through WSCLI. It’s truly a beautiful thing!

The alarm works!
The alarm works!

And for those wanting to see what it looks like in the standard client:


There it is! It seems really simple in hindsight, but then again we did start out just trying to power off a VM.
NOTE: Disable HA before shutting the VSA down. If you don’t, your VSA will get restarted automatically. Obviously, once you are done patching and all of that, re-enable it!

This process is pretty easy to update for a single datacenter manually.

Next time, we’ll make a workflow that will go through every datacenter object and update all of the alarms.

Thanks for reading! The VSA onion continues to peel…

VMware VSA Deepdive – Part 5 – Use SOAP!

Fake Edit: It has been a while! It’s been lots of great weather, people visits, and then VMworld happened. Time to get back on track!

In the last post about the VSA, we leveraged the SSH plugin in VCO to send the necessary command to the host that would force the VM to power off, as we can’t do it from vCenter. There are pros and cons to that approach, though.

First of all, not particularly great from a security perspective – you have to have SSH open and the service started to make that happen. Depending on your environment, this may not be seen as a good thing.

You’re also using the root credentials to accomplish the feat. You could use another one, but that’s a whole lot of work just to set that whole thing up in preparation for this scenario. You’d have to take into account rolling the root password and how you could work that into the workflow and managing it over time.

And finally, related to the above–this process isn’t going to pop up in a log very easily since you’re bypassing the API *AND* vCenter.

So, given these faults I needed to find another way to proceed. To be fair this process also required root to the host so it wasn’t that much better, but it would at least show up in host logging. The answer?

SOAP. The yardstick of all APIs.
SOAP. The yardstick of all APIs.

Yep. I was desperate. But if it helps me to automate working with 1500+ hosts, it’s WORTH IT.

I want you to hit me as hard as you can

I haven’t had to make SOAP requests to anything in ages, so I was pretty rusty. Thankfully VCO to the rescue again with a built-in SOAP plugin.
First things first. Make a new Workflow and define two input Parameters – one for the ESXi host, and the VSA VM you wish to power down.

Inputs for the SOAP workflow.
Inputs for the SOAP workflow.

For your attributes, define the following as type String:

  • inputHostWSDLUri – this is to specify the WSDL URI used to talk to the ESXi host.
  • inputHostPreferredEndpoint – this is for talking to the API endpoint.
  • inputHostName – this is just to hold a string value of the ESXi host for later.

The rest of the attributes in the workflow can simply be bound as you add the other workflows that come with VCO.

Preparing the SOAP Host entry

One thing to keep in mind – when adding an ESXi host as a SOAP Host, it sets some values automatically that do not allow this to complete as expected. The first problem is the SOAP Endpoint and the SOAP WSDL URI. Both of these, when enumerated by the SOAP plugin point to https://localhost which makes sense for when the ESX host is doing calls to itself, but not for VCO to reach out to it remotely. The first order of business is to fix these values.

Create a Scriptable Task element in your schema, and bind the input Parameter inputHost to it. Bind the 3 attributes you defined above for output. Then, input the code found below.

Setting up the WSDL and Endpoint URI.
Setting up the WSDL and Endpoint URI.

This code is pretty straightforward. It is simply replacing the values with the correct ones to perform remote SOAP calls.

Next up, drop a copy of the Add a SOAP Host workflow into your schema. Bind the inputHostName and inputHostWSDLUri values to the name and wsdlUri parameters, and bind the rest to new attributes/values as you desire.

Binding new attributes to the Add a SOAP Host Workflow.
Binding new attributes to the Add a SOAP Host Workflow.

You’ll need to provide things such as the username/password to the host, timeout values and other values, all of which can be static values, or linked to a Configuration Element.

For the OUT parameter of this workflow element, bind it to a new attribute named soapHost so we can use it later.

Modify the SOAP Host

Before you proceed, you need to update the new SOAP Host with the new value of inputHostPreferredEndpoint.
Drop a copy of the Update a SOAP Host with an endpoint URL workflow into the schema.
Simply bind the attributes soapHost and inputHostPreferredEndpoint to their respective input parameters, and bind soapHost to the output parameter, so that it completes.

Using SOAP to find the VSA and shut it down

Add a Scriptable Task to your schema. On the inputs, bind the soapHost, inputVM, username, and password attributes.

Below is the code you can copy/paste, with comments as needed.

// get the initial operation you want to go for from the SOAP host specified.
var operation = soapHost.getOperation("RetrieveServiceContent");
// Once you have the SOAP Operation, create the request object for it
var request = operation.createSOAPRequest();

// set Parameters and their attributes.
request.setInParameter("_this", "ServiceInstance"); // creating the input Parameter itself
request.addInParameterAttribute("_this", "type", "ServiceInstance");

// make the request, save as response variable
var response = operation.invoke(request);

// retrieved values to be passed on down the line
var searchIndex = response.getOutParameter("returnval.searchIndex")
var rootFolder = response.getOutParameter("returnval.rootFolder")
var sessionMgr = response.getOutParameter("returnval.sessionManager")

// get Login Session to add to future headers.
var hostLoginOp = soapHost.getOperation("Login")
// create Login request
var loginReq = hostLoginOp.createSOAPRequest()
loginReq.setInParameter("_this", sessionMgr) // using value from initial query
loginReq.addInParameterAttribute("_this", "type", "SessionManager")
loginReq.setInParameter("userName", inputUser)
loginReq.setInParameter("password", inputPassword)
var loginResp = hostLoginOp.invoke(loginReq)
var sessionKey = loginResp.getOutParameter("returnval.key")

// find the VSA VM on the host.
var vmoperation = soapHost.getOperation("FindChild")
System.log("VM Search Operation is: "+vmoperation.name)
var vmreq = vmoperation.createSOAPRequest()
// define parameters
vmreq.setInParameter("_this", "ha-searchindex") // get the SearchIndex
vmreq.addInParameterAttribute("_this", "type", "SearchIndex")
vmreq.setInParameter("entity", "ha-folder-vm") // representing the root VM Folder
vmreq.setInParameter("name", inputVM.name) // your search criteria

// send request, get the response in a variable
var vmresp = vmoperation.invoke(vmreq)
// assign moref to variable
var vmMoRef = vmresp.getOutParameter("returnval")
// this log shows the output value
System.log("MoREF of VM ["+inputVM.name+"] on ["+soapHost.name+"]: "+vmMoRef )

// now that you have the MoRef of the VSA VM, you can kick off the Power Off task with a decision/parameter.
var pwroffOp = soapHost.getOperation("PowerOffVM_Task") // get the Power Off operation
var pwroffOpReq = pwroffOp.createSOAPRequest() // create the Power Off request
// define parameters
pwroffOpReq.setInParameter("_this", vmMoRef) // assign the MoRef of the VM to power off
pwroffOpReq.addInParameterAttribute("_this", "type", "VirtualMachine")
// shut off the VM by executing the request.
var offresp = pwroffOp.invoke(pwroffOpReq)

And there you have it. If you direct connect to the ESXi host when you run this workflow, you will see a task for powering off the VM appear and you are good to go.

One thing I prefer to do at the end of this workflow is to drop in the Remove a SOAP Host workflow and bind appropriately so that my host list doesn’t get too large, but this is obviously optional.

A final note on SSL Certificates

If you run this as it is out of the gate, you will probably get a pop-up regarding whether to trust the SSL certificate of the host.
Of course in a perfect world, all of your certificates are trusted top to bottom and are maintained. But anyone who has tried to do this at scale has struggled and probably doesn’t bother. In order to bypass this, you’ll need to make a few adjustments and duplicate the stock workflows so you can make changes to them.

In the VCO workflow list, go down to Library -> SOAP -> Configuration and right-click Manage SSL Certificates.
Choose Duplicate Workflow.
Duplicate SSL Workflow
Save your copy of the workflow wherever you like. You may want to change the name a bit to reflect that it isn’t the standard workflow.

Now, you can edit the workflow and make a minor adjustment. Here is the workflow by default.

The default Manage SSL Certificates workflow schema.
The default Manage SSL Certificates workflow schema.

You’ll notice the Accept SSL certificate schema element. Simply click it and delete it from the schema.

The "custom" Manage SSL Certificates workflow schema.
The “custom” Manage SSL Certificates workflow schema.

Finally, click on the General tab of your workflow, and look for an attribute named installCertificates. Inside of the value, input the text Install. The workflow element Install certificate does a simple check to see if the attribute is requesting to install, and continues from there.

Updating the new Manage SSL Certificates attributes.
Updating the new Manage SSL Certificates attributes.

As a final step, you will want to duplicate the Add a SOAP Host workflow, and replace the Manage SSL Certificates element with this new one you have created.
Ensure that the two attributes are rebound to the values the old one were using.

Re-adding the workflow bindings for SSL Certificates.
Re-adding the workflow bindings for SSL Certificates.

With these changes, you can Add a SOAP Host and not get stopped for SSL verification.

Of course, this isn’t really a best practice, but it gets the job done.

Next up, the final and maybe the most elegant solution for working with the VSA VM Power Off situation.