VMware VSA Deepdive – Part 6 – Eventing So Hard Right Now

I sure have done a lot of blogging about how to power off a VM lately, don’t you think?

One more time, but this time it’s the best version of them all.

I’m Eventing SO HARD right now

I figured this out while investigating something related to VSA and connecting the dots. Here’s how it went down.

In my setup, we have the VSA Clustering Service running on a separate VM at the remote location to handle quorum. One problem that pops up about once a month is the ongoing struggle of Windows patching. The VMs patch, and reboot as you would expect. This throws an alarm on the vCenter Datacenter object that the cluster service is offline. The other issue is that it doesn’t clear itself once the box comes back up. I wanted to find a way to clear this automatically, so that my monitoring guys didn’t lose their minds when hundreds of these appeared at 2AM.

If you look at the settings for the Alarm in question, you’ll see that it uses Events rather than Conditions for these alerts. I hadn’t dug too hard into how these worked before, so time to get dirty.

The default VSA Cluster Service Alarm.

I realized then that the alarms in question were bubbling up from VSA Manager to vCenter using AMQP, and based on prior experience I knew that WSCLI would show events when I used startListener – so I started to do some more testing.

Running wscli [cluster IP] startListener from the command-line, I verified it was listening, and then rebooted the VSA cluster service machine in my lab. And then something neat happened – the listener showed an event called MemberOfflineEvent fire. I then waited for the node to come back online and sure enough, I saw MemberOnlineEvent fire.

After adding the trigger of MemberOnlineEvent to the alarm, and having it set the status back to normal I performed another test. Sure enough, the alarm came up, and a couple of minutes later, it disappeared when the service came back online. Problem solved!

It begs the question – why wasn’t this built in to the standard alarm? I wish I knew. But at the very least, I can correct this one.

I started thinking about this newfound trick and wondered – could it apply elsewhere with the VSA solution? I know that custom alarms are created for each NFS datastore used by the VSA, and custom alarms for the VSA machines too. What could we do with those?

The default VSA VM offline alarm settings.

This may look familiar! You can make similar changes to this alarm and it will clear things up automatically there too. So what about datastores?

The default VSA Datastore Offline Alarm.

OK, the event is a little different, but otherwise pretty straightforward. We just need to find the corresponding event that indicates all is well again.

Below is the list of alarms created by the VSA Manager during an installation, by object context and name.
Datacenter – VSA Cluster Service Offline
Datacenter – VSA Storage Cluster Offline
Virtual Machine – VSA Member Offline
Datastore – VSA Storage Entity Offline
Datastore – VSA Storage Entity Degraded

All of these only have triggers for raising the alarm; none are set up by default to clear themselves. After using WSCLI and lots of testing, here are the settings that I feel give the best picture of what is actually happening. This list only shows what would be added to each existing alarm to clear it automatically.

Alarm: VSA Cluster Service Offline, VSA Member Offline
Event: MemberOnlineEvent
Status: Green

This clears up most of the alarms very easily. The datastore alarms are a little more interesting – you can actually have both the Offline and Degraded alarms fired during an outage, which isn’t necessarily helpful. The below changes will ensure only one is showing at a time.

Alarm: VSA Storage Entity Offline
Event: StorageEntityDegradedEvent
Status: Green

Alarm: VSA Storage Entity Degraded
Event: StorageEntityOnlineEvent
Status: Green

With the above changes, the datastore will go from green, to yellow, to red, and back up the list in the proper order.
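
As a quick reference, the clearing triggers listed above can be written down as a small lookup table. This is only a sketch for keeping track of the settings; the helper name clearingEventFor is made up for illustration and is not part of VSA Manager or vCenter.

```javascript
// Summary of the clearing triggers listed above:
// alarm name -> event that should set the alarm status back to Green.
var clearingEvents = {
  "VSA Cluster Service Offline": "MemberOnlineEvent",
  "VSA Member Offline": "MemberOnlineEvent",
  "VSA Storage Entity Offline": "StorageEntityDegradedEvent",
  "VSA Storage Entity Degraded": "StorageEntityOnlineEvent"
};

// Hypothetical helper: returns the clearing event for an alarm,
// or null when no clearing trigger is listed for it.
function clearingEventFor(alarmName) {
  if (clearingEvents.hasOwnProperty(alarmName)) {
    return clearingEvents[alarmName];
  }
  return null;
}
```

Note that VSA Storage Cluster Offline is not in the table, since no clearing event is listed for it above.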

Eureka Moment

This is fun and all, but this entry is actually about how to Power Off a VSA VM, isn’t it?
Once I had figured out the connection between the events sent to vCenter from VSA Manager, I remembered something from my many adventures in WSCLI regarding the shutdownSvaServer command:

This operation sends an SvaEnterMaintenanceModeEvent management event
when the node is marked for entering maintenance mode, and a
SvaShutdownEvent when the VSA service is ready to shut down.

I hope this is helping you catch on to where this is going.
Let’s do an experiment in our VSA Cluster – create a custom alarm on one of the VSA VMs like so:

Creating a custom Maintenance Mode Alarm.
Setting custom Maintenance Mode Alarm event values.
Setting custom Maintenance Mode Alarm action values.

This alarm object will Power Off the VSA VM automatically when the shutdownSvaServer argument is sent through WSCLI. It’s truly a beautiful thing!

The alarm works!

And for those wanting to see what it looks like in the standard client:

The maintenance mode shutdown alarm as seen in the standard client.

There it is! It seems really simple in hindsight, but then again we did start out just trying to power off a VM.
NOTE: Disable HA before shutting the VSA down. If you don’t, your VSA will get restarted automatically. Obviously, once you are done patching and all of that, re-enable it!

This process is pretty easy to update for a single datacenter manually.

Next time, we’ll make a workflow that will go through every datacenter object and update all of the alarms.

Thanks for reading! The VSA onion continues to peel…

VMware VSA Deepdive – Part 5 – Use SOAP!

Fake Edit: It has been a while! It’s been lots of great weather, people visits, and then VMworld happened. Time to get back on track!

In the last post about the VSA, we leveraged the SSH plugin in VCO to send the necessary command to the host that would force the VM to power off, as we can’t do it from vCenter. There are pros and cons to that approach, though.

First of all, not particularly great from a security perspective – you have to have SSH open and the service started to make that happen. Depending on your environment, this may not be seen as a good thing.

You’re also using the root credentials to accomplish the feat. You could use another account, but that’s a whole lot of work to set up in preparation for this scenario. You’d also have to account for rolling the root password, and for working that into the workflow and managing it over time.

And finally, related to the above–this process isn’t going to pop up in a log very easily since you’re bypassing the API *AND* vCenter.

So, given these faults I needed to find another way to proceed. To be fair this process also required root to the host so it wasn’t that much better, but it would at least show up in host logging. The answer?

SOAP. The yardstick of all APIs.

Yep. I was desperate. But if it helps me to automate working with 1500+ hosts, it’s WORTH IT.

I want you to hit me as hard as you can

I haven’t had to make SOAP requests to anything in ages, so I was pretty rusty. Thankfully VCO to the rescue again with a built-in SOAP plugin.
First things first. Make a new Workflow and define two input Parameters: inputHost for the ESXi host, and inputVM for the VSA VM you wish to power down.

Inputs for the SOAP workflow.

For your attributes, define the following as type String:

  • inputHostWSDLUri – this is to specify the WSDL URI used to talk to the ESXi host.
  • inputHostPreferredEndpoint – this is for talking to the API endpoint.
  • inputHostName – this is just to hold a string value of the ESXi host for later.

The rest of the attributes in the workflow can simply be bound as you add the other workflows that come with VCO.

Preparing the SOAP Host entry

One thing to keep in mind – when adding an ESXi host as a SOAP Host, it sets some values automatically that do not allow this to complete as expected. The first problem is the SOAP Endpoint and the SOAP WSDL URI. Both of these, when enumerated by the SOAP plugin, point to https://localhost, which makes sense when the ESXi host is making calls to itself, but not for VCO to reach out to it remotely. The first order of business is to fix these values.

Create a Scriptable Task element in your schema, and bind the input Parameter inputHost to it. Bind the 3 attributes you defined above for output. Then, input the code found below.

Setting up the WSDL and Endpoint URI.

This code is pretty straightforward. It is simply replacing the values with the correct ones to perform remote SOAP calls.
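
Since the original code lives in the screenshot, here is a minimal text sketch of the same idea. It is not the exact code from the image. It assumes the host serves its SOAP endpoint at /sdk and its WSDL at /sdk/vimService.wsdl, which is the usual layout on ESXi but worth verifying on your own hosts. The function name buildHostSoapUris is made up for illustration.

```javascript
// Assumed paths: /sdk for the endpoint, /sdk/vimService.wsdl for the WSDL.
// Hypothetical helper: builds the three attribute values from a host name,
// so nothing points at https://localhost anymore.
function buildHostSoapUris(hostName) {
  var base = "https://" + hostName + "/sdk";
  return {
    inputHostName: hostName,
    inputHostWSDLUri: base + "/vimService.wsdl",
    inputHostPreferredEndpoint: base
  };
}

// Inside the Scriptable Task (inputHost is the bound input parameter):
//   var uris = buildHostSoapUris(inputHost.name);
//   inputHostName = uris.inputHostName;
//   inputHostWSDLUri = uris.inputHostWSDLUri;
//   inputHostPreferredEndpoint = uris.inputHostPreferredEndpoint;
```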

Next up, drop a copy of the Add a SOAP Host workflow into your schema. Bind the inputHostName and inputHostWSDLUri values to the name and wsdlUri parameters, and bind the rest to new attributes/values as you desire.

Binding new attributes to the Add a SOAP Host Workflow.

You’ll need to provide things such as the username/password to the host, timeout values and other values, all of which can be static values, or linked to a Configuration Element.

For the OUT parameter of this workflow element, bind it to a new attribute named soapHost so we can use it later.

Modify the SOAP Host

Before you proceed, you need to update the new SOAP Host with the new value of inputHostPreferredEndpoint.
Drop a copy of the Update a SOAP Host with an endpoint URL workflow into the schema.
Simply bind the attributes soapHost and inputHostPreferredEndpoint to their respective input parameters, and bind soapHost to the output parameter, so that it completes.

Using SOAP to find the VSA and shut it down

Add a Scriptable Task to your schema. On the inputs, bind soapHost, inputVM, and the host username and password (the code below refers to these last two as inputUser and inputPassword).

Below is the code you can copy/paste, with comments as needed.

// get the initial operation you want to go for from the SOAP host specified.
var operation = soapHost.getOperation("RetrieveServiceContent");
// Once you have the SOAP Operation, create the request object for it
var request = operation.createSOAPRequest();

// set Parameters and their attributes.
request.setInParameter("_this", "ServiceInstance"); // creating the input Parameter itself
request.addInParameterAttribute("_this", "type", "ServiceInstance");

// make the request, save as response variable
var response = operation.invoke(request);

// retrieved values to be passed on down the line
var searchIndex = response.getOutParameter("returnval.searchIndex")
var rootFolder = response.getOutParameter("returnval.rootFolder")
var sessionMgr = response.getOutParameter("returnval.sessionManager")

// get Login Session to add to future headers.
var hostLoginOp = soapHost.getOperation("Login")
// create Login request
var loginReq = hostLoginOp.createSOAPRequest()
loginReq.setInParameter("_this", sessionMgr) // using value from initial query
loginReq.addInParameterAttribute("_this", "type", "SessionManager")
loginReq.setInParameter("userName", inputUser)
loginReq.setInParameter("password", inputPassword)
var loginResp = hostLoginOp.invoke(loginReq)
var sessionKey = loginResp.getOutParameter("returnval.key")

// find the VSA VM on the host.
var vmoperation = soapHost.getOperation("FindChild")
System.log("VM Search Operation is: "+vmoperation.name)
var vmreq = vmoperation.createSOAPRequest()
// define parameters
vmreq.setInParameter("_this", "ha-searchindex") // get the SearchIndex
vmreq.addInParameterAttribute("_this", "type", "SearchIndex")
vmreq.setInParameter("entity", "ha-folder-vm") // representing the root VM Folder
vmreq.setInParameter("name", inputVM.name) // your search criteria

// send request, get the response in a variable
var vmresp = vmoperation.invoke(vmreq)
// assign moref to variable
var vmMoRef = vmresp.getOutParameter("returnval")
// this log shows the output value
System.log("MoREF of VM ["+inputVM.name+"] on ["+soapHost.name+"]: "+vmMoRef )

// now that you have the MoRef of the VSA VM, you can kick off the Power Off task with a decision/parameter.
var pwroffOp = soapHost.getOperation("PowerOffVM_Task") // get the Power Off operation
var pwroffOpReq = pwroffOp.createSOAPRequest() // create the Power Off request
// define parameters
pwroffOpReq.setInParameter("_this", vmMoRef) // assign the MoRef of the VM to power off
pwroffOpReq.addInParameterAttribute("_this", "type", "VirtualMachine")
// shut off the VM by executing the request.
var offresp = pwroffOp.invoke(pwroffOpReq)

And there you have it. If you connect directly to the ESXi host when you run this workflow, you will see a task for powering off the VM appear, and you are good to go.
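
Every call in the script repeats the same four steps: get the operation, create the request, set _this with its type attribute, and invoke. If you end up making more calls than this, a small wrapper can cut down the repetition. This is a sketch only. The function name invokeOnManagedObject is made up, and it uses nothing beyond the SOAP plugin methods already seen in the script.

```javascript
// Hypothetical helper: invoke a SOAP operation against a managed object.
// params is an optional plain object of extra input parameters.
function invokeOnManagedObject(soapHost, operationName, moRef, moType, params) {
  var operation = soapHost.getOperation(operationName);
  var request = operation.createSOAPRequest();
  // every vSphere call targets a managed object through "_this"
  request.setInParameter("_this", moRef);
  request.addInParameterAttribute("_this", "type", moType);
  params = params || {};
  for (var name in params) {
    if (params.hasOwnProperty(name)) {
      request.setInParameter(name, params[name]);
    }
  }
  return operation.invoke(request);
}

// Example, mirroring the power off step in the script:
//   invokeOnManagedObject(soapHost, "PowerOffVM_Task", vmMoRef, "VirtualMachine");
```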

One thing I prefer to do at the end of this workflow is to drop in the Remove a SOAP Host workflow and bind appropriately so that my host list doesn’t get too large, but this is obviously optional.

A final note on SSL Certificates

If you run this as it is out of the gate, you will probably get a pop-up regarding whether to trust the SSL certificate of the host.
Of course in a perfect world, all of your certificates are trusted top to bottom and are maintained. But anyone who has tried to do this at scale has struggled and probably doesn’t bother. In order to bypass this, you’ll need to make a few adjustments and duplicate the stock workflows so you can make changes to them.

In the VCO workflow list, go down to Library -> SOAP -> Configuration and right-click Manage SSL Certificates.
Choose Duplicate Workflow.
Duplicate SSL Workflow
Save your copy of the workflow wherever you like. You may want to change the name a bit to reflect that it isn’t the standard workflow.

Now, you can edit the workflow and make a minor adjustment. Here is the workflow by default.

The default Manage SSL Certificates workflow schema.

You’ll notice the Accept SSL certificate schema element. Simply click it and delete it from the schema.

The "custom" Manage SSL Certificates workflow schema.

Finally, click on the General tab of your workflow, and look for an attribute named installCertificates. Inside of the value, input the text Install. The workflow element Install certificate does a simple check to see if the attribute is requesting to install, and continues from there.

Updating the new Manage SSL Certificates attributes.

As a final step, you will want to duplicate the Add a SOAP Host workflow, and replace the Manage SSL Certificates element with this new one you have created.
Ensure that the two attributes are rebound to the values the old one was using.

Re-adding the workflow bindings for SSL Certificates.

With these changes, you can Add a SOAP Host and not get stopped for SSL verification.

Of course, this isn’t really a best practice, but it gets the job done.

Next up, the final and maybe the most elegant solution for working with the VSA VM Power Off situation.

VMware VSA Deepdive – Part 4 – Shutting Down VSA (SSH Edition)

The first way I elected to try and forcibly shut down the VSA VM was to do everything through the ESXi command line via SSH. ESXCLI itself is not implemented in VCO directly, so this will require some good old-fashioned text parsing with AWK.

Enabling SSH on the host through a VCO Action

Unfortunately out of the box, there is no workflow/action that manages ESXi services, so I needed to roll my own.
Below is the Action setup and script code I used to check for the SSH service and start it up, given an input of type VC:HostSystem.
Create the Action, and name it startSSHService. There is no return value necessary on this Action.
Setting up the SSH Service Action

// get the list of services
var hostServices = inputHost.configManager.serviceSystem.serviceInfo.service
var sshService = null
// loop the services, find the SSH service.
for each(var svc in hostServices) {
  if(svc.key == "TSM-SSH") {
    sshService = svc
    System.log("Found SSH Service on host ["+inputHost.name+"]")
    break
  }
}
if(sshService == null) {
  throw "Couldn't find SSH service on ["+inputHost.name+"]!"
}

// Enable the service
try {
  inputHost.configManager.serviceSystem.startService(sshService.key)
} catch(err) {
  System.log("ERROR--Couldn't start SSH service. Error Text: "+err)
}
// the end

So, once you have SSH started on your ESXi host, you can send commands through VCO to do what you need.

SSH Service Check Action

For a more robust workflow, you will probably want an Action that will check to see if the service is running, and return a boolean value. That way you can build in a little bit more into the flow.
The setup for the ‘check’ Action is the same, with the exception of the return value being a boolean.

Setting up the Check SSH Action.

The code is similar as well, just doing a simple check at the end.

var hostServices = inputHost.config.service.service
var sshSvc = null
for each(var svc in hostServices) {
  // System.log("Service: "+svc.label+", Status is: "+svc.running)
  if(svc.label == "SSH") {
    sshSvc = svc
    break
  }
}

// check status, return true/false (false if the SSH service was not found at all)
if(sshSvc != null && sshSvc.running == true) {
  return true
} else {
  return false
}
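
If you want to check services other than SSH, the same loop can be pulled into a small reusable function. The sketch below is an illustration only: isServiceRunning is a made-up name, it looks the service up by key (the start Action above uses "TSM-SSH"), and it uses a plain indexed loop so it can be tried outside of VCO as well.

```javascript
// Hypothetical helper: returns true only if a service with the given key
// exists in the list and reports itself as running.
function isServiceRunning(services, serviceKey) {
  for (var i = 0; i < services.length; i++) {
    if (services[i].key == serviceKey) {
      return services[i].running == true;
    }
  }
  // service not present on this host
  return false;
}

// In an Action, given inputHost of type VC:HostSystem:
//   return isServiceRunning(inputHost.config.service.service, "TSM-SSH");
```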

Where’s my Burrit–VSA?

Before you power off the VSA VM, you’ll want to make sure to vMotion your other guests to another node, or have a foolproof way of finding the VSA appliance on your host. Another Action to the rescue! Given an input ESXi host, this Action will query the VMs running on the host and check each VM’s tags for a specific value found on all VSA appliances. Note that these Tags are actually in the vCenter database, and are not the Inventory Service Tags in the vSphere Web Client.

For purposes of this post, I’ll name the action getVSAonHost.

Setting up the VSA finder Action.
// for when you absolutely, positively need to make sure it's a VSA.
var vsaKey = "SYSTEM/COM.VMWARE.VIM.SVA"
// check the VMs on the host for the tag through a loop
for each(var vm in inputHost.vm) {
  if(vm.tag) {
    for each(var tag in vm.tag) {
      if(tag.key == vsaKey) {
        // found it; return ends both loops, so no break is needed
        return vm
      }
    }
  }
}
// no VSA appliance was found on this host
return null

So now, you know you have the VM in question. You can then pass the VirtualMachine’s name property to your SSH command later.

Making a SSHandwich

With the ESXi host and the VSA VM in hand, you can execute the built in Run SSH Command workflow to do the final step.

Here’s the SSH command to send, which will find the VSA VM ID and power it off in one line, no questions asked:

VMID=$(vim-cmd vmsvc/getallvms | grep -i <VSA Name> | awk '{ print $1 }') && vim-cmd vmsvc/power.off $VMID

Begin by creating a new workflow, and create a single input parameter named inputHost, of type VC:HostSystem.
Then create three attributes in the General tab, naming them sshCommand, hostSystemName, and vmVSAName, all of type string.
Finally, create another attribute called vsaAppliance of type VC:VirtualMachine for use with the Action.

Next, drop your getVSAonHost Action into the schema, and bind the actionResult to vsaAppliance as seen below.

Binding Actions to the getVSAonHost Action.

Next, drop a Scriptable Task into the Schema and bind inputHost and vsaAppliance on the IN tab. On the OUT tab, bind the attributes hostSystemName, vmVSAName, and sshCommand. We are effectively going to write a small blurb of code that hands off the name properties of the input objects to the output attributes for use later, along with creating the SSH command string.

Binding values to the Scriptable Task.

In the Scripting Tab, we’ll use a few simple lines of code to perform the handoff of values.

// assign values to output attributes
hostSystemName = inputHost.name
vmVSAName = vsaAppliance.name
// create SSH command string using the input values
sshCommand = "VMID=$(vim-cmd vmsvc/getallvms | grep -i "+vmVSAName+" | awk \'{ print $1 }\') && vim-cmd vmsvc/power.off $VMID"
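
One caveat with building the command by plain concatenation: a VM name containing spaces or shell characters will break the grep. Below is a hedged sketch of a safer builder. The function names shellQuote and buildPowerOffCommand are made up for illustration, and the quoting is the standard POSIX single-quote trick, not anything specific to ESXi.

```javascript
// Hypothetical helper: wrap a value in single quotes for a POSIX shell,
// escaping any single quotes inside it.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: same one-liner as above, with the VM name quoted.
function buildPowerOffCommand(vmName) {
  return "VMID=$(vim-cmd vmsvc/getallvms | grep -i " + shellQuote(vmName) +
    " | awk '{ print $1 }') && vim-cmd vmsvc/power.off $VMID";
}

// In the Scriptable Task:
//   sshCommand = buildPowerOffCommand(vsaAppliance.name);
```

Keep in mind that grep still treats the name as a pattern and will match partial names, so this only helps if your VSA names are unique on the host.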

Finally, drop a copy of the Run SSH Command workflow into the schema. There are a lot of inputs here, so you will have to do some more complicated bindings. You can either force them to be input each time the workflow is run, set a static value, or bind to existing attributes.

Here’s what it looks like by default.

Setup for the SSH Workflow.

How you approach this part is largely up to you, but here is how I did it for this example.

The updated SSH Workflow Setup.

You’ll notice for the username/password I set a static value for root and set passwordAuthentication to Yes, and changed the initial hostNameOrIP and cmd values to the attributes we created earlier. For the outputs, I created new local attributes to show the results.

Run the workflow and you should see that VSA go down!

As an aside, if HA is enabled in your VSA HA Cluster object, it will immediately attempt to restart the machine – so make sure you add in the capability to disable HA into your parent workflows first so that this doesn’t end up being a problem.

Sweet Reprieve

It’s a bit crazy the amount of effort it takes to work around a disabled method, but it does work. I don’t think it’s particularly great, but if you’re the only one who cares about the systems and you don’t have audits, this may be enough.

But for my purposes this was just the first step of the journey. I did not end up actually using this way to do things, but figured it may be a good exercise to document the process.

The next post will take it in a different direction altogether, and a little bit closer to an API based solution that can also be audited.


VMware VSA Deepdive – Part 3 – Enter the Orchestrator

Previously we set up your VSA home lab, and beefed up our knowledge of WSCLI.
There may have also been some general complaints about the limitations of the VSA solution from a management perspective.

Now, it’s time to bring things together with an unlikely savior.

(Just kidding, it’s VCO to the rescue as always.)

What do you mean?

VCO being the general Swiss Army knife tool it is, you probably already know it has virtually unlimited possibilities.
So how can VCO and VSA coming together make magic happen? There’s no native plugin!

Factoid 1: WSCLI is a Java JAR, and is portable code that can run directly from JRE.
Factoid 2: VCO runs Tomcat for its server software, which is executed using JRE.

For me, reading this excellent post on the VCOTeam blog was like a Eureka! moment. Surely if I can find where JRE is installed on the VCO Appliance, I can execute WSCLI commands in a programmatic way, right?

I feel like the answer is Yes?

Nailed it.  Here’s what you need to set this up.

Orchestrator Appliance Setup

First, you need to allow VCO the ability to create local processes (i.e., execute local code).

To do this, SSH into the VCO Appliance as root.
Then, using VI as your text editor, type this in the SSH prompt:
vi /etc/vco/app-server/vmo/conf/vmo.properties

Once in the file, press I to enter Insert Mode, so you can add/edit content to the file.
Add this line anywhere in the file:
com.vmware.js.allow-local-process=true

Once added, press ESC, then type :wq to save changes and exit.

NOTE: Once you have made this change, a reboot of the appliance is needed.

Next up, since SSH is enabled by default on the VCO Appliance, that means SCP is also enabled.
Using your favorite SCP Client, login to your appliance and upload WSCLI.jar to a folder of your choosing.
For purposes of this post we will put it in the folder /var/run/vco.

Now that you have the files there, the next thing to do is find the path to the JRE.
Thankfully, it’s extremely easy to find – simply type which java at the SSH prompt.

Finding the JRE on the VCO Appliance.

Next, you need to edit your js-io-rights.conf to allow VCO to read and execute content from both folders.

To edit the file, start by typing vi /etc/vco/app-server/js-io-rights.conf
Press I to change to Insert Mode. You can now edit the file appropriately.
Here is what my js-io-rights.conf file looks like, in case you want to do a comparison.

The js-io-rights.conf permissions.

My changes are specifically these to grant read/execute access:
+rwx /var/run/vco
+rx /usr/java/jre-vmware/bin

Once you have these lines in your file, press ESC to leave Insert Mode. Then type :wq and press Enter to write your changes and quit VI.

So, now you have the path to your JRE, and your WSCLI, and your permissions to the necessary folders. Let’s conduct a quick test!

Run this from the SSH prompt: /usr/java/jre-vmware/bin/java -jar /var/run/vco/WSCLI.jar

WSCLI running in VCO!

BOOM!

Create your Configuration Elements for WSCLI

Configuration Elements are your friend for using this tool. Since the path of JRE and your WSCLI upload shouldn’t really change, why not make it a global variable? That’s what a Configuration Element is essentially for.

In the VCO Client, change to Design View if you aren’t already there. Then, go to the Configuration Elements Tab.

The VCO Configurations Tab.

Create a new Folder (or not), and create a new Element, as I have done in the above screenshot.
Once you have created it, you will go straight into the menu to customize the element.
Our goal here is to define two attributes: the path to JRE, and the path to WSCLI.jar.
Simply create two attributes, and populate them with the paths you have found earlier.
These paths are CASE-sensitive!

The WSCLI Configuration Element.

Save and Close your changes.

Using the Configuration Element in a Workflow

This one is almost TOO easy, and you’ll immediately see the value of Configuration Elements.

Create a new Workflow, and add two Attributes to it (pathJRE and pathWSCLI in this example). Once you have done so, you’ll see a small icon similar to Configuration Elements.

The empty attributes in the new workflow.

Click the small arrow highlighted above for the pathWSCLI attribute, and you will get a dialog window to find your Configuration Element.

Available ConfigurationElements are listed in this window.

You’ll find your previously created ConfigurationElement, and you can bind the attribute in your workflow to it, and use it however you want in the Workflow!

Running WSCLI in a Workflow

The VCO Command Class.

The final step is using the Command class in VCO. This class is meant to perform the local execution of code and return the output. So if you run a command like ls -l you’ll get a list of files in a directory. But obviously it’s better when you can run WSCLI in a workflow.

In the Schema of the workflow, drop a Scriptable Task in, and bind the attributes you created to it.
To execute WSCLI commands from the appliance you will be using the Command Scripting Class, which is detailed below.

Not much there, but that’s OK! If you’re this far, I suspect you’re thinking of use cases for this, of which there are many!

Follow the below code snippet as an example.  Note that the pathJRE and pathWSCLI bits are the attributes you defined earlier.

// define command line to execute
var cmdString = pathJRE+" -jar "+pathWSCLI+" help shutdownSvaServer"
// create a new command object with your command string
var myCommand = new Command(cmdString)
// execute the command
myCommand.execute(true)
// show the output
System.log(myCommand.output)

At this point, you should see the results of WSCLI in the Logging tab.
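
Once you start running more than one WSCLI command, it helps to build the command string in one place. Here is a small sketch of that idea. The function name buildWscliCommand is made up for illustration; pathJRE and pathWSCLI are the same attributes used above.

```javascript
// Hypothetical helper: builds "<java> -jar <WSCLI.jar> <args...>".
// args is an array of WSCLI arguments, e.g. ["help", "shutdownSvaServer"].
function buildWscliCommand(pathJRE, pathWSCLI, args) {
  return [pathJRE, "-jar", pathWSCLI].concat(args).join(" ");
}

// In the Scriptable Task:
//   var myCommand = new Command(buildWscliCommand(pathJRE, pathWSCLI, ["help", "shutdownSvaServer"]));
//   myCommand.execute(true);
//   System.log(myCommand.output);
```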

WSCLI - now in VCO!

You can now execute WSCLI in your workflows, but that is just part of the ongoing puzzle. As we know already, WSCLI can’t actually power down the VSA node. How do we tackle that when we’re denied that ability in vCenter?

In the next few posts, I’ll detail the various ways I was able to accomplish this, each with their pros and cons. In the meantime, I’ll leave you to create a few Actions or Workflows to add WSCLI functionality!


VMware VSA Deepdive – Part 2 – WSCLI

Previously in the series, we got the homelab version of VMware VSA up and running.

You may not know it yet, but it’s pretty important to become BEST FRIENDS with the WSCLI tool set–the only way to interact with the VSA systems from the command line and see what is actually happening behind the scenes. The VSA Manager gives you a heads up mostly–the rest is with this toolset.

There are occasional mutters about WSCLI on the internet if you search. I’d like this to be the definitive guide to it from a practical perspective.

WHAT IT DOES

WSCLI is a Java .jar file that comes installed with the VSA Manager plugin. You’ll find it in the install directory under subfolder \TOOLS. It comes with a simple .BAT file that finds the JRE’s current location and calls it along with the -jar parameter. A bonus that I have found–this being Java, the tool is essentially cross-platform so you can copy the JAR to other systems and run them from wherever. This is critical to know about later on, in my opinion.

WSCLI gives you the ability to do what is available in the VSA Manager vSphere plugin except the ability to upgrade from 5.1 to 5.5.
WSCLI output is far more verbose and can actually be useful in ways that are not obvious. When VSA Manager fails to do what you want it to (and it will), there is almost always a KB article with a how-to guide on fixing it with WSCLI. Tasks such as replacing a failed cluster node, or getting specific replication data or status, are all must-haves for a VSA Administrator.

SEND HELP

The first command argument for WSCLI you need to know is help. It sounds obvious, but this provides a wealth of information that becomes really interesting later. Keep the items emphasized below in the back of your mind to see how.
Let’s look at the shutdownSvaServer help info for an example of the help documentation.

Command usage:
shutdownSvaServer [<boolean:maintenance-mode=false>]
Parameters:
maintenance - whether the node should enter maintenance mode on restart, defaults to false
Prepares the node for a shutdown.
This command does not actually perform a shutdown of this node's VSA service or of its virtual machine. Instead, any test coverage data being collected is saved, all open files are synced, whether the node should enter maintenance mode on restart is persisted, and the ZooKeeper configuration is updated if the node will be entering maintenance mode, but the actual shutdown is not performed. The caller should arrange to power off the node on receiving the SvaShutdownEvent. Calls to this method are not counted when determining whether this node should be placed in maintenance mode because it appears to be in a continuous cycle of reboots.
This operation sends an SvaEnterMaintenanceModeEvent management event when the node is marked for entering maintenance mode, and a SvaShutdownEvent when the VSA service is ready to shut down.

Pretty verbose and unusually direct on the process, if you ask me! But notice how misleading it is – it just stops processes. You have to power off the thing yourself.
Not a problem, right? Why wouldn’t I be able to power off a VM?

Wait, what? Y U NO POWER OFF?!!1

In the vCenter database, once the VSA VM is created, it is modified extensively and most methods are disabled entirely. So even good old Administrator@vsphere.local can’t do anything about this. This is the biggest obstacle to automation and management of the system, since the VSA VM has to be powered off before you can put the host into Maintenance Mode.

And before you ask – VUM’s option to power off the VSA VM doesn’t fly here either!

There are a number of ways to tackle this which will be discussed later.

YOU MIGHT WANT TO JUST LISTEN

The second command argument for WSCLI you need to know is startListener. This launches a daemon that will show events happening in real-time, such as a cluster takeover, replication progress updated every minute or so, and entering/exiting maintenance mode. VSA Manager, assuming it shows anything useful, only updates every 30-40 seconds in the vSphere Client.

So, to build on the shutdownSvaServer command above, you could open a second command window and run wscli <cluster IP> startListener to see what’s happening within the cluster in real time. Then run wscli <vsa node IP> shutdownSvaServer and watch the SvaShutdownEvent fire in the listener window within a second or two. You can reboot the node and watch the carnage in your listener! You’ll see everything from the failover, to the reconnection, the sync, and completion.
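If you’d rather not juggle two windows, here is a rough Python sketch of the same idea. It assumes wscli is on the PATH, that the listener prints one event per line, and that the event name shows up verbatim in that output. None of that is confirmed by the help text, so treat the matching as a guess. The function names and the startup delay are mine, not part of WSCLI.

```python
import subprocess
import time


def wait_for_event(lines, event_name):
    """Return the first line that mentions event_name, or None if the stream ends."""
    for line in lines:
        if event_name in line:
            return line.rstrip("\n")
    return None


def shutdown_and_watch(cluster_ip, node_ip, wscli="wscli", startup_delay=5):
    """Start the listener, ask the node to prepare for shutdown, wait for the event.

    Assumption: the listener writes events to stdout, one per line.
    """
    listener = subprocess.Popen(
        [wscli, cluster_ip, "startListener"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        # Crude guess at how long the listener needs before it is attached.
        time.sleep(startup_delay)
        subprocess.run([wscli, node_ip, "shutdownSvaServer"], check=True)
        return wait_for_event(listener.stdout, "SvaShutdownEvent")
    finally:
        # The listener runs until stopped, so clean it up ourselves.
        listener.terminate()
```

Calling shutdown_and_watch with your cluster IP and VSA node IP should return the line containing SvaShutdownEvent. Powering off the VM afterwards is still your problem.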

THE COMMAND STRUCTURE

The commands are divided into 3 categories: SVA (Storage Virtual Appliance), SAS (Shared Access Storage), and VCS (VSA Cluster Service).

SVA commands are specific to the individual storage node and not the cluster in general.  Pings, version checks, and entering/exiting maintenance mode are the main uses in this context.

SAS commands are specific to the cluster, and are the most commonly used set.  Getting all of the UUID values, parameters, and querying sync state are the main uses in this context.

VCS commands are specific to the standalone cluster service.  In a two-node cluster, quorum is handled by this service, which is installed on the vCenter Server when VSA Manager is installed. You can download the cluster service for Windows or even as a Linux package, and put it on a separate machine entirely if you want.  It is fairly rare to use any commands in here except to query status and maybe restart the service if the replicas aren’t coming up in a failover scenario.

WHAT IT ISN’T

The HELP documentation of the WSCLI package is pretty good, even if you have to do a good bit of copy/pasting from the command window to get anything useful.

But the verbosity of the output, such as showing absolutely everything as UUIDs, is infuriating sometimes.  Expect to be filtering all data based on the management IPs.  This isn’t PowerCLI grade goodness, unfortunately.

(MORE) GENERAL GRIPES

No real monitoring capability. In my experience, there are no real proactive warnings that something bad has happened, other than a few stock alarms in vCenter that happen and then don’t clear themselves out when the condition changes. It’s the boy who cried wolf, and you tend to ignore most of the issues as time goes on. But when things break, they break real good. And then you have to hope you don’t screw up typing in specific commands to save the day in WSCLI.

Protip to VMware – let the admins decide who can and cannot Power Off their VSA VMs. It’s what roles were designed for!
(As an aside, I secretly love to give my TAM grief about this product.)

Because this is a software RAID over Ethernet solution, one thing to keep in mind is that the health of the underlying hardware is far more critical. Any major problem I have had with the product has been with bad hardware, primarily the disks themselves, or the controller and stripes. Make sure that you have that under control and address problems right away, or you’ll pay for it.

Upgrades require complete cluster outages. This is the really unacceptable one from my perspective – but I suppose the product was geared more to SMB shops that close at some point and can take the downtime. I’m in one of the biggest shops out there, and it simply doesn’t work that way.

But in quandaries like this, you have to get creative and dig deep. So that’s what I did.

In the next few posts, less theory and complaints – more workflows and scripts, surprising finds and interesting workarounds to manage VSA at hundreds of sites.

 

VSA Deep Dive - The Lab

VMware VSA DeepDive – Part 1 – Building the lab

My deep dive into how the VMware VSA works was really brought on by necessity.

On the surface, it sounded like a great solution – and to be fair it does work quite well overall. It acts as a RAID-1 of local storage over Ethernet which is a life saver for cost-conscious companies or the SMB space. And it is really designed as a fire and forget solution – if you don’t plan on patching your hosts or really making any changes to things ever.

Up front, your investment seems like a good choice – in some cases it may have been the only one. (I’ll let you figure out where I sit on this.)
But as a virtualization admin or engineer, you will have to update it, patch it, and do maintenance at some point. For one location, it’s easy.

For hundreds of locations, it gets a lot more interesting.

The Lab Requirements

In short, you need a minimum of 2 ESXi hosts running 5.1 or higher, a Windows vCenter 5.1+ installation, and some local disks. For my home lab setup, I’m just building nested ESXi with a single thin provisioned 100 GB disk per host – that’s enough to get you going for building this out.

Keep in mind, I’m setting these nested instances up in VMware Workstation!

You will need 4 vmnics on each host – 2 bridged or on the same network as vCenter, and 2 host-only for the VSA replication portgroups. The vmnics must be in this order by default:

  • vmnic0 – Bridged
  • vmnic1 – Host-Only
  • vmnic2 – Bridged
  • vmnic3 – Host-Only

If you are asking “why?” at this point, don’t worry – it will all make sense. I’m a professional.

Once you have vCenter running in Windows, you will need to install the VSA Manager plugin. It is Windows-only, and must be installed on vCenter itself, hence the earlier requirement.
The only gotcha is to make sure that the account you are logged in as on vCenter has admin privileges to both the box, and vCenter inventory.
Other than that, it’s pretty much a next-next-next type of deal.

Add your 2 ESXi hosts to a new datacenter object, and you are almost ready to get started.

Team Make It Fit

By default, the VSA will do a check on the host hardware to see if your RAID controller is on the HCL. For obvious reasons, using nested ESXi isn’t going to pass that check.

The solution is simple – just disable the auditing of the host during installation entirely!
On the vCenter Server, go into the VSA Manager config folder and find the ‘dev.properties’ file. Typically it’s found here:

  • C:\Program Files\VMware\Infrastructure\tomcat\webapps\VSAManager\WEB-INF\classes\dev.properties

In this text file you’ll see a number of tweaks you can perform to suit your lab. The value we care about though, is this:

# Other operations
host.audit=true
vm.rollback=true
vm.config=true
test.on=false

Simply change the value for host.audit to false, save the file and restart the vCenter Management Webservices service on the server. Then restart your vSphere Client. When you run the wizard next time, you should only get a warning, and not a fatal error about how totally unsupported you are. It’s a shame we can’t say the same back to it, at least until September, 2018.
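If you rebuild the lab often, the edit is easy to script. This is a minimal sketch, assuming the file is plain key=value text like the excerpt above. The function names are my own, and it keeps a .bak copy next to the original because I don’t trust myself. You still need to restart the service and the vSphere Client afterwards.

```python
from pathlib import Path


def set_property(text, key, value):
    """Return text with the line `key=...` rewritten to `key=value`.

    Comment lines and every other setting are left untouched.
    Raises KeyError if the key is not present.
    """
    out = []
    found = False
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        is_setting = "=" in stripped and not stripped.startswith("#")
        if is_setting and stripped.split("=", 1)[0].strip() == key:
            # Preserve whatever line ending the original line had.
            newline = line[len(line.rstrip("\r\n")):]
            out.append(f"{key}={value}{newline}")
            found = True
        else:
            out.append(line)
    if not found:
        raise KeyError(key)
    return "".join(out)


def disable_host_audit(path):
    """Back up dev.properties, then set host.audit=false in place."""
    p = Path(path)
    original = p.read_text()
    p.with_name(p.name + ".bak").write_text(original)
    p.write_text(set_property(original, "host.audit", "false"))
```

Point disable_host_audit at the dev.properties path shown above, from an elevated prompt on the vCenter Server.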

Click the datacenter object and you should see a tab called VSA Manager. All operations are done from this menu – it doesn’t exist in any other context. You’ll get a certificate warning, even if you have a signed internal certificate installed for vCenter – so just accept it. This is only the first crack in the armor!

The installation should be pretty straightforward. You’ll need to specify a cluster IP, and a cluster SERVICE IP since you’re running 2 nodes.
When you installed VSA Manager, the clustering service was installed on vCenter, so just point the SERVICE IP to your vCenter server.

  • On a remote / ROBO deployment, you can run the clustering service on another VM – there is an available clustering service installer in the VMware portal for Linux or Windows.

The cluster IP itself can be whatever you want as long as it’s on the same network as vCenter, or the Cluster Service in the remote location.
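When you are doing this for a pile of sites, a quick sanity check on the addresses saves some wizard failures. A small sketch, assuming “same network” means the same IPv4 subnet and that you know the prefix length (the /24 default and the function name are my assumptions, not anything the installer tells you):

```python
import ipaddress


def same_network(candidate_ip, reference_ip, prefix_len=24):
    """True if candidate_ip falls in the subnet that contains reference_ip."""
    # strict=False lets us pass a host address instead of a network address.
    network = ipaddress.ip_network(f"{reference_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(candidate_ip) in network
```

Pass the planned cluster IP as the candidate and the vCenter (or remote Cluster Service) address as the reference.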

In about 15 minutes, you should hopefully have a functional VSA installation ready to rumble for your lab to tinker with and destroy!

A completed VSA Lab setup.

And destroy it, we shall.