VMware VSA Deepdive – Part 4 – Shutting Down VSA (SSH Edition)

The first way I elected to try and forcibly shut down the VSA VM was to do everything through the ESXi command line via SSH. ESXCLI itself is not implemented in VCO directly, so this will require some good old fashioned text parsing with AWK.

Enabling SSH on the host through a VCO Action

Unfortunately out of the box, there is no workflow/action that manages ESXi services, so I needed to roll my own.
Below is the Action setup and script code I used to check for the SSH service and start it up, given an input of type VC:HostSystem.
Create the Action, and name it startSSHService. There is no return value necessary on this Action.
Setting up the SSH Service Action

// get the list of services
var hostServices = inputHost.configManager.serviceSystem.serviceInfo.service
var sshService = null
// loop the services, find the SSH service.
for each(svc in hostServices) {
  if(svc.key == "TSM-SSH") {
    sshService = svc
    System.log("Found SSH Service on host ["+inputHost.name+"]")
    break
  }
}
if(sshService == null) {
  throw "Couldn't find SSH service on ["+inputHost.name+"]!"
}

// Enable the service
try {
  inputHost.configManager.serviceSystem.startService(sshService.key)
} catch(err) {
  System.log("ERROR--Couldn't start SSH service. Error Text: "+err)
}
// the end

So, once you have SSH started on your ESXi host, you can send commands through VCO to do what you need.

SSH Service Check Action

For a more robust workflow, you will probably want an Action that will check to see if the service is running, and return a boolean value. That way you can build in a little bit more into the flow.
The setup for the ‘check’ Action is the same, with the exception of the return value being a boolean.

Setting up the Check SSH Action.
Setting up the Check SSH Action.

The code is similar as well, just doing a simple check at the end.

var hostservices = inputHost.config.service.service
var sshSvc = null
for each(svc in hostservices) {
  // System.log("Service: "+svc.label+", Status is: "+svc.running)
  if(svc.label == "SSH") {
    sshSvc = svc
    break
  }
}

// check status, return true/false
if(sshSvc.running == true) {
  return true
} else {
  return false
}

Where’s my Burrit–VSA?

Before you power off the VSA VM you’ll want to make sure to vMotion your other guests to another node, or have a foolproof way of finding the VSA appliance on your host. Another Action to the rescue! Given an input ESXi host, this Action will query the VMs running on the host and check its tags out to see if it matches a specific value found on all VSA appliances. Note that these Tags are actually in the vCenter database, and not the Inventory Service Tags in the vSphere Web Client.

For purposes of this post, I’ll name the action getVSAonHost.

Setting up the VSA finder Action.
Setting up the VSA finder Action.
// for when you absolutely, positively need to make sure it's a VSA.
var vsaKey = "SYSTEM/COM.VMWARE.VIM.SVA"
// check the VMs on the host for the tag through a loop
for each(vm in inputHost.vm) {
  if(vm.tag) {
    for each(tag in vm.tag) {
      if(tag.key == vsaKey) {
        return vm
        break
      }
    }
  }
}

So now, you know you have the VM in question. You can then pass the VirtualMachine’s name property to your SSH command later.

Making a SSHandwich

With the ESXi host and the VSA VM in hand, you can execute the built in Run SSH Command workflow to do the final step.

Here’s the SSH command to send, which will find the VSA VM ID and power it off in one line, no questions asked:

VMID=$(vim-cmd vmsvc/getallvms | grep -i <VSA Name> | awk '{ print $1 }') && vim-cmd vmsvc/power.off $VMID

Begin by creating a new workflow, and create a single input parameter named inputHost, of type VC:HostSystem.
Then create three attributes in the General tab, naming them sshCommandhostSystemName and vmVSAName, all of type string.
Finally, create another attribute called vsaAppliance of type VC:VirtualMachine for use with the Action.

Next, drop your getVSAonHost Action into the schema, and bind the actionResult to vsaAppliance as seen below.

Binding Actions to the getVSAonHost Action.
Binding Actions to the getVSAonHost Action.

Next, drop a Scriptable Task into the Schema and bind inputHost and vsaAppliance on the IN tab. On the OUT tab, bind the attributes of hostSystemName and vmVSAName. We are effectively going to write a small blurb of code that hands off the name properties of the input objects to the output attributes for use later, along with creating the SSH command string.

Binding values to the Scriptable Task.
Binding values to the Scriptable Task.

In the Scripting Tab, we’ll use a few simple lines of code to perform the handoff of values.

// assign values to output attributes
hostSystemName = inputHost.name
vmVSAName = vsaAppliance.name
// create SSH command string using the input values
sshCommand = "VMID=$(vim-cmd vmsvc/getallvms | grep -i "+vmVSAName+" | awk \'{ print $1 }\') && vim-cmd vmsvc/power.off $VMID"

Finally, drop a copy of the Run SSH Command workflow into the schema. There are a lot of inputs here, so you will have to do some more complicated bindings. You can either force them to be input each time the workflow is run, set a static value, or bind to existing attributes.

Here’s what it looks like by default.

Setup for the SSH Workflow.
Setup for the SSH Workflow.

How you approach this part is largely up to you, but here is how I did it for this example.

The updated SSH Workflow Setup.
The updated SSH Workflow Setup.

You’ll notice for the username/password I set a static value for root and set passwordAuthentication to Yes, and changed the initial hostNameOrIP and cmd values to the attributes we created earlier. For the outputs, I created new local attributes to show the results.

Run the workflow and you should see that VSA go down!

As an aside, if HA is enabled in your VSA HA Cluster object, it will immediately attempt to restart the machine – so make sure you add in the capability to disable HA into your parent workflows first so that this doesn’t end up being a problem.

Sweet Reprieve

It’s a bit crazy the amount of effort it takes to work around a disabled method, but it does work. I don’t think it’s particularly great, and if you’re the only one who cares about the systems and don’t have audits, this may be enough.

But for my purposes this was just the first step of the journey. I did not end up actually using this way to do things, but figured it may be a good exercise to document the process.

The next post will take it in a different direction altogether, and a little bit closer to an API based solution that can also be audited.

VSA Deepdive Part 3 - Enter the Orchestrator!

VMware VSA Deepdive – Part 3 – Enter the Orchestrator

Previously we setup your VSA Home lab, and beefed up our knowledge of WSCLI.
There may have also been some general complaints  about the limitations of the VSA solution from a management perspective.

Now, it’s time to bring things together with an unlikely savior.

(Just kidding, it’s VCO to the rescue as always.)

What do you mean?

VCO being the general swiss-army knife tool it is, you probably already know it has virtually unlimited possibilities.
So how can VCO and VSA coming together make magic happen? There’s no native plugin!

Factoid 1: WSCLI is a Java JAR, and is portable code that can run directly from JRE.
Factoid 2: VCO runs Tomcat for its server software, which is executed using JRE.

For me, reading this excellent post on the VCOTeam blog was like a Eureka! moment. Surely if I can find where JRE is installed on the VCO Appliance, I can execute WSCLI commands in a programmatic way, right?

I feel like the answer is Yes?

Nailed it.  Here’s what you need to set this up.

Orchestrator Appliance Setup

First, you need to allow VCO the ability to create local processes (ie. execute local code).

To do this, SSH into the VCO Appliance as root.
Then, using VI as your text editor, type this in the SSH prompt:
vi /etc/vco/app-server/vmo/conf/vmo.properties

Once in the file, press I to enter Insert Mode, so you can add/edit content to the file.
Add this line anywhere in the file:
com.vmware.js.allow-local-process=true

Once added, press ESC, then type :wq to save changes and exit.

NOTE: Once you have made this change, a reboot of the appliance is needed.

Next up, Since SSH is enabled by default on the VCO Appliance, that means SCP is also enabled.
Using your favorite SCP Client, login to your appliance and upload WSCLI.jar to a folder of your choosing.
For purposes of this post we will put it in the folder /var/run/vco.

Now that you have the files there, the next thing to do is find the path to the JRE.
Thankfully, it’s extremely easy to find – simply type which java at the SSH prompt.

Finding the JRE on the VCO Appliance.
Finding the JRE on the VCO Appliance.

Next, you need to edit your js-io-rights.conf to allow VCO to read and execute content from both folders.

To edit the file, start by typing vi /etc/vco/app-server/js-io-rights.conf
Pressto change to Insert Mode.  You can now edit the file appropriately.
Here is what my js-io-rights.conf file looks like, in case you want to do a comparison.

04-js-io-conf-perms

My changes are specifically these to grant read/execute access:
+rwx /var/run/vco
+rx /usr/java/jre-vmware/bin

Once you have these lines in your file, press ESC to leave Insert Mode. Then type :wq and press Enter to write your changes and quit VI.

So, now you have the path to your JRE, and your WSCLI, and your permissions to the necessary folders. Let’s conduct a quick test!

Run this from the SSH prompt: /usr/java/jre-vmware/bin/java -jar /var/run/vco/WSCLI.jar

WSCLI running in VCO!
WSCLI running in VCO!

BOOM!

Create your Configuration Elements for WSCLI

Configuration Elements are your friend for using this tool. Since the path of JRE and your WSCLI upload shouldn’t really change, why not make it a global variable? That’s what a Configuration Element is essentially for.

In the VCO Client, change to Design View if you aren’t already there. Then, go to the Configuration Elements Tab.

The VCO Configurations Tab.
The VCO Configurations Tab.

Create a new Folder (or not), and create a new Element, as I have done in the above screenshot.
Once you have created it, you will go straight into the menu to customize the element.
Our goal here is to define two attributes: the path to JRE, and the path to WSCLI.jar.
Simply create two attributes, and populate them with the paths you have found earlier.
These paths are CASE-sensitive!

The WSCLI Configuration Element.
The WSCLI Configuration Element.

Save and Close your changes.

Using the Configuration Element in a Workflow

This one is almost TOO easy, and you’ll immediately see the value of Configuration Elements.

Create a new Workflow, and add two Attributes to it.  Once you have done so, you’ll see a small icon similar to Configuration Elements.

07-empty-attributes

Click the small arrow highlighted above for the pathWSCLI attribute, and you will get a dialog window to find your Configuration Element.

Available ConfigurationElements are listed in this window.
Available ConfigurationElements are listed in this window.

You’ll find your previously created ConfigurationElement, and you can bind the attribute in your workflow to it, and use it however you want in the Workflow!

Running WSCLI in a Workflow

The VCO Command Class.
The VCO Command Class.

The final step is using the Command class in VCO. This class is meant to perform the local execution of code and return the output. So if you run a command like ls -l you’ll get a list of files in a directory. But obviously it’s better when you can run WSCLI in a workflow.

In the Schema of the workflow, drop a Scriptable Task in, and bind the attributes you created to it.
To execute WSCLI commands from the appliance you will be using the Command Scripting Class, which is detailed below.

Not much there, but that’s OK! If you’re this far, I suspect you’re thinking of use cases for this, of which there are many!

Follow the below code snippet as an example.  Note that the pathJRE and pathWSCLI bits are the attributes you defined earlier.

// define command line to execute
var cmdString = pathJRE+" -jar "+pathWSCLI+" help shutdownSvaServer"
// create a new command object with your command string
var myCommand = new Command(cmdString)
// execute the command
myCommand.execute(true)
// show the output
System.log(myCommand.output)

At this point, you should see the results of WSCLI in the Logging tab.

WSCLI - now in VCO!
WSCLI – now in VCO!

You can now execute WSCLI in your workflows, but that is just part of the ongoing puzzle. As we know already, WSCLI can’t actually power down the VSA node. How do we tackle that when we’re denied that ability in vCenter?

In the next few posts, I’ll detail the various ways I was able to accomplish this, each with their pros and cons. In the meantime, I’ll leave you to create a few Actions or Workflows to add WSCLI functionality!

VSA Deepdive Part 2 - WSCLI

VMware VSA Deepdive – Part 2 – WSCLI

Previously in the series, we got the homelab version of VMware VSA up and running.

You may not know it yet, but it’s pretty important to become BEST FRIENDS with the WSCLI tool set–the only way to interact with the VSA systems from the command line and see what is actually happening behind the scenes. The VSA Manager gives you a heads up mostly–the rest is with this toolset.

There are occasional mutters about WSCLI on the internet if you search. I’d like this to be the definitive guide to it from a practical perspective.

WHAT IT DOES

WSCLI is a Java .jar file that comes installed with the VSA Manager plugin. You’ll find it in the install directory under subfolder \TOOLS. It comes with a simple .BAT file that finds the JRE’s current location and calls it along with the -jar parameter. A bonus that I have found–this being Java, the tool is essentially cross-platform so you can copy the JAR to other systems and run them from wherever. This is critical to know about later on, in my opinion.

WSCLI gives you the ability to do what is available in the VSA Manager vSphere plugin except the ability to upgrade from 5.1 to 5.5.
WSCLI output is far more verbose and can actually be useful in ways that are not obvious. When VSA Manager fails to do what you want it to (and it will), there is almost always a KB article with a how-to guide on fixing it with WSCLI.  Examples such as replacing a failed cluster node, or getting specific replication data or status are all must haves for a VSA Administrator.

SEND HELP

The first command argument for WSCLI you need to know is help. While it sounds obvious as to why, but this provides a wealth of information that becomes really interesting later. Keep the items emphasized below in the back of your mind for later to see how.
Let’s look at the shutdownSvaServer help info for an example of the help documentation.

Command usage:
shutdownSvaServer [<boolean:maintenance-mode=false>]
Parameters:
maintenance - whether the node should enter maintenance mode on restart, defaults to false
Prepares the node for a shutdown.
This command does not actually perform a shutdown of this node's VSA service or of its virtual machine. Instead, any test coverage data being collected is saved, all open files are synced, whether the node should enter maintenance mode on restart is persisted, and the ZooKeeper configuration is updated if the node will be entering maintenance mode, but the actual shutdown is not performed. The caller should arrange to power off the node on receiving the SvaShutdownEvent. Calls to this method are not counted when determining whether this node should be placed in maintenance mode because it appears to be in a continuous cycle of reboots.
This operation sends an SvaEnterMaintenanceModeEvent management event when the node is marked for entering maintenance mode, and a SvaShutdownEvent when theVSA service is ready to shut down.

Pretty verbose and unusally direct on the process, if you ask me! But notice how misleading it is – it just stops processes.  You have to power off the thing yourself.
Not a problem right?  Why wouldn’t I be able to power off a VM?

Wait, what? Y U NO POWER OFF?!!1
Wait, what? Y U NO POWER OFF?!!1

In the vCenter database, once the VSA VM is created, it is modified extensively and disables most methods entirely. So even good old Administrator@vsphere.local can’t do anything about this. This is the biggest obstacle to automation and management of the system, since it has to be powered off before you can put the host into Maintenance Mode.

And before you ask – VUM’s option to power off the VSA VM doesn’t fly here either!

There are a number of ways to tackle this which will be discussed later.

YOU MIGHT WANT TO JUST LISTEN

The second command argument for WSCLI you need to know is startListener. This launches a daemon that will show events happening in real-time, such as a cluster takeover, replication progress updated every minute or so, and entering/exiting maintenance mode. VSA Manager, assuming it shows anything useful, only updates every 30-40 seconds in the vSphere Client.

So, to build on the shutdownSvaServer command above, you could open a second command window, run wscli <cluster IP> startListener to see what’s happening within the cluster in real time. Then run the command wscli <vsa node IP> shutdownSvaServer and see the SvaShutdownEvent fire in the first window within a second or two. You can reboot the node and watch the carnage in your listener! You’ll see everything from the failover, to the reconnection, the sync, and completion.

THE COMMAND STRUCTURE

The commands are divided into 3 categories: SVA (Storage Virtual Appliance), SAS (Shared Access Storage), and VCS (VSA Cluster Service).

SVA commands are specific to the individual storage node and not the cluster in general.  Pings, version checks, and entering/exiting maintenance mode are the main uses in this context.

SAS commands are specific to the cluster, and is the most commonly used set.  Getting all of the UUID values, parameters, and querying synch state are the main uses in this context.

VCS commands are specific to the standalone cluster service.  In a two-node cluster, the quorum is handled by this service, and is installed on the vCenter Server when VSA Manager is installed. You can download the cluster service for Windows or even as a Linux package, and put it on a separate machine entirely if you want.  It is fairly rare to use any commands in here except to query status and maybe restart the service if the replicas aren’t coming up in a failover scenario.

WHAT IT ISN’T

The HELP documentation of the WSCLI package is pretty good, even if you have to do a good bit of copy/pasting from the command window to get anything useful.

But the verbosity of it, such as showing absolutely everything as UUIDs is infuriating sometimes.  Expect to be filtering all data based on the management IPs.  This isn’t PowerCLI grade goodness, unfortunately.

(MORE) GENERAL GRIPES

No real monitoring capability. In my experience, there are no real proactive warnings that something bad has happened, other than a few stock alarms in vCenter that happen and then don’t clear themselves out when the condition changes. It’s the boy who cried wolf, and you tend to ignore most of the issues as time goes on. But when things break, they break real good. And then you have to hope you don’t screw up typing in specific commands to save the day in WSCLI.

Protip to VMware – let the admins decide who can/can not Power Off their VSA VMs. It’s what roles were designed for!
(As an aside, I secretly love to give my TAM grief about this product.)

Because this is a software RAID over ethernet solution, one thing to keep in mind is that the health of the underlying hardware is far more critical. Any major problem I have had with the product has been with bad hardware, primarily the disks themselves, or the controller and stripes. Make sure that you have that under control and address right away, or you’ll pay for it.

Upgrades require complete cluster outages. This is the really unacceptable one from my perspective – but I suppose the product was geared more to SMB shops that close at some point and can take the downtime. I’m in one of the biggest shops out there, and it simply doesn’t work that way.

But in quandaries like this, you have to get creative and dig deep. So that’s what I did.

In the next few posts, less theory and complaints – more workflows and scripts, surprising finds and interesting workarounds to manage VSA at hundreds of sites.