Tag Archives: vSphere

vSAN Cluster Shutdown

A few weeks ago I had to shut down a vSAN Cluster temporarily for a planned site-wide 24-hour power outage that was blacking out a datacentre. With the amount of warning and a multi-datacentre design this wasn’t an issue, but I made use of vSphere tags and some PowerShell/PowerCLI to help with the evacuation and repopulation of the affected cluster. Hopefully some of this may be useful to others.

The infrastructure has two vSAN Clusters – Cluster-Alpha and Cluster-Beta. Cluster-Beta was the one being affected by the power outage, and there was sufficient space on Cluster-Alpha to absorb migrated workloads. Whilst they exist in different datacentres both clusters are on the same LAN and under the same vCenter.

I divided the VMs on Cluster-Beta into three categories:

  1. Powered-Off VMs and Templates. These were to stay in place. They would be inaccessible for the outage, but I determined this wouldn’t present any issues.
  2. VMs which needed to migrate and stay on. These were tagged with the vSphere tag “July2019Migrate”.
  3. VMs which needed to be powered off but not migrated, for example test/dev boxes which were not required for the duration. These were tagged with “July2019NOMigrate”.

The tagging was important, not only to make sure I knew what was migrating and what was staying, but also what we needed to move back or power on once the electrical work had completed. PowerCLI was used to check that all powered-on VMs in Cluster-Beta were tagged one way or another.

Get the VMs in Cluster-Beta where neither the tag “July2019Migrate” nor the tag “July2019NOMigrate” is assigned and the VM is powered on:

Get-Cluster -Name "Cluster-Beta" | Get-VM | Where-Object {
 #Look up the VM's tag names once, then test both tags against them
 $TagNames = (Get-TagAssignment -Entity $_).Tag.Name
 $TagNames -notcontains "July2019Migrate" -and
 $TagNames -notcontains "July2019NOMigrate" -and
 $_.PowerState -eq "PoweredOn"}

In the week approaching the shutdown the migration was kicked off:

#Create a List of the VMs in the Source Cluster which are tagged to migrate
$MyTag = Get-Tag -Name "July2019Migrate"
#Each tag assignment exposes its tag via the .Tag property, so compare on the tag name
$MyVMs = Get-Cluster -Name "Cluster-Beta" | Get-VM | Where-Object {(Get-TagAssignment -Entity $_).Tag.Name -contains $MyTag.Name }
#Do the Migration
$TargetCluster = "Cluster-Alpha" #Target Cluster
$TargetDatastore = "vSANDatastore-Alpha" #Target Datastore on Target Cluster
$MyVMs | Move-VM -Destination (Get-Cluster -Name $TargetCluster) -Datastore (Get-Datastore -Name $TargetDatastore) -DiskStorageFormat Thin -VMotionPriority High

At shutdown time, a quick final check of the remaining powered on VMs was done and then all remaining VMs in Cluster-Beta were shut down. Once there were no running workloads on Beta it was time to shut down the vSAN cluster. This part I didn’t automate as I’m not planning on doing it a lot, and there’s comprehensive documentation in the VMware Docs site. The process is basically one of putting all the hosts into maintenance mode and then once the whole cluster is done, powering them off.
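For anyone who does want to script these final steps, a minimal PowerCLI sketch might look like the following. It assumes an existing Connect-VIServer session, that VMware Tools is running in the guests, and that “No data migration” is the appropriate vSAN maintenance mode for a full cluster shutdown. It is not the procedure I followed, so check the VMware documentation before relying on it. Each step needs to finish before the next one is run.

```powershell
# Sketch only: assumes an existing Connect-VIServer session.
$Cluster = Get-Cluster -Name "Cluster-Beta"

# Step 1: ask VMware Tools to shut down the guest OS of every VM still running
$Cluster | Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" } |
    Stop-VMGuest -Confirm:$false

# Step 2 (once no VMs are running): put every host into maintenance mode
# without evacuating vSAN data. NoDataMigration is an assumption here.
# -RunAsync starts the task on all hosts rather than waiting for each in turn.
$Cluster | Get-VMHost |
    Set-VMHost -State Maintenance -VsanDataMigrationMode NoDataMigration -RunAsync

# Step 3 (once every host is in maintenance mode): power the hosts off
$Cluster | Get-VMHost | Where-Object { $_.ConnectionState -eq "Maintenance" } |
    Stop-VMHost -Confirm:$false
```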

You are in a dark, quiet datacentre. There are many servers, all alike. There may be Grues here.

When power was restored, the process was largely reversed. I powered on the switches providing the network interconnect between the nodes, and then powered on those vSAN hosts and waited for them to come up. Once all the hosts were visible to vCenter, it was just a case of selecting them all and choosing “Exit Maintenance Mode”
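I did this step through the vSphere Client, but the same thing could probably be done from PowerCLI. A minimal sketch, assuming an existing Connect-VIServer session and that all hosts in the cluster have reconnected to vCenter:

```powershell
# Sketch only: take every host in the cluster that is still in
# maintenance mode back into service.
Get-Cluster -Name "Cluster-Beta" | Get-VMHost |
    Where-Object { $_.ConnectionState -eq "Maintenance" } |
    Set-VMHost -State Connected
```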

image

There was a momentary flash of alerts as the nodes came up and wondered where their friends were, but in under a minute the cluster was passing the vSAN Health Check.

image

At this point everything was ready to power on the VMs that had been shut down and left on the cluster, and to vMotion the migrated virtual machines back across. Again, PowerCLI simplified this process:

#Create a List of the VMs which are tagged to stay but need powering on.
#These VMs were never migrated, so they are still on Cluster-Beta.
$MyTag = Get-Tag -Name "July2019NOMigrate"
$MyVMs = Get-Cluster -Name "Cluster-Beta" | Get-VM | Where-Object {(Get-TagAssignment -Entity $_).Tag.Name -contains $MyTag.Name }
#Power on those VMs
$MyVMs | Start-VM

#Create a List of the VMs in the Source Cluster which are tagged to migrate (back)
$MyTag = Get-Tag -Name "July2019Migrate"
#Each tag assignment exposes its tag via the .Tag property, so compare on the tag name
$MyVMs = Get-Cluster -Name "Cluster-Alpha" | Get-VM | Where-Object {(Get-TagAssignment -Entity $_).Tag.Name -contains $MyTag.Name }
#Do the Migration
$TargetCluster = "Cluster-Beta" #New Target Cluster
$TargetDatastore = "vSANDatastore-Beta" #Target Datastore on Target Cluster
$MyVMs | Move-VM -Destination (Get-Cluster -Name $TargetCluster) -Datastore (Get-Datastore -Name $TargetDatastore) -DiskStorageFormat Thin -VMotionPriority High

Then it was just a case of waiting for the data to flow across the network and finally checking that everything had migrated successfully and normality had been restored.
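One way that final check could be scripted is to list every VM carrying the migration tag alongside the cluster it currently lives in. This is a sketch rather than the exact check I ran, and it assumes an existing Connect-VIServer session:

```powershell
# Sketch only: show where each tagged VM ended up and whether it is running.
$Tag = Get-Tag -Name "July2019Migrate"
Get-VM -Tag $Tag |
    Select-Object Name, PowerState,
        @{Name = "Cluster"; Expression = { ($_ | Get-Cluster).Name }} |
    Sort-Object Cluster, Name
```

Anything still showing Cluster-Alpha in the Cluster column has not made it back yet.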

we have normality, I repeat we have normality…Anything you still can’t cope with is therefore your own problem. Please relax.

Trillian, via the keyboard of Douglas Adams. The Hitchhiker’s Guide to the Galaxy

vSAN- Controller Driver is (not) VMware Certified

In the process of upgrading a vSAN ReadyNode cluster from ESXi 6.5 to 6.7 a warning appeared in the vSAN Health check. The first host in the cluster had gone through the upgrade and was now showing the warning “Controller driver is VMware certified” (1 in the image below). The Dell HBA330 card was using an older version of the driver (2 in the image below) than recommended (3).

image

All workloads were still online, but running VMware Update Manager (VUM) did not clear this warning. Looking in the VUM patch listing showed the driver for ESXi 6.5 (4) but not the version recommended for 6.7.

image

Solution

It was necessary to load these replacement drivers manually. A quick Google search showed they could be sourced from VMware’s download site. Extract the downloaded ZIP file and then use the “Upload from File” option in VUM (5) to upload the offline bundle ZIP file found inside it (in this case “VMW-ESX-6.7.0-lsi_msgpt3-17.00.01.00-offline_bundle-9702440.zip”). The new driver should then appear in the list (6) and will automatically be added to the “Non-Critical Host Patches” baseline (7). Final remediation is now just a case of applying that updated baseline to the host.
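To confirm which driver version each host ends up with, something like the following PowerCLI sketch could be used. It assumes an existing Connect-VIServer session, the cluster name is a placeholder, and the “*msgpt3*” name filter is an assumption based on the bundle file name above.

```powershell
# Sketch only: list the installed lsi_msgpt3 driver VIB on each host.
# "My-vSAN-Cluster" is a hypothetical cluster name.
foreach ($VMHost in (Get-Cluster -Name "My-vSAN-Cluster" | Get-VMHost)) {
    $EsxCli = Get-EsxCli -VMHost $VMHost -V2
    $EsxCli.software.vib.list.Invoke() |
        Where-Object { $_.Name -like "*msgpt3*" } |
        Select-Object @{Name = "Host"; Expression = { $VMHost.Name }}, Name, Version
}
```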

image

In this particular instance the hosts were Dell PowerEdge R630 vSAN ReadyNodes with the HBA330 SAS HBA Controller option but the principles outlined in this post should apply to other configurations with the same symptoms.

vSphere 6.0- time to upgrade

If you’re running VMware vCenter and ESXi 6.0 it’s time to start planning to upgrade, as General Support ends on 12 March 2020, one year from now and five years from its release. Thankfully the upgrade from 6.0 to 6.5 or 6.7 is usually quite straightforward, and VMware have put a lot of work into streamlining this process.

Looking at the Product Lifecycle Matrix other notable products in the VMware stable worth keeping an eye on include NSX for vSphere (NSXv) 6.2, Site Recovery Manager (SRM) 6.0 and 6.1, and vSAN 6.0-6.2.

Powered Off VM cannot be Powered On

Symptoms

A powered off VM on ESXi 6.5 will not power on and returns the error “Failed to power on virtual machine…. The attempted operation cannot be performed in the current state (Powered off)”.

(i.e. the VM cannot be powered on BECAUSE it is not powered on!)

image

Prior to being powered down, the VM properties had just been modified. In this particular case it was immediately following a manual Ubuntu install: the install DVD (from a datastore ISO) was disconnected and the CD Drive switched to “Host Device”. These operations were performed from the ESXi Web interface.

Repeated attempts to start the VM all fail the same way.

Solution

Unregister the VM, then locate the vmx file in the datastore and re-register it. The VM should now power on.
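I did this through the web interface, but the same fix could probably be scripted. A minimal PowerCLI sketch, assuming an existing Connect-VIServer session (to the host or to vCenter) and using a hypothetical VM name:

```powershell
# Sketch only: "ubuntu-test01" is a hypothetical VM name.
$VM = Get-VM -Name "ubuntu-test01"
$VMHost = $VM | Get-VMHost
# Datastore path of the .vmx file, in the form "[datastore] folder/vm.vmx"
$VmxPath = $VM.ExtensionData.Config.Files.VmPathName

# Without -DeletePermanently, Remove-VM only unregisters the VM;
# its files stay on the datastore.
Remove-VM -VM $VM -Confirm:$false

# Re-register from the same .vmx file and try powering on again
New-VM -VMFilePath $VmxPath -VMHost $VMHost | Start-VM
```

The .vmx path is captured before the VM is unregistered, since it is harder to look up afterwards.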

image

VMworld 2018 US: HCI1469BU- The Future of vSAN and Hyperconverged Infrastructure

This “HCI Futures” session at VMworld US was hosted by two VPs from the Storage and Availability Business Unit, plus a customer guest. It covered the new features recently added to the vSAN environment with the release of 6.7 Update 1, alongside discussion of the possible future direction of VMware in the Hyper-Converged Infrastructure space. I caught up with the session via the online recording.

HCI is a rapidly growing architecture, with both industry wide figures from IDC and VMware’s own figures seeing massive spending increases. In the week of this VMworld, the 4-year old vSAN product is now boasting 15,000 customers. We are told customers are embarking on journeys into the Hybrid Cloud and looking for operational consistency between their On-Premises and Public Cloud environments.

The customer story incorporated into this breakout session was provided by Honeywell. They were an early adopter of vSAN in 2014, starting with the low-risk option of hosting their management cluster on the technology. Since then they have replaced much of their traditional SAN infrastructure and are now boasting 1.7 Petabytes of data on vSAN, with compression and de-duplication giving them savings of nearly 700TB of disk.

VMware is pushing along several paths to enhance the product- the most obvious is including new storage technologies as they become available. All-flash vSAN is now commonplace, with SSDs replacing traditional spinning disk in the capacity tiers. Looking to the future, the session talked of the usage of NVMe and Persistent Memory (PMEM) developments – storage latency becoming significantly less than network latency for the first time. This prompts a move away from the current 2-tier model to one which incorporates “Adaptive Tiering” to make best use of the different storage components available.

image

In the Public Cloud- in particular the VMware Cloud on AWS offering- there have been customers who want to expand storage faster than compute. In the current model this hasn’t been possible due to the fixed-capacity building blocks that HCI is known for. This is being addressed by adding access to Amazon’s Elastic Block Store (EBS) in 6.7U1 as a storage target for the environment. vSAN Encryption using the Amazon KMS is also included, along with the ability to utilise the Elastic DRS features when using AWS as a DRaaS provider for a vSphere environment.

vSAN is also moving away from its position as “just” the storage for Virtual Machines. Future developments include the introduction of file storage- and the ability to do some advanced data management- classifying, searching, and filtering the data.

With all this data being stored, VMware is looking to enhance the data protection functionality in the platform. Incorporating native snapshots with replication to secondary storage (and cloud) for DR purposes increases the challenge to “traditional” storage vendors and, although it was played down in this talk, also encroaches further into the backup space, which is populated by a large group of VMware partners.

Cloud Native applications are also being catered for with Kubernetes integration- using application-level hooks to leverage snapshots, replication, encryption, and backups all through the existing vCenter interface.

If you want to watch the recording of this session to get more information, it’s available on the VMworld site: https://videos.vmworld.com/searchsite/2018?search=HCI1469BU. To sign up to the vSAN Beta, which covers some of the Data Protection, Cloud Native Storage, and File Services features, visit http://www.vmware.com/go/vsan-beta