Utilising vSphere Performance Monitoring Tools – Part Two: vscsiStats

The vscsiStats command allows you to troubleshoot virtual machine storage performance issues. It collects data at the virtual SCSI device level, recording information on each I/O operation, and reports this performance data as histograms of I/O metrics such as I/O length, seek distance, number of outstanding I/Os, I/O latency and inter-arrival time.

This provides more performance data than esxtop, which in comparison only provides latency and throughput statistics. Also, as vscsiStats targets the virtual SCSI device level, it can report on virtual machine hard disks of all types. This can be useful for determining the behaviour of a workload in order to decide on the storage placement of a virtual machine.

In order to retrieve performance data for a virtual machine we first need to obtain the virtual machine worldGroupID and, if you want to filter the retrieval to a specific virtual machine hard disk, the handleID. This information can be retrieved as below:

vscsiStats -l

In this example, we will initially retrieve performance data for the virtual machine ‘vm1’, where the worldGroupID is ‘2718498’ as shown in the output below.

Virtual Machine worldGroupID: 2718498, Virtual Machine Display Name: vm1, Virtual Machine Config File: /vmfs/volumes/54afe658-9ab37544-54a8-0026b9746656/vm1/vm1.vmx, {
Virtual SCSI Disk handleID: 11547 (scsi0:0)
Virtual SCSI Disk handleID: 11548 (scsi0:1)
Virtual SCSI Disk handleID: 11549 (scsi0:2)
}

Now we will start the retrieval of performance data for all the virtual machine hard disks of this specific virtual machine.

vscsiStats -s -w 2718498

The retrieval of performance data will now occur in the background. Whilst data is being collected, which takes 30 minutes by default, you may print the histogram for a specific statistic by specifying the statistic name (ioLength, seekDistance, outstandingIOs, latency or interarrival):

vscsiStats -p latency

This will show the current data retrieved for the latency of IOs, Read IOs and Write IOs in microseconds. In the example below we can see that no Write I/O took longer than 15000 microseconds, with the slowest Write I/O completing in 7916 microseconds.

Histogram: latency of Write IOs in Microseconds (us) for virtual machine worldGroupID : 3595341, virtual disk handleID : 11916 (scsi0:0) {
 min : 207
 max : 7916
 mean : 371
 count : 13421
 {
 0 (<= 1)
 0 (<= 10)
 0 (<= 100)
 12000 (<= 500)
 1198 (<= 1000)
 222 (<= 5000)
 1 (<= 15000)
 0 (<= 30000)
 0 (<= 50000)
 0 (<= 100000)
 0 (> 100000) 
 }
}

If you need to stop the retrieval of performance data before it completes, you may do so with the following command.

vscsiStats -x

In the above example, we invoked the vscsiStats command to retrieve performance data for all virtual machine hard disks of a specific virtual machine. If we wanted to retrieve performance data only for Hard Disk (scsi0:2), we can invoke the command and specify the handleID.

vscsiStats -s -w 2718498 -i 11549

VMware vSphere Performance – Part Five: Optimizing Virtual Machine Resources

In previous articles I have described a number of steps to optimise the performance of ESXi host system resources; now we will look at how to optimise the resources allocated to virtual machines.

Memory Configuration

Through its memory management techniques, ESXi is capable of reclaiming any excess memory assigned to virtual machines when the host system memory is exhausted. However, it is important that you configure the virtual machine with enough memory to satisfy its workload and do not over-allocate memory resources, as assigning excess memory can lead to a number of issues, for example:

  • An increase in the amount of overhead memory required to power on the virtual machine.
  • An increase in the size of the virtual machine swap file (VSWP), which results in increased disk usage.
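To help validate that virtual machines are not over-allocated, the configured memory and the current memory overhead of each virtual machine can be reviewed with PowerCLI. The below is a minimal sketch; the overhead value is taken from the virtual machine’s quick statistics and is reported in MB.

# List configured memory and current consumed memory overhead (MB) for each virtual machine
Get-VM | Select-Object Name, MemoryGB, @{Name='OverheadMB';Expression={$_.ExtensionData.Summary.QuickStats.ConsumedOverheadMemory}}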

By default, ESXi is configured to support the use of large pages. However, the guest operating system may in some instances require additional configuration in order to use large memory pages. For example, for a Windows Server 2012 server with Microsoft SQL Server installed we would be required to grant the ‘Lock pages in memory’ privilege to the user account running the Microsoft SQL Server service so that the application can execute with the use of large memory pages.
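On the host side, the use of large pages is controlled by the ‘Mem.AllocGuestLargePage’ advanced setting, which should be enabled by default. As a quick check, the current value can be retrieved with PowerCLI in the same way as any other advanced host setting; the host name below is an example.

# Confirm large page support is enabled on the host system (a value of 1 means enabled)
Get-AdvancedSetting -Entity esxi1host.domain.local -Name Mem.AllocGuestLargePage | Select Entity, Name, Value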

Network Configuration

It is recommended that you use the VMXNET3 virtual network adapter for all supported guest operating systems that have VMware Tools installed. This virtual machine network adapter is optimised to provide higher throughput, lower latency and less overhead when compared to the other virtual machine network adapter options. The driver required for the VMXNET3 virtual adapter is not provided by the guest operating system, and therefore VMware Tools must be installed to supply the driver.

VMXNET3, the newest generation of virtual network adapter from VMware, offers performance on par with or better than its previous generations in both Windows and Linux guests. Both the driver and the device have been highly tuned to perform better on modern systems. Furthermore, VMXNET3 introduces new features and enhancements, such as TSO6 and RSS. TSO6 makes it especially useful for users deploying applications that deal with IPv6 traffic, while RSS is helpful for deployments requiring high scalability. All these features give VMXNET3 advantages that are not possible with previous generations of virtual network adapters. Moving forward, to keep pace with an ever‐increasing demand for network bandwidth, we recommend customers migrate to VMXNET3 if performance is of top concern to their deployments.
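As a sketch of how the adapter type could be reviewed and changed with PowerCLI, assuming a virtual machine named ‘vm1’ and an adapter labelled ‘Network adapter 1’ (changing the type of an existing adapter typically requires the virtual machine to be powered off and the guest to be prepared for the new adapter):

# Report the current virtual network adapter types for the virtual machine
Get-VM -Name vm1 | Get-NetworkAdapter | Select-Object Name, Type

# Change an existing adapter to VMXNET3
Get-VM -Name vm1 | Get-NetworkAdapter -Name "Network adapter 1" | Set-NetworkAdapter -Type Vmxnet3 -Confirm:$false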

CPU Configuration 

The CPU scheduler in ESXi schedules CPU activity and fairly grants CPU access to virtual machines using shares, so it is important to configure a virtual machine with the number of vCPUs required for its workload. For application workloads that are unknown, my recommendation is to start small and increase the number of vCPUs gradually until you see acceptable and stable performance from the virtual machine workload. Enabling CPU Hot Plug for the virtual machine, where supported by the guest operating system and/or application, allows additional vCPUs to be added without incurring downtime for the virtual machine.
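As a minimal sketch, CPU Hot Plug can be enabled with PowerCLI through a virtual machine reconfiguration task; the virtual machine name ‘vm1’ is an example and the setting can only be changed while the virtual machine is powered off.

$vm = Get-VM -Name vm1
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.CpuHotAddEnabled = $true
# Apply the reconfiguration (the virtual machine must be powered off)
$vm.ExtensionData.ReconfigVM($spec)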

When overcommitting CPU resources for a virtual machine, the process of assigning additional vCPUs can lead to performance issues as well as directly consuming additional memory for the associated virtual machine overhead. The additional CPU demand may cause the host system's CPU resources to become exhausted and the virtual machines on this host system to degrade in performance. Therefore, adding additional vCPUs to a virtual machine to resolve a perceived vCPU contention issue may actually add an extra burden to the host system and degrade performance further.

By default the VMkernel schedules a virtual machine's vCPUs to run on any logical CPU of the host system's hardware. In some cases you may wish to configure CPU scheduling affinity for the virtual machine. For example, you may want to troubleshoot the performance of a CPU workload when it is not sharing CPU resources with other workloads on the host system, but you do not have the ability to migrate the virtual machine to an isolated host system. Also, you may wish to use CPU affinity to measure throughput and response times of several virtual machines competing for CPU resources against specific logical CPUs on the host system.
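For illustration, CPU scheduling affinity could be configured with PowerCLI through a reconfiguration task as sketched below; the virtual machine name ‘vm1’ and logical CPUs 0 and 1 are example values only.

$vm = Get-VM -Name vm1
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.CpuAffinity = New-Object VMware.Vim.VirtualMachineAffinityInfo
# Restrict the virtual machine's vCPUs to logical CPUs 0 and 1
$spec.CpuAffinity.AffinitySet = 0,1
$vm.ExtensionData.ReconfigVM($spec)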

One limitation of enabling CPU scheduling affinity for a virtual machine is that vMotion is no longer functional for that virtual machine; also, if the virtual machine resides in a Distributed Resource Scheduler (DRS) cluster, the ability to configure CPU scheduling affinity is disabled.

Storage Configuration 

The placement of a virtual machine on a datastore may have a significant impact on the performance of that virtual machine, because the combined I/O requirements of all virtual machines on that shared resource may result in I/O latency if the underlying storage array is unable to meet them. To optimise the placement of virtual machine I/O you may utilise Storage vMotion to migrate virtual machines to datastores that have fewer competing workloads or that are configured with better performance, for example when an I/O latency threshold is breached.

When provisioning a virtual machine, the default virtual SCSI controller type is based on the guest operating system and in most cases will be adequate for the virtual machine workload. However, if this is not sufficient to satisfy the virtual machine workload, using a VMware Paravirtual SCSI (PVSCSI) controller can improve performance, with higher achievable throughput and lower CPU utilization in comparison to other SCSI controller types. As with the VMXNET3 virtual network adapter, this requires VMware Tools to be installed to provide the appropriate driver for the supported guest operating system.
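As a sketch of how the controller type could be reviewed and changed with PowerCLI, assuming a virtual machine named ‘vm1’ (note that changing the controller for a disk holding the guest operating system requires the PVSCSI driver to be present in the guest, so test this carefully):

# Report the current virtual SCSI controller type
Get-VM -Name vm1 | Get-ScsiController | Select-Object Name, Type

# Change the controller to VMware Paravirtual
Get-VM -Name vm1 | Get-ScsiController | Set-ScsiController -Type ParaVirtual -Confirm:$false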

As discussed in a previous performance blog (http://wp.me/p15Mdc-u7), configuring the virtual machine disk type as eager-zeroed thick provides the best performing virtual machine disk type, as these disks do not need to obtain new physical disk blocks or write zeros during normal operations.
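For example, a new eager-zeroed thick virtual machine hard disk could be added with PowerCLI as below; the virtual machine name ‘vm1’ and the 40 GB capacity are example values.

# Add a new eager-zeroed thick hard disk to the virtual machine
New-HardDisk -VM vm1 -CapacityGB 40 -StorageFormat EagerZeroedThick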

In some instances, if you need to remove the VMFS layer to satisfy I/O performance requirements, you may configure a virtual machine to use a raw device mapping (RDM).
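As a minimal sketch, a physical compatibility mode RDM could be added to a virtual machine with PowerCLI; the virtual machine name ‘vm1’ and the device path are placeholders and should be replaced with the console device name of the LUN to be mapped.

# Add a physical compatibility mode RDM backed by the specified LUN (placeholder device path)
New-HardDisk -VM vm1 -DiskType RawPhysical -DeviceName "/vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx"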


VMware vSphere Performance – Part Four: Optimizing ESXi Host Storage

It is important to ensure the maximum queue depth of the HBA is configured to meet the manufacturer and VMware recommendations, which may depend on the combination of ESXi version and the HBA model and version.

For QLogic HBAs, the default queue depth values are configured as below:

DefaultAQLEN

To view a list of loaded modules on the host system, invoke the following from the vSphere Command Line interface:

esxcli system module list

In my example the module loaded for the HBA is the QLogic native driver, so I can search the output for a match on the module name:

esxcli system module list | grep qln

In order to determine the current queue depth for an HBA, we will need to identify the device name of the HBA by selecting a host in the vSphere Web Client and browsing to Manage > Storage > Storage Adapters.

HBAdevices
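Alternatively, the device names and drivers of the host system’s Fibre Channel HBAs can be listed with PowerCLI; the host name below is an example.

# List the Fibre Channel HBAs of the host system with their device names and drivers
Get-VMHostHba -VMHost esxi1host.domain.local -Type FibreChannel | Select-Object Device, Model, Driver, Status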

Once we have identified the device name of the HBA we will use esxtop from the ESXi shell to determine the current queue depth.

1) Press ‘d’ to display statistics for storage adapters.

2) Press ‘f’ to display the available fields, press ‘D’ to select ‘Queue Stats’ (AQLEN) and press Enter to return to the statistics.

QueueDepth

In order to configure the maximum queue depth, we first need to retrieve the configuration parameters for the particular module loaded, in my example ‘qlnativefc’:

esxcli system module parameters list -m qlnativefc

From the output of the parameters we can determine that the parameter we need to modify is ‘ql2xmaxqdepth’. This can be performed as below, to set the maximum queue depth value to ’32’.

esxcli system module parameters set -p ql2xmaxqdepth=32 -m qlnativefc

The new value takes effect once the host system has been rebooted. It is recommended to maintain consistent configuration settings across all host systems' HBAs in a cluster; therefore, having made the above change on one host system, you should ensure that it is implemented on all host systems containing identical HBAs in the cluster. In the event that the host systems in a cluster contain mixed HBAs, it is recommended to keep the maximum queue depth value uniform across all hosts.
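As a sketch of how this could be rolled out consistently with PowerCLI, assuming the -V2 esxcli interface is available and a cluster named ‘Cluster1’ (both are example assumptions, and each host still requires a reboot for the new value to take effect):

foreach ($vmhost in Get-Cluster -Name 'Cluster1' | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmhost -V2
    # Set the maximum queue depth for the qlnativefc module on each host in the cluster
    $esxcli.system.module.parameters.set.Invoke(@{module='qlnativefc'; parameterstring='ql2xmaxqdepth=32'})
}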


VMware vSphere Performance – Part Three: Optimizing ESXi Host CPU

In order to support ESXi on a host system you require a minimum of two CPU cores, but ultimately you need to ensure that your host has sufficient CPU resources to satisfy the CPU demand of the virtual machines and the VMkernel. It is also recommended to use CPUs that provide hardware-assisted virtualization, as the performance of virtual machines can be significantly improved when sensitive events and instructions are trapped in hardware, offloading this work from the hypervisor.

HardwareAssistedVirtualization

First generation enhancements include Intel Virtualization Technology (VT-x) and AMD’s AMD-V which both target privileged instructions with a new CPU execution mode feature that allows the VMM to run in a new root mode below ring 0. As depicted in Figure 7, privileged and sensitive calls are set to automatically trap to the hypervisor, removing the need for either binary translation or paravirtualization. The guest state is stored in Virtual Machine Control Structures (VT-x) or Virtual Machine Control Blocks (AMD-V).

Due to high hypervisor to guest transition overhead and a rigid programming model, VMware’s binary translation approach currently outperforms first generation hardware assist implementations in most circumstances. The rigid programming model in the first generation implementation leaves little room for software flexibility in managing either the frequency or the cost of hypervisor to guest transitions. Because of this, VMware only takes advantage of these first generation hardware features in limited cases such as for 64-bit guest support on Intel processors.

As well as enabling hardware assisted virtualization (Intel VT-x or AMD-V), it is also recommended to enable the following settings in the BIOS:

  • Ensure all installed CPU sockets and cores are enabled.
  • Intel Turbo Boost – Allows the CPU to run faster than its thermal design power (TDP) specified frequency when requested by the hypervisor, provided the CPU is operating below its power, current and temperature limits.
  • Hyperthreading – Allows for two independent threads to run concurrently on a single core.

By default, if hyperthreading is enabled in the BIOS, the ESXi host system will automatically use hyperthreading. However, this default behaviour can be modified in the vSphere Web Client by selecting a host system, browsing to Manage > Settings > Hardware > Processors, selecting Edit and unchecking the hyperthreading enabled option; the host system will require a restart to apply the change.

DisableHT
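Before making the change, the current hyperthreading state of a host system can be confirmed with PowerCLI; the host name below is the example used throughout this article.

# Report whether hyperthreading is currently active on the host system
Get-VMHost -Name esxi1host.domain.local | Select-Object Name, HyperthreadingActive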

To stop an ESXi host system from using hyperthreading with PowerCLI, we can invoke the Get-View cmdlet to retrieve the CpuScheduler managed object and call its DisableHyperThreading method; as with the Web Client change, the host system will require a restart to apply the change.

# Host system to modify
$HostSystem = "esxi1host.domain.local"

# Retrieve the CpuScheduler managed object for the host system
$CpuScheduler = Get-View (Get-View -ViewType HostSystem -Property ConfigManager.CpuScheduler -Filter @{"Name" = $HostSystem}).ConfigManager.CpuScheduler

# Disable hyperthreading (the change is applied after the host system is restarted)
$CpuScheduler.DisableHyperThreading()

It is also recommended to disable in the BIOS any hardware devices that will not be used, such as serial ports, to prevent CPU cycles being consumed by these devices.

VMware vSphere Performance – Part Two: Optimizing ESXi Host Networking

An ESXi host system requires a minimum of one network interface card (NIC); for redundancy, a host system should be configured with at least two NICs in order to satisfy VMkernel and virtual machine demand. Processing virtual machine network traffic on a host consumes CPU resources, so sufficient CPU resources must also be available for concurrent VMkernel and vMotion network activity.

In some instances (as discussed in http://wp.me/p15Mdc-ua) Direct I/O may be configured to ensure high throughput for a virtual machine workload, which requires the host system to have hardware-assisted virtualization (Intel VT-d or AMD-Vi) enabled in the BIOS.
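Where Direct I/O (DirectPath I/O) is being considered, a quick way to check which PCI devices the host system can present for passthrough is with PowerCLI; the host name below is an example and the property selection is indicative.

# List PCI devices available for passthrough on the host system
Get-PassthroughDevice -VMHost esxi1host.domain.local -Type Pci | Select-Object Name, Uid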

A host system can use multiple physical CPUs to process network packets from a single network adapter by enabling SplitRx mode, which can improve the performance of specific workloads, in particular multicast traffic. By default, the feature is automatically enabled on VMXNET3 virtual network adapters where the host system detects a single network queue on a physical NIC that is heavily utilised and servicing eight virtual machine network adapters with evenly distributed loads.

SplitRx mode can be controlled on the host system by modifying the ‘Net.NetSplitRxMode’ configuration parameter; by default SplitRx mode is enabled with a value of ‘1’. To disable SplitRx mode using the vSphere Web Client, select a host system, browse to Manage > Settings > System > Advanced Host System Settings and edit the value of the ‘Net.NetSplitRxMode’ configuration parameter to ‘0’.

Alternatively, we can retrieve the current value of the configuration parameter using the ‘Get-AdvancedSetting’ cmdlet and configure the value using the ‘Set-AdvancedSetting’ cmdlet:

Get-AdvancedSetting -Entity esxi1host.domain.local -Name Net.NetSplitRxMode | Select Entity, Name, Value
Get-AdvancedSetting -Entity esxi1host.domain.local -Name Net.NetSplitRxMode | Set-AdvancedSetting -Value 0 -Confirm:$False