vStorage APIs for Array Integration

The Linux SCSI Target Wiki

VAAI

Original author(s): Nicholas Bellinger
Developer(s): Datera, Inc.
Development status: Production
Written in: C
Operating system: Linux
Type: T10 SCSI feature
License: GNU General Public License
Website: datera.io
See Target for a complete overview of all fabric modules.

The VMware vStorage APIs for Array Integration (VAAI) enable seamless offload of locking and block operations onto the storage array.

Overview

VMware introduced the vStorage APIs for Array Integration (VAAI) in vSphere 4.1 with a plugin, and provided native VAAI support with vSphere 5. VAAI significantly enhances the integration of storage and servers by enabling seamless offload of locking and block operations onto the storage array. The LinuxIO target provides native VAAI support for vSphere 5.

Features

LinuxIO supports the following VAAI functions:

Atomic Test & Set (ATS) (Hardware Assisted Locking, COMPARE_AND_WRITE): Enables granular locking of block storage devices, accelerating performance. Block: Yes. NFS: N/A. LIO: Yes.
Zero (Block Zeroing, WRITE_SAME): Communication mechanism for thin provisioning arrays. Used when creating VMDKs. Block: Yes. NFS: N/A. LIO: Yes.
Clone (Full Copy, XCopy, EXTENDED_COPY): Commands the array to duplicate data in a LUN. Used for Clone and VMotion operations. Block: Yes. NFS: N/A. LIO: Yes.
Delete (Space Reclamation, UNMAP): Allows thin provisioned arrays to clear unused VMFS space. Block: Yes. NFS: Yes. LIO: Yes.

Note: Delete is disabled by default; see the Delete section below for details.
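
For quick reference, the T10 operation codes behind these primitives can be summarized programmatically. The following is a minimal Python sketch; the opcode values come from the SBC-3/SPC-4 specifications rather than from this page, and the labels are informal:

# Illustrative mapping of the VAAI primitives above to their T10 SCSI opcodes.
VAAI_PRIMITIVES = {
    "ATS (Hardware Assisted Locking)": ("COMPARE AND WRITE", 0x89),
    "Zero (Block Zeroing)":            ("WRITE SAME (16)",   0x93),
    "Clone (Full Copy, XCopy)":        ("EXTENDED COPY",     0x83),
    "Delete (Space Reclamation)":      ("UNMAP",             0x42),
}

for primitive, (command, opcode) in VAAI_PRIMITIVES.items():
    print(f"{primitive:35} -> {command:18} opcode 0x{opcode:02X}")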

The presence of VAAI and its features can be verified from the VMware ESX 5 CLI as follows:

~ # esxcli storage core device vaai status get
naa.6001405a2e547c17329487b865d1a66e
   VAAI Plugin Name:
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: unsupported

Primitives

ATS

ATS is arguably one of the most valuable storage technologies to come out of VMware. It enables locking of block storage devices at much finer granularity than with the preceding T10 Persistent Reservations, which can only operate on full LUNs. Hence, ATS allows more concurrency and thus significantly higher performance for shared LUNs.

For instance, Hewlett-Packard reported that it can support six times more VMs per LUN with VAAI than without it.

ATS uses the T10 COMPARE_AND_WRITE command to allow comparing and writing SCSI blocks in one atomic operation.
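
To make "one atomic operation" concrete, here is a hedged Python sketch of the COMPARE AND WRITE CDB layout as defined by SBC-3. The LBA and block count are made-up example values and no I/O is performed:

import struct

def compare_and_write_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 16-byte COMPARE AND WRITE CDB (SBC-3, opcode 0x89).

    The data-out buffer that accompanies this CDB carries num_blocks of
    'verify' data followed by num_blocks of 'write' data; the device writes
    the second half only if the first half matches the on-disk blocks, as
    one atomic operation. This is the command VMware uses for ATS locks.
    """
    cdb = bytearray(16)
    cdb[0] = 0x89                          # COMPARE AND WRITE
    cdb[2:10] = struct.pack(">Q", lba)     # 64-bit logical block address
    cdb[13] = num_blocks                   # number of logical blocks (1 for ATS)
    return bytes(cdb)

print(compare_and_write_cdb(lba=0x1000, num_blocks=1).hex())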

NFS doesn’t need ATS, as locking is a non-issue and VM files aren’t shared the same way LUNs are.

Feature presence can be verified from the VMware ESX 5 CLI:

~ # esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking
Value of HardwareAcceleratedLocking is 1

How VMware actually uses ATS depends on the underlying filesystem type and its history:

On VAAI hardware:

Single-extent datastore reservations:
  New VMFS-5: ATS only [1]
  Upgraded VMFS-5: ATS, but fall back to SCSI-2 reservations
  VMFS-3: ATS, but fall back to SCSI-2 reservations

Multi-extent datastore when locks on non-head:
  New VMFS-5: Only allow spanning on ATS hardware [2]
  Upgraded VMFS-5: ATS, except when locks on non-head
  VMFS-3: ATS, except when locks on non-head

  1. If a new VMFS-5 is created on a non-ATS storage device, SCSI-2 reservations will be used.
  2. When creating a multi-extent datastore where ATS is used, the vCenter Server will filter out non-ATS devices, so that only devices that support the ATS primitive can be used.

Zero

Thin provisioning is difficult to get right because storage arrays don't know what’s going on in the hosts. VAAI includes a generic interface for communicating free space, thus allowing large ranges of blocks to be zeroed out at once.

Zero uses the T10 WRITE_SAME command, and defaults to a 1 MB block size. Zeroing only works for capacity inside a VMDK. vSphere 5 can use WRITE_SAME in conjunction with the T10 UNMAP command.
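
The sketch below shows the WRITE SAME (16) CDB layout from SBC-3 that this primitive relies on. The values are illustrative only, and whether an array honors the UNMAP bit depends on its provisioning support:

import struct

def write_same16_cdb(lba: int, num_blocks: int, unmap: bool = False) -> bytes:
    """Sketch of a WRITE SAME (16) CDB (SBC-3, opcode 0x93).

    The accompanying data-out buffer is a single logical block (e.g. all
    zeroes) that the device replicates across num_blocks, which is how a
    1 MB zeroing request becomes one small command instead of 1 MB of writes.
    Setting the UNMAP bit asks a thin-provisioned device to deallocate the
    range instead of writing it.
    """
    cdb = bytearray(16)
    cdb[0] = 0x93                              # WRITE SAME (16)
    if unmap:
        cdb[1] |= 0x08                         # UNMAP bit
    cdb[2:10] = struct.pack(">Q", lba)         # starting LBA
    cdb[10:14] = struct.pack(">I", num_blocks) # number of logical blocks
    return bytes(cdb)

# Example: zero 1 MB (2048 blocks of 512 bytes) starting at LBA 0x2000.
print(write_same16_cdb(0x2000, 2048).hex())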

Feature presence can be verified from the VMware ESX 5 CLI:

~ # esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit
Value of HardwareAcceleratedInit is 1

To disable Zero from the ESX 5 CLI:

~ # esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedInit
Value of HardwareAcceleratedInit is 0

This change takes immediate effect, without requiring a 'Rescan All' from VMware.

Clone

This is the signature VAAI command. Instead of reading each block of data from the array and then writing it back, the ESX hypervisor can command the array to duplicate a range of data on its behalf. If Clone is supported and enabled, VMware operations like VM cloning and VM vMotion can become very fast. Speed-ups of a factor of ten or more are achievable, particularly on fast flash-based backstores over slow network links, such as 1 GbE.

Clone uses the T10 EXTENDED_COPY command, and defaults to a 4 MB block size.
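
A rough sketch of the bookkeeping involved: with Clone the hypervisor only has to describe source and destination ranges, 4 MB at a time, rather than move the data itself. This models the segmentation only; the real EXTENDED_COPY parameter list (target and segment descriptors) is more involved, and the names below are made up for illustration:

from dataclasses import dataclass

@dataclass
class CopySegment:
    src_lba: int     # where to read on the source LUN
    dst_lba: int     # where to write on the destination LUN
    num_blocks: int  # blocks the array copies in this segment

def plan_clone(src_lba, dst_lba, total_bytes, block_size=512, chunk=4 * 1024 * 1024):
    """Split a clone into array-side copy segments of 4 MB each."""
    blocks_per_chunk = chunk // block_size
    total_blocks = total_bytes // block_size
    segments = []
    done = 0
    while done < total_blocks:
        n = min(blocks_per_chunk, total_blocks - done)
        segments.append(CopySegment(src_lba + done, dst_lba + done, n))
        done += n
    return segments

# A 40 GB VMDK clone becomes 10,240 segment descriptors of 4 MB each,
# instead of 40 GB of reads and writes over the storage network.
print(len(plan_clone(0, 0, 40 * 1024**3)))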

Feature presence can be verified from the VMware ESX 5 CLI:

~ # esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove
Value of HardwareAcceleratedMove is 1

To disable Clone from the ESX 5 CLI:

~ # esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedMove
Value of HardwareAcceleratedMove is 0

This change takes immediate effect, without requiring a 'Rescan All' from VMware.

Delete

VMFS operations like cloning and vMotion didn’t include any hints to the storage array to clear unused VMFS space. Hence, some of the biggest storage operations couldn't be accelerated or "thinned out".

Delete uses the T10 UNMAP command to allow thin-capable arrays to offload clearing unused VMFS space.
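
For illustration, this is roughly what the payload of an UNMAP command looks like per SBC-3: a short header followed by (LBA, block count) descriptors for the ranges being reclaimed. The extents below are made-up example values:

import struct

def unmap_param_list(extents):
    """Sketch of the UNMAP parameter list (SBC-3) sent with opcode 0x42.

    'extents' is a list of (lba, num_blocks) ranges the host no longer
    needs; this is the kind of payload used to tell a thin-provisioned LUN
    to reclaim freed VMFS space.
    """
    descriptors = b"".join(
        struct.pack(">QI4x", lba, num_blocks) for lba, num_blocks in extents
    )
    # 8-byte header: UNMAP data length, block descriptor data length, 4 reserved.
    header = struct.pack(">HH4x", len(descriptors) + 6, len(descriptors))
    return header + descriptors

payload = unmap_param_list([(0x10000, 8192), (0x40000, 2048)])
print(len(payload))  # 8-byte header + two 16-byte block descriptors = 40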

However, vCenter 5 doesn't correctly handle waiting for the storage array to return the UNMAP command status, so the use of Delete is disabled by default in vSphere 5.

Feature presence can be verified from the VMware ESX 5 CLI (the default value is '0'):

~ # esxcli --server=<my_esx_server> -u root system settings advanced list --option /VMFS3/EnableBlockDelete
Type: integer
Int Value: 0
Default Int Value: 0
Min Value: 0
Max Value: 1
String Value:
Default String Value:
Valid Characters:
Description: Enable VMFS block delete

To enable Delete from the ESX 5 CLI:[1]

~ # esxcli --server=<my_esx_server> -u root system settings advanced set --int-value 1 --option /VMFS3/EnableBlockDelete
Value of EnableBlockDelete is 1

Many SATA SSDs have issues handling UNMAP properly, so it is disabled by default in LIO.

To enable UNMAP, start the targetcli shell, enter the context of the respective backstore device, and set the emulate_tpu attribute:

/backstores/iblock/fioa> set attribute emulate_tpu=1
Parameter emulate_tpu is now '1'.
/backstores/iblock/fioa>
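
Under the hood, targetcli sets a backstore attribute in the kernel target's configfs tree. A minimal sketch, assuming the usual LIO configfs layout: the HBA name (iblock_0) is an example, the device name (fioa) is the one used on this page, and both depend on your actual configuration:

# Flip the same knob that "set attribute emulate_tpu=1" writes (run as root);
# treat the exact path as illustrative for your setup.
from pathlib import Path

attrib = Path("/sys/kernel/config/target/core/iblock_0/fioa/attrib/emulate_tpu")
attrib.write_text("1")              # advertise thin-provisioning UNMAP support
print(attrib.read_text().strip())   # expect: 1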

Reboot the ESX host, or log out of and back into the backstore, so that Delete is recognized, then verify its presence from the VMware ESX 5 CLI:

~ # esxcli storage core device vaai status get
naa.6001405a2e547c17329487b865d1a66e
   VAAI Plugin Name:
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: supported

Performance

Cloning VMware VMs in 25s over 1 GbE on an LIO SAN with VAAI and Fusion-IO ioDrive PCIe flash memory.

Performance improvements offered by VAAI can be grouped into three categories:

  * Reduced time to complete VM cloning and Block Zeroing operations.
  * Reduced use of server compute and storage network resources.
  * Improved scalability of VMFS datastores in terms of the number of VMs per datastore and the number of ESX servers attached to a datastore.

The actual improvement seen in any given environment depends on a number of factors, discussed in the following section. In some environments, improvement may be small.

Cloning, migrating and zeroing VMs

What matters most for Full Copy and Block Zeroing operations is whether the limiting factor is on the front end or the back end of the storage controller. If the throughput of the storage network is slower than the backstore can handle, offloading the bulk work of reading and writing virtual disks for cloning and migration, and writing zeroes for virtual disk initialization, can help immensely.

One example where substantial improvement is likely is when the ESX servers use 1 GbE iSCSI to connect to an LIO storage system with flash memory. The front end at 1 Gbps doesn't support enough throughput to saturate the back end. When cloning or zeroing is offloaded, however, only small commands with small payload go across the front, while the actual I/O is completed by the storage controller itself directly to disk.
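
As a back-of-the-envelope illustration of that front-end bottleneck, assuming a 40 GB virtual disk, roughly 110 MB/s of usable 1 GbE throughput, and a flash back end that can copy internally at about 1 GB/s (all assumed numbers, not measurements from this page):

# Rough comparison of clone time with and without offload.
vdisk_gb = 40
wire_mb_s = 110          # usable 1 GbE throughput (assumption)
backend_mb_s = 1000      # internal copy rate of a flash backstore (assumption)

over_the_wire = vdisk_gb * 1024 / wire_mb_s * 2   # read back to the host, then write
offloaded     = vdisk_gb * 1024 / backend_mb_s    # array copies the range internally
print(f"over the wire: ~{over_the_wire / 60:.1f} min, offloaded: ~{offloaded:.0f} s")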

VMFS datastore scalability

Documentation from various sources, including VMware professional services best practices, has traditionally recommended 20 to 30 VMs per VMFS datastore, and sometimes even fewer. Documents for VMware Lab Manager suggest limiting the number of ESX servers in a cluster to eight. These recommended limits are due in part to the effect of SCSI reservations on performance and reliability. Extensive use of some features, such as VMware snapshots and linked clones, can trigger large numbers of VMFS metadata updates, which require locking. Before vSphere 4.1, reliable locks on smaller objects were obtained by briefly locking the entire LUN with a SCSI Persistent Reservation. Any other server trying to access the LUN during the reservation would fail and would wait and retry up to 80 times by default. This wait and retry added to perceived latency and reduced throughput in VMs. In extreme cases, if the other server exceeded the number of retries, errors would be logged in the VMkernel logs and I/Os could return as failures to the VM.

When all ESX servers sharing a datastore support VAAI, ATS can eliminate SCSI Persistent Reservations, at least reservations due to obtaining smaller locks. The result is that datastores can be scaled to more VMs and attached servers than previously.

Datera has tested up to 128 VMs in a single VMFS datastore on LIO. The number of VMs was limited in testing to 128 because the maximum addressable LUN size in ESX is 2 TB, which means that each VM can occupy a maximum of 16 GB, including virtual disk, virtual swap, and any other files. Virtual disks much smaller than this generally do not allow enough space to be practical for an OS and any application.
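
The 16 GB figure follows directly from dividing the maximum LUN size by the number of VMs; a trivial check, assuming binary units:

max_lun_gib = 2 * 1024        # 2 TB maximum addressable LUN size in ESX
vms = 128                     # VMs tested per VMFS datastore
print(max_lun_gib / vms)      # -> 16.0, i.e. at most about 16 GB per VM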

Load was generated and measured on the VMs by using iometer. For some tests, all VMs had load. In others, such as when sets of VMs were started, stopped, or suspended, load was placed only on VMs that stayed running.

Tests such as starting, stopping, and suspending numbers of VMs were run with iometer workloads running on other VMs that weren't being started, stopped, or suspended. Additional tests were run with all VMs running iometer, and VMware snapshots were created and deleted as quickly as possible on all or some large subset of the VMs.

The results of these tests demonstrated that performance impact measured before or without VAAI was either eliminated or substantially reduced when using VAAI, to the point that datastores could reliably be scaled to 128 VMs in a single LUN.

Statistics

The VMware esxtop command in ESX 5 has two new sets of counters for VAAI operations available under the disk device view. Both sets of counters include the three VAAI key primitives. To view VAAI statistics using esxtop, follow these steps from the ESX 5 CLI:

~ # esxtop
  1. Press 'u' to change to the disk device stats view.
  2. Press 'f' to select fields, or 'o' to change their order. Note: This selects sets of counters, not individual counters.
  3. Press 'o' to select VAAI Stats and/or 'p' to select VAAI Latency Stats.
  4. Optionally, deselect Queue Stats, I/O Stats, and Overall Latency Stats by pressing 'f', 'g', and 'i' respectively in order to simplify the display.
  5. To see the whole LUN field, widen it by pressing 'L' (capital) then entering a number ('36' is wide enough to see a full NAA ID of a LUN).

The output of esxtop looks similar to the following:

 4:46:50am up 44 min, 281 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.00, 0.00, 0.00

DEVICE                               CLONE_RD CLONE_WR  CLONE_F MBC_RD/s MBC_WR/s      ATS ATSF     ZERO   ZERO_F MBZERO/s   DELETE DELETE_F  MBDEL/s
naa.60014050e4485b9bdc841d09478888e6        0        0        0     0.00     0.00       23    0        0        0     0.00        0        0     0.00
naa.600140515743d5195b0498b8aad6fdd2     1583      792        0     0.00     0.00     1322    0       23        0     0.00        0        0     0.00
naa.60014053937c69d44ff4e0b9e5a95398        0        0        0     0.00     0.00        0    0        0        0     0.00        0        0     0.00
naa.60014055fcf891d0c5b4a60a66942400     4746     3955        0     0.00     0.00     4402    0       45        0     0.00        0        0     0.00
naa.600140573d94f8e531d4d1ab5c8a72ef        0        0        0     0.00     0.00       23    0        0        0     0.00        0        0     0.00
naa.6001405a2e547c17329487b865d1a66e     3164     4746        0     0.00     0.00     5692    0       54        0     0.00        0        0     0.00
naa.6001405a3a17fe4483c46f994f74b4e6        0        0        0     0.00     0.00        0    0        0        0     0.00        0        0     0.00
t10.ATA_____ST3400832AS_____________        0        0        0     0.00     0.00        0    0        0        0     0.00        0        0     0.00
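
When these counters need to be post-processed outside of esxtop (for example from a captured screen like the one above), a small parser along these lines is enough. This is not an esxtop API; it simply splits the whitespace-separated rows of the layout shown here:

def parse_vaai_rows(table_text):
    """Parse the esxtop VAAI device table shown above into dicts."""
    lines = [l for l in table_text.splitlines() if l.strip()]
    header = lines[0].split()          # DEVICE CLONE_RD CLONE_WR ...
    return [dict(zip(header, line.split())) for line in lines[1:]]

# Example with the header and one device row from the output above:
sample = """DEVICE CLONE_RD CLONE_WR CLONE_F MBC_RD/s MBC_WR/s ATS ATSF ZERO ZERO_F MBZERO/s DELETE DELETE_F MBDEL/s
naa.6001405a2e547c17329487b865d1a66e 3164 4746 0 0.00 0.00 5692 0 54 0 0.00 0 0 0.00"""
for row in parse_vaai_rows(sample):
    print(row["DEVICE"], "ATS:", row["ATS"], "ZERO:", row["ZERO"])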

The VAAI counters in esxtop are:

DEVICE: Devices that support VAAI (LUNs on a supported storage system) are listed by their NAA ID. You can get the NAA ID for a datastore from the datastore properties in vCenter, the Storage Details—SAN view in Virtual Storage Console, or using the vmkfstools -P /vmfs/volumes/<datastore> command. LIO LUNs start with naa.6001405.

Note: Devices or datastores other than LUNs on an external storage system such as CD-ROM, internal disks (which may be physical disks or LUNs on internal RAID controllers), and NFS datastores are listed but have all zeroes for VAAI counters.

CLONE_RD: Number of Full Copy reads from this LUN.
CLONE_WR: Number of Full Copy writes to this LUN.
CLONE_F: Number of failed Full Copy commands on this LUN.
MBC_RD/s: Effective throughput of Full Copy command reads from this LUN in megabytes per second.
MBC_WR/s: Effective throughput of Full Copy command writes to this LUN in megabytes per second.
ATS: Number of successful lock commands on this LUN.
ATSF: Number of failed lock commands on this LUN.
ZERO: Number of successful Block Zeroing commands on this LUN.
ZERO_F: Number of failed Block Zeroing commands on this LUN.
MBZERO/s: Effective throughput of Block Zeroing commands on this LUN in megabytes per second.

Counters that count operations do not return to zero unless the server is rebooted. Throughput counters are zero when no commands of the corresponding primitive are in progress.

Clones between VMFS datastores and Storage VMotion operations that use VAAI increment clone read for one LUN and clone write for another LUN. In any case, the total for clone read and clone write columns should be equal.

See also

  * LinuxIO, targetcli
  * Target
  * FCoE, Fibre Channel, iSCSI, iSER, SRP, vHost
  * Persistent Reservations, ALUA

References

  1. VMware (2011-11-10). "Disable Space Reclamation". ESXi and vCenter Server 5 Documentation. Palo Alto: vmware.com. 

External links

  * LIO Admin Manual
  * RTSlib Reference Guide (HTML and PDF)
  * "RTS OS VAAI video". YouTube. http://www.youtube.com/watch?v=ktexAFlYUho
  * Stephen Foskett (2011-11-10). "A Complete List of VMware VAAI Primitives". blog.fosketts.net. http://blog.fosketts.net/2011/11/10/complete-list-vmware-vaai-primitives/
  * Jason Langer (2011-12-06). "VAAI, Is This Thing On??". virtuallanger.com. http://www.virtuallanger.com/2011/12/06/vaai-is-this-thing-on/
  * Peter Learmonth (November 2010). "Understanding and Using vStorage APIs for Array Integration and NetApp Storage" (TR-3886). Sunnyvale: NetApp. http://media.netapp.com/documents/tr-3886.pdf
  * Archie Hendryx (2011-09-04). "vSphere 5, VAAI and the Death of the Traditional Storage Array". Sys-Con. http://linux.sys-con.com/node/1966586
  * VMware (2012-06-18). "vStorage APIs for Array Integration FAQ". Palo Alto: VMware. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1021976
