XenServer 7.0 performance improvements part 3: Parallelised plug and unplug VBD operations in xenopsd
// Latest blog entries
The XenServer team has made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the third in a series of articles that will describe the principal improvements. For the first two, see here:
The topic of this post is control plane performance. XenServer 7.0 achieves significant performance improvements through the support for parallel VBD operations in xenopsd. With the improvements, xenopsd is able to plug and unplug many VBDs (virtual block devices) at the same time, substantially improving the duration of VM lifecycle operations (start, migrate, shutdown) for VMs with many VBDs, and making it practical to operate VMs with up to 255 VBDs.
Background of the VM lifecycle operations
In XenServer, xenopsd is the dom0 component responsible for VM lifecycle operations:
- during a VM start, xenopsd creates the VM container and then plugs the VBDs before starting the VCPUs;
- during a VM shutdown, xenopsd stops the VCPUs and then unplugs the VBDs before destroying the VM container;
- during a VM migrate, xenopsd creates a new VM container, unplugs the VBDs of the old VM container, and plugs the VBDs for the new VM before starting its VCPUs; while the VBDs are being unplugged and plugged on the other VM container, the user experiences a VM downtime when the VM is unresponsive because both old and new VM containers are paused.
Measurements have shown that a large part, usually most of the duration of these VM lifecycle operations is due to plugging and unplugging the VBDs, especially on slow or contended storage backends.
Why does xenopsd take some time to plug and unplug the VBDs?
The completion of a xenopsd VBD plug operation involves the execution of two storage layer operations, VDI attach and VDI activate (where VDI stands for virtual disk image). These VDI operations include control plane manipulation of daemons, block devices and disk metadata in dom0, which will take different amounts of time to execute depending on the type of the underlying Storage Repository (SRs, such as LVM, NFS or iSCSI) used to hold the VDIs, and the current load on the storage backend disks and their types (SSDs or HDs). Similarly, the completion of a xenopsd VBD unplug operation involves the execution of two storage layer operations, VDI deactivate and VDI detach, with the corresponding overhead of manipulating the control plane of the storage layer.
If the underlying physical disks are under high load, there may be contention preventing progress of the storage layer operations, and therefore xenopsd may need to wait many seconds before the requests to plug and unplug the VBDs can be served.
Originally, xenopsd would execute these VBD operations sequentially, and the total time to finish all of them for a single VM would depend of the number of VBDs in the VM. Essentially, it would be a sum of the time to operate each of othe VBDs of this VM, which would result in several minutes of wait for a lifecycle operation of a VM that had, for instance, 255 VBDs.
What are the advantages of parallel VBD operations?
Plugging and unplugging the VBDs in parallel in xenopsd:
- provides a total duration for the VM lifecycle operations that is independent of the number of VBDs in the VM. This duration will typically be the duration of the longest individual VBD operation amongst the parallel VBD operations for that VM;
- provides a significant instantaneous improvement for the user, across all the VBD operations involving more than 1 VBD per VM. The more devices involved, the larger the noticeable improvement, up to the saturation of the underlying storage layer;
- this single improvement is immediately applicable across all of VM start, VM shutdown and VM migrate lifecycle operations.
Are there any disadvantages or limitations?
Plugging and unplugging VBDs uses dom0 memory. The main disadvantage of doing these in parallel is that dom0 needs more memory to handle all the parallel operations. To prevent situations where a large number of such operations would cause dom0 to run out of memory, we have added two limits:
- the maximum number of global parallel operations that xenopsd can request is the same as the number of xenopsd worker-pool threads as defined by worker-pool-size in /etc/xenopsd.conf. This prevents regression in the maximum dom0 memory usage compared to when sequential VBD operations per VM was used in xenopsd. An increase in this value will increase the number of parallel VBD operations, at the expense of having to increase the dom0 memory for about 15MB for each extra parallel VBD operation.
- the maximum number of per-VM parallel operations that xenopsd can request is currently fixed to 10, which covers a wide range of VMs and still provides a 10x improvement in lifecycle operation times for those VMs that have more than 10 VBDs.
Where do I find the changes?
The changes that implemented this feature are available in github at https://github.com/xapi-project/xenopsd/pull/250
What sort of theoretical improvements should I expect in XenServer 7.0?
The exact numbers depend on the SR type, storage backend load characteristics and the limits specified in the previous section. Given the limits in the previous section, the results for the duration of VDB plugs for a single VM will follow the pattern in the following table:
Number n of VBDs/VM
Improvement of VBD operations
|<=10 VBDs/VM||n times faster|
|> 10 VBDs/VM|| |
10 times faster
The table above assumes that the maximum number of global parallel operations discussed in the section above is not reached. If you want to guarantee the improvement in the table above for x>1 simultaneous VM lifecycle operations, at the expense of using more dom0 memory in the worst case, you will probably want to set worker-pool-size = (n * x) in /etc/xenopsd.conf, where n is a number reflecting the average number of VBDs/VM amongst all VMs up to a maximum of n=10.
What sort of practical improvements should I expect in XenServer 7.0?
The VBD plug and unplug operations are only part of the overall operations necessary to execute a VM lifecycle operation. The remaining parts, such as creation of the VM container and VIF plugs, will disperse the VBD improvements of the previous section, though they are still significant. Some examples of improvements, using a EXT SR on a local SSD storage backend:
VM lifecycle operation
mImprovement with 8 VBDs/VM
Toolstack time to start a single VM
|Toolstack time to bootstorm 125 VMs|| |
The approximately 2s improvement in single VM start time was caused by plugging the 8 VBDs in parallel. As we see in the second row of the table, this can be a significant advantage in a bootstorm.
In XenServer 7.0, not only does xenopsd execute VBD operations in parallel, but it also has improvements in the storage layer operation times on VDIs, so you may observe that in your XenServer 7.0 environment further VM lifecycle time improvements beyond the expected ones from parallel VBD operations are noticeable, compared to XenServer 6.5SP1.
Shared via my feedly newsfeed
Sent from my iPhone