A couple of weeks ago I was briefed on the latest technology to come out of PernixData: a software suite known as Architect, which provides insight into the I/O workload of virtual machines in a VMware vSphere environment. The company’s previous (and still current) product is FVP, a caching software layer for the hypervisor that uses flash or system DRAM to cache both reads and writes, improving I/O performance against external storage. FVP has also been released as a free offering known as Freedom, which is limited in capability (it can’t use flash drives and doesn’t support write-back caching).
Storage has been particularly problematic for virtual environments (both server and desktop) as the I/O profile is significantly more random in nature than traditional I/O from physical machines. Add to this the increase in processor, memory and bus speeds and it’s easy to see how servicing the I/O from closer to the processor (i.e. from within the server) makes sense.
There’s no doubt that the Pernix solution is elegant in design. It handles read caching as well as both write-back and write-through caching. The issue of caching in volatile DRAM is addressed by replicating all write I/O to another vSphere host in a cluster configuration, so each write is protected on another server. Of course in a failure scenario you need to take more care in the recovery process, but theoretically the data should all be there.
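To make the difference between the two write modes concrete, here is a minimal sketch in Python. It is a toy model only, not PernixData's implementation: the class names (`Backend`, `HostCache`), the `peer` replica scheme and the `recover_peer` step are all hypothetical, chosen simply to illustrate the idea of acknowledging a write from host memory while a second host holds a protective copy.

```python
class Backend:
    """Stands in for the external shared storage array (hypothetical)."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0  # number of writes that actually reached the array

    def write(self, lba, data):
        self.blocks[lba] = data
        self.writes += 1

    def read(self, lba):
        return self.blocks.get(lba)


class HostCache:
    """Toy host-side cache; names and behaviour are assumptions."""

    def __init__(self, backend, mode="write-through", peer=None):
        if mode not in ("write-through", "write-back"):
            raise ValueError(f"unknown mode: {mode}")
        self.backend = backend
        self.mode = mode
        self.peer = peer        # another HostCache that keeps replica copies
        self.cache = {}         # lba -> data held locally
        self.dirty = set()      # lbas not yet written to the array
        self.replicas = {}      # writes held on behalf of a peer host

    def write(self, lba, data):
        self.cache[lba] = data
        if self.mode == "write-through":
            # The array sees every write before it is acknowledged.
            self.backend.write(lba, data)
        else:
            # Write-back: protect the write on a peer, destage later.
            if self.peer is not None:
                self.peer.replicas[lba] = data
            self.dirty.add(lba)

    def read(self, lba):
        if lba in self.cache:
            return self.cache[lba]
        data = self.backend.read(lba)
        if data is not None:
            self.cache[lba] = data
        return data

    def destage(self):
        """Flush dirty blocks to the array and release the peer copies."""
        for lba in sorted(self.dirty):
            self.backend.write(lba, self.cache[lba])
            if self.peer is not None:
                self.peer.replicas.pop(lba, None)
        self.dirty.clear()

    def recover_peer(self):
        """After a peer host fails, write its protected data to the array."""
        for lba, data in sorted(self.replicas.items()):
            self.backend.write(lba, data)
        self.replicas.clear()


# Usage: host A caches a write in write-back mode, then "fails".
array = Backend()
host_b = HostCache(array, mode="write-back")
host_a = HostCache(array, mode="write-back", peer=host_b)

host_a.write(10, b"payload")
before_recovery = array.read(10)   # nothing on the array yet
host_b.recover_peer()              # surviving host replays the replica
after_recovery = array.read(10)
```

In this model the array never sees the write until either a destage or a recovery happens, which is exactly why the recovery path deserves the extra care mentioned above.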
Architect is an attempt to move to the next level in optimising the I/O for virtual machines. Caching solutions are simply reactive to the I/O profile without providing any specific understanding of what is happening with the application. A badly written application can be significantly improved using caching, but of course the poorly written code could be rewritten to be more effective (euphemistically referred to as a “chance to optimise” during our presentation).
The problem with any attempt to optimise the I/O path is that most (if not all) of these solutions don’t have a clear view of all components. That means seeing the data from application, through host caching, through virtual NICs (iSCSI or FC), through a hypervisor, through physical NICs, into the storage network (Fibre Channel, iSCSI, FCoE, NFS, SMB) and into the storage array. At the array there is an almost infinite number of ways to lay the data out on disk. ICDAs (Integrated Caching Disk Arrays) use local DRAM, flash and disk to optimise the writes onto the physical media, with techniques that convert random data to sequential, pre-fetch, cache and otherwise do everything to return the I/O to the host as quickly as possible.
At this stage, Architect sees everything above the level of the physical array with only a “black box” view of what’s happening within the physical shared storage layer. Now, imagine how FVP affects this profile. Depending on the size of the cache in the servers, FVP will optimise up to 100% of the read I/O, flushing write I/O down to disk periodically as seen fit by the internal algorithms. Suddenly the array sees a very different workload and may react badly to such a skewed workload profile. On the other hand, it may not; it all depends on how the external array vendor implements their algorithms.
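A rough way to picture the skew is to count which I/Os actually reach the array once a host-side cache sits in the path. The sketch below is a simplified model under stated assumptions: an LRU cache measured in blocks, every write eventually reaching the array exactly once (no coalescing of overwrites), and an invented workload. The function name `array_view` is hypothetical and nothing here reflects FVP's real algorithms.

```python
from collections import OrderedDict


def array_view(workload, cache_blocks):
    """Return (reads, writes) that reach the array behind an LRU host cache.

    workload     -- iterable of (op, lba) tuples, op is "r" or "w"
    cache_blocks -- cache capacity in blocks; 0 means no host cache
    """
    cache = OrderedDict()
    reads = writes = 0

    def insert(lba):
        if cache_blocks == 0:
            return
        cache[lba] = True
        cache.move_to_end(lba)
        if len(cache) > cache_blocks:
            cache.popitem(last=False)  # evict the least recently used block

    for op, lba in workload:
        if op == "r":
            if lba in cache:
                cache.move_to_end(lba)   # hit: served from the host
            else:
                reads += 1               # miss: the array services the read
                insert(lba)
        else:
            writes += 1                  # assumed to be destaged eventually
            insert(lba)
    return reads, writes


# An invented workload: 16 reads over 4 hot blocks, then 4 writes.
workload = [("r", i % 4) for i in range(16)] + [("w", 100 + i) for i in range(4)]

no_cache = array_view(workload, cache_blocks=0)
with_cache = array_view(workload, cache_blocks=4)
```

With no cache the array sees 16 reads and 4 writes (80% reads); with a four-block cache it sees 4 reads and 4 writes (50% reads). The application workload is unchanged, but the profile presented to the array is not, and a larger cache or a hotter working set would push the mix even further towards writes.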
This suggests both a problem and an opportunity. First, FVP and Architect aren’t seeing the full picture and are making assumptions about the storage that could be incorrect. Not having a clear understanding of the array capabilities could cause customers to “tune” their I/O based on FVP recommendations and find this causes a problem with the external array. This is especially true if FVP-based workloads are not the only users of the shared storage.
Second, there’s the opportunity. A customer with enough information could choose to implement external storage with much lower levels of cache or flash, thus reducing costs. Alternatively, PernixData could work with the array vendors and optimise the write destage process, depending on the external array, so adding value to that product.
At this point we would start to move to a more end-to-end solution, something we could almost consider a platform. Until that point, I still see FVP as a feature (albeit with some additional smarts within Architect), a feature that perhaps some external array vendor may see as an attractive addition to their product portfolio, allowing them to implement a platform of their own.
Copyright (c) 2009-2022 – Brookend Ltd, first published on http://www.architecting.it/blog, do not reproduce without permission. Post #3e4d