Fixing Storage Performance with Software

This week I’ve had two interesting briefings that covered significant storage performance improvements achieved through software updates. The first was from DataCore, a long-time storage virtualisation company that has been in the industry for 20 years. The second was from Nutanix, which released a series of updates as part of AOS 4.6 that result in a “4x performance improvement” for existing systems. My first reaction to these kinds of claims is skepticism. First, if such a change in performance was possible, then how badly was the code written in the first place? Second, where is the independent testing evidence to back up these claims?

DataCore

Let’s start with DataCore. The new performance claims from the company relate to a feature update to existing products known as DataCore Parallel I/O. Through the efficient use of multi-threading, the DataCore software (SANsymphony-V 10.0 in this instance) was able to achieve 459,000 IOPS at 0.32ms latency using only a 2U server. The results can be found on the Storage Performance Council website. Looking through the specification, there’s nothing particularly unique about the hardware used: Samsung EVO 850 500GB SSDs and Western Digital 300GB SAS HDDs. The claimed improvements come from going back to efficient multi-threading and using today’s multi-core CPUs more effectively.
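To put those two figures in context, here is a rough back-of-the-envelope sketch using Little's Law (average requests in flight = throughput x average response time). It assumes the quoted 0.32ms is a mean response time measured at the quoted 459,000 IOPS; the function and variable names are my own for illustration and are not taken from the SPC report or from DataCore.

```python
def outstanding_ios(iops: float, latency_seconds: float) -> float:
    """Little's Law: mean requests in flight = throughput x mean response time."""
    return iops * latency_seconds


# Figures quoted above; treating the latency as a mean is an assumption.
spc1_iops = 459_000
spc1_latency_s = 0.32e-3  # 0.32ms expressed in seconds

# Average number of I/Os that must be in progress at once to sustain the result.
in_flight = outstanding_ios(spc1_iops, spc1_latency_s)

# Ceiling for a single strictly serial stream (one I/O at a time) at the same latency.
serial_ceiling = 1 / spc1_latency_s

print(f"Average I/Os in flight: {in_flight:.1f}")
print(f"Single serial stream ceiling: {serial_ceiling:.0f} IOPS")
```

Under those assumptions, roughly 147 I/Os have to be in flight at any moment, whereas a single serialised stream at the same latency would top out at around 3,125 IOPS. That gap is why the degree of parallelism in the I/O path matters at least as much as the raw speed of the media.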

In today’s highly virtualised world it certainly makes sense to parallelise as much as possible. The “I/O Blender” effect from virtualisation means there are many active threads in a virtual server environment, all of which are demanding storage services. This leads to a number of questions:

How efficient is storage I/O management in the hypervisor? On a system with hundreds of virtual machines, the I/O will be tunnelled through a specific set of processes, many of which will map to the underlying storage (e.g. LUNs or datastores). How efficiently the hypervisor manages workloads, serialises I/O between them and handles locking will be as important as the ability of the storage to handle requests.
How should storage be laid out to optimise the I/O traffic to the virtualisation layer (in this case DataCore)? Did this configuration need a specific layout of multiple LUNs/volumes to achieve the high throughput results?
What impact does the Windows operating system have here? The SPC-1 testing was done on Windows Server 2012 R2; how will the results be affected when using Windows Server 2016? Can this be implemented on Nano Server versions to cut out even more unnecessary code?

The initial results from SPC are certainly impressive; however, I’d like to see more examples of varying workloads, especially virtual server/desktop environments and container implementations. This should help end users understand exactly how they should amend their configurations to take advantage of these results.

Nutanix

On 16th February 2016, Nutanix announced version 4.6 of AOS, their platform operating system. There were a number of other feature announcements (stuff to cover another day); however, from a performance perspective, the company is claiming a 4x improvement (it’s not clear whether this is throughput/IOPS or latency related) within their distributed storage layer. Earlier this week I had a briefing with Prabu Rambadran, Senior Product Marketing Manager at Nutanix, and expressed my skepticism about the ability to deliver this kind of improvement. Prabu pointed out some good reasons why this level of improvement is achievable, including (most obviously) being on the latest hardware, changes in efficiency within the compilers used to build the software and so on.

Now, for the sake of balance, it’s worth pointing out that Nutanix haven’t released any specifics on the improvements: which hypervisor this was with, what kinds of workloads and so on. This collateral is being worked on, so it will be good to see how independent it is when it’s released.

The Architect’s View

There’s no doubt that there are still significant performance improvements to be achieved from revising and rewriting storage code. The changing workload landscape means that what worked in the past isn’t going to be as efficient in the future. EMC knows this only too well: the issues with legacy VNX (based on CLARiiON) were addressed with the release of VNX2. Unfortunately, many customers were less than pleased with the inability to do in-place upgrades between the two platforms, both in terms of code and the re-use of disk shelves. This highlights an important point: as storage systems evolve further, there needs to be a continued separation of software function from the hardware. Storage still needs hardware (obviously), but software improvements and changes shouldn’t leave customers having to manage time-consuming upgrades. As many of the software-defined storage players move into hyper-convergence and release appliances, the efficiency of their software will be a key factor in determining success or failure.
