I don’t normally write blog posts that follow up on other people’s discussions; however, having seen some of the tweets relating to Nigel’s blog post on VSA versus VSAN, I realised my comments on the subject would stretch to an entire post of their own. So here it is.
The discussion (although I haven’t listened to the podcast) revolves around a VSA (a virtual storage appliance) that runs in a VM versus technology such as VMware’s Virtual SAN (VSAN), which runs in vSphere ESXi and is more tightly integrated with the hypervisor. As Nigel points out, a VSA is more flexible in terms of being (potentially) portable between hypervisors, whereas VSAN creates a form of lock-in for the user. Whilst that is true, I think there is a wider discussion here that isn’t being explored, and as usual it takes the customer back to one of risk versus cost.
A virtual storage appliance, or VSA, is just that: the representation of a physical storage array, but running in a virtual machine. The hypervisor provides the storage resources to the VSA as a VM, although the VSA may have more direct control of the storage hardware, either as passthrough devices or using features such as DirectPath I/O. Early VSAs were almost a direct copy of the equivalent storage array (think of implementations like NexentaStor or HP’s StoreVirtual Appliance); however, with the move to hyper-converged solutions, the VSA has been designed specifically to manage distributed storage workloads, as seen in the architectures of Nutanix, SimpliVity and solutions from the likes of Atlantis Computing.
From a management perspective, the original VSAs may be likened to managing a traditional array, but not so with the hyper-converged platforms where the integration work done by these vendors hides the complexity from the customer. Inevitably there will be a discussion on the merits of running storage for the platform within a VM, however let’s park that for now.
Virtual SAN takes a different approach and integrates the storage I/O functions into the hypervisor itself, becoming part of the “operating system” of the hypervisor. I use the term O/S loosely here, simply to convey how the code that manages storage sits in a different place than within a virtual machine. From a customer perspective, deployment and management is slightly easier; VSAN is a feature of ESXi that is enabled and configured through vCenter, subject to the appropriate resources being available. I say “slightly” because one of the areas where the likes of Nutanix have been working hard is in simplifying the deployment and management of their solutions.
Shared vs Dedicated Hardware
Both VSA and VSAN have one thing in common compared to using a separate dedicated hardware array and that’s the shared nature of the hardware resources. Storage and virtual machines compete for processor cycles and memory in VSA/VSAN whereas with a dedicated array, all of the storage functions are (obviously) managed by dedicated hardware. In fact VMware went to great lengths to offload specific functions through VAAI, to ensure that storage tasks weren’t impacting the hypervisor and virtual machines.
So the question is, what are the real pros and cons of both types of solution? I think, as always, the issues come down to risk, operational efficiency and cost. Let me use an example here.
With either a VSAN or VSA solution, storage and compute are provided by a single set of hardware. Imagine a 4-node cluster running VSAN. If one node fails, the others have to take over the workload. Where previously that takeover only needed to be sized for compute (to ensure the remaining three nodes could handle the workload), now we additionally have to size for storage capacity and performance. If our cluster had (say) 100TB of active data, the remaining 3-node cluster has to support that 100TB of data, and since we have no way to predict which node may fail, spare capacity has to be added to every node to cater for this. Obviously there’s the option to run unprotected (and not regenerate the failed data), but that’s not a desirable scenario for production. As the node count reduces from 4 to 3, additional load is also placed on the cluster to re-create the missing data. So the original design may need to be over-specified to ensure that hardware failure doesn’t impact performance. In both VSA and VSAN solutions there is the option to create more mirrors/replicas, ensuring that a node failure doesn’t require the immediate copying of data; however, this results in additional cost.
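The sizing trade-off above can be sketched with some simple arithmetic. The function below is an illustrative calculation only (the function name and the replica count of 2, i.e. two copies of every object, are my assumptions, not figures from any vendor’s sizing guide), showing how much raw capacity each node must carry if the cluster is to survive a node failure and still hold a full set of replicas.

```python
# Illustrative hyper-converged sizing sketch (hypothetical names/numbers):
# how much raw capacity each node needs so the survivors can still hold
# all replicas of the data after `node_failures` nodes are lost.

def required_capacity_per_node(total_data_tb, nodes, replicas=2, node_failures=1):
    """Raw TB per node for `total_data_tb` of data kept in `replicas`
    copies, spread across the nodes that survive `node_failures` losses."""
    surviving = nodes - node_failures
    if surviving < replicas:
        raise ValueError("not enough surviving nodes to hold all replicas")
    raw_tb = total_data_tb * replicas   # every copy consumes capacity
    return raw_tb / surviving           # spread across surviving nodes

# The 4-node, 100TB example from the text, with 2 copies of each object:
print(required_capacity_per_node(100, 4))                    # ~66.7 TB/node
# Versus sizing only for normal operation (no failure headroom):
print(required_capacity_per_node(100, 4, node_failures=0))   # 50.0 TB/node
```

The gap between the two figures (roughly a third more capacity on every node, in this toy example) is exactly the over-specification the paragraph above describes; adding more replicas pushes the per-node number higher still, which is where the additional cost comes from.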
One simple VSAN example shows how a combined solution could expose more risk than a dedicated hardware solution. Of course storage arrays fail too, but we’ve had 20+ years of innovation and development in dedicated storage hardware to mitigate failure wherever possible and to ensure consistent I/O performance. This means consistency not only in the delivery of I/O in normal operation, but delivery of I/O in failure mode; the example of VSAN failure posted on Reddit shows how things can go badly wrong.
The VSA model could be argued to have the same issues as VSAN in the case of a node failure, as this also results in fewer nodes to manage the data, but it depends on how the failure is managed. Obviously with both solutions, the more nodes you have, the smaller the impact of a single node failure and the fewer spare resources are needed.
The Architect’s View®
Getting back to Nigel’s original blog post, I don’t believe that either technology can simply be called right or wrong. Integrating storage into the hypervisor (in whatever form) has both benefits and risks, and customers have to weigh these up against each other. The main problem is that the risks and costs are not always clearly understood. Bear in mind that the vendor will only provide you with the list of benefits; it’s up to the bloggers and consultants out there to help you understand the risks.
- VSAN is Not Better Than a HW Array (Nigel Poulton blog, 20 October 2014)
Copyright (c) 2009-2023 – Brookend Ltd, first published on http://www.architecting.it/blog, do not reproduce without permission. Post #25bd.