The Evolution of Storage Virtualisation

Chris Evans | Storage Virtualisation

Storage virtualisation is a technology that has been around for many years; it abstracts, in some form, the physical hardware from the LUN or volume the host or user sees.  The aim is to use this abstraction to reduce management overhead (for example with transparent migrations), to provide an ingest path (e.g. moving data from a local to a shared storage platform), to extend the life of external resources or to manage I/O performance.

The best-known implementations from a hardware perspective are probably IBM’s SVC (SAN Volume Controller) and Hitachi/HDS’s USP/VSP Universal Volume Manager, although pretty much all of the major enterprise platforms (VMAX, NetApp FAS, HP 3PAR) offer some form of virtualisation or other.

The key feature of the above-mentioned platforms is that they sit inline in the data path.  These inline solutions have a number of benefits, including:

  • “LBA” (logical block address) abstraction: the appliance/software manages the mapping of virtual to physical data (see the sketch after this list).
  • Ability to cache data in the appliance and improve performance.
  • Ability to see all data coming through the system and apply QoS and other service controls.
  • Ability to apply high level services, such as snapshots, replication and de-duplication that may not exist on the base hardware.
  • Ability to relocate data dynamically with transparency to the host application.
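
To make the LBA abstraction more concrete, here is a minimal sketch in Python (all class and field names are hypothetical, not any vendor’s API) of the kind of extent map an inline appliance maintains: every host I/O against the virtual LUN is translated to a physical array and LBA before being forwarded, which is also what makes transparent migration possible.

```python
# Minimal sketch of inline LBA virtualisation (hypothetical names).
# The appliance owns a map of virtual extents to physical extents and
# translates every I/O before forwarding it to the backing array.

from dataclasses import dataclass

EXTENT_BLOCKS = 2048  # extent size in blocks (assumption: 1 MB at 512-byte blocks)

@dataclass(frozen=True)
class PhysicalExtent:
    array_id: str   # which backing array holds the data
    start_lba: int  # first physical block of the extent

class VirtualLun:
    def __init__(self):
        # virtual extent number -> physical extent
        self.extent_map: dict[int, PhysicalExtent] = {}

    def map_extent(self, virtual_extent: int, physical: PhysicalExtent) -> None:
        self.extent_map[virtual_extent] = physical

    def translate(self, virtual_lba: int) -> tuple[str, int]:
        """Translate a virtual LBA to (array, physical LBA)."""
        extent, offset = divmod(virtual_lba, EXTENT_BLOCKS)
        physical = self.extent_map[extent]
        return physical.array_id, physical.start_lba + offset

# Because the map is owned by the appliance, an extent can be moved to a
# different array and re-pointed here without the host ever noticing.
lun = VirtualLun()
lun.map_extent(0, PhysicalExtent("array-A", 100_000))
print(lun.translate(42))                            # -> ('array-A', 100042)
lun.map_extent(0, PhysicalExtent("array-B", 0))     # transparent migration
print(lun.translate(42))                            # -> ('array-B', 42)
```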

As we can see, the value is in the abstraction layer and, like all modern storage features, this is driven from software.  At Virtualisation Field Day 6 in November 2015, FalconStor, a long-time player in the storage industry, presented on FreeStor, their all-encompassing storage virtualisation solution based on many of their existing product lines.  FalconStor claims that the FreeStor product has been almost entirely rewritten and doesn’t depend on the legacy implementations of previous solutions.  As a result, certain features (like active/active) were not present in the initial FreeStor releases and are only just being reintegrated.

As a Swiss Army knife for storage virtualisation, FreeStor certainly appears to meet all the requirements of legacy deployments.  I choose the word “legacy” carefully, as I think the main benefit of the product is in re-using existing hardware assets.  The main drawback is in having the software/hardware inline with every I/O.  Storage today has issues around latency and throughput, which only become harder to manage if all I/O has to pass through a central set of appliances that must match the I/O capability of the storage hardware they virtualise.

When starting with a blank sheet of paper and coming up with a design, there are perhaps other ways to implement the levels of abstraction FreeStor provides.

That leads us on to two other implementation models.  The first is similar to “traditional” virtualisation, in that the virtual-to-physical location is abstracted from the accessing host.  However, in this implementation the appliance or software doesn’t sit in the data path.  Instead, the appliance/software holds metadata that maps the relationship between storage clients (consumers of storage) and storage providers (servers providing storage).  This is the model Primary Data Inc has implemented with DataSphere.  The team from Primary Data presented at both Storage Field Day 7 (which I attended) and Storage Field Day 8.  This post has links at the end to the videos, which are well worth watching.

So, what’s the benefit of this kind of implementation?  Well, it removes the requirement to be inline, so the solution can scale better, simply by scaling the metadata engines.  I/O isn’t passing through a set of large monolithic servers that have to match the I/O capability of all of the underlying storage, as is the case with traditional virtualisation.  The downside is the need to add agents to the clients in order to manage the LBA abstraction and know how to write to the physical storage when addressing the virtual LUN.  Incidentally, this form of storage virtualisation is similar to the network-based implementations that existed around 10-15 years ago, but ultimately proved unsuccessful.
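
As a rough sketch of this out-of-band model (hypothetical names, not a description of Primary Data’s actual protocol), the code below shows a client-side agent that resolves placement through a metadata service, caches the answer and then reads and writes the storage provider directly, keeping the metadata engines out of the data path.

```python
# Sketch of out-of-band (metadata-server) virtualisation -- hypothetical
# names only. The metadata service answers "where does this extent live?";
# the client agent then talks to the storage provider directly, so data
# never flows through the metadata tier.

class MetadataService:
    def __init__(self):
        # (volume, extent) -> (provider address, physical extent id)
        self.placement: dict[tuple[str, int], tuple[str, int]] = {}

    def locate(self, volume: str, extent: int) -> tuple[str, int]:
        return self.placement[(volume, extent)]

class ClientAgent:
    """Host-side agent that resolves placement, then does direct I/O."""

    def __init__(self, metadata: MetadataService, providers: dict[str, dict]):
        self.metadata = metadata
        self.providers = providers      # provider address -> block store
        self.cache: dict[tuple[str, int], tuple[str, int]] = {}

    def _resolve(self, volume: str, extent: int) -> tuple[str, int]:
        key = (volume, extent)
        if key not in self.cache:                     # control path (rare)
            self.cache[key] = self.metadata.locate(volume, extent)
        return self.cache[key]

    def write(self, volume: str, extent: int, data: bytes) -> None:
        provider, phys = self._resolve(volume, extent)
        self.providers[provider][phys] = data         # data path (direct)

    def read(self, volume: str, extent: int) -> bytes:
        provider, phys = self._resolve(volume, extent)
        return self.providers[provider][phys]

# Usage: one metadata lookup, then all I/O goes straight to "provider-01".
meta = MetadataService()
meta.placement[("vol1", 0)] = ("provider-01", 7)
agent = ClientAgent(meta, {"provider-01": {}})
agent.write("vol1", 0, b"hello")
print(agent.read("vol1", 0))
```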

The third implementation model removes the need for the metadata servers altogether.  We see this totally dispersed “shared nothing” approach in solutions from companies such as StorPool and in EMC’s ScaleIO.  In this model servers can be either storage providers or consumers (or potentially both), with data and metadata duplicated and spread across all of the participating nodes.  Data is automatically replicated between nodes to protect against failure scenarios, and both the EMC and StorPool solutions scale out linearly in capacity and performance with each node added.  This third option isn’t aimed at re-using existing resources; it is a greenfield solution for building a commodity-based scale-out storage platform.
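
Purely as an illustration (this is not StorPool’s or ScaleIO’s actual placement algorithm), the sketch below shows how a shared-nothing cluster can derive a chunk’s replica set deterministically from a hash, so any node can compute where data lives and replicate it without asking a central metadata server.

```python
# Illustrative shared-nothing placement -- not ScaleIO's or StorPool's
# actual algorithm. Each chunk's replica set is derived from a hash of its
# id, so every node computes placement independently and no central
# metadata server is required; losing a node only loses one copy.

import hashlib

class SharedNothingCluster:
    def __init__(self, nodes: list[str], replicas: int = 2):
        self.nodes = sorted(nodes)
        self.replicas = replicas
        self.stores = {n: {} for n in self.nodes}   # node -> local chunk store

    def _replica_nodes(self, chunk_id: str) -> list[str]:
        # Rank nodes by a per-chunk hash and take the top N as replicas.
        ranked = sorted(
            self.nodes,
            key=lambda n: hashlib.sha256(f"{chunk_id}:{n}".encode()).hexdigest(),
        )
        return ranked[: self.replicas]

    def write(self, chunk_id: str, data: bytes) -> list[str]:
        targets = self._replica_nodes(chunk_id)
        for node in targets:                        # synchronous replication
            self.stores[node][chunk_id] = data
        return targets

    def read(self, chunk_id: str) -> bytes:
        for node in self._replica_nodes(chunk_id):
            if chunk_id in self.stores[node]:       # survives a failed replica
                return self.stores[node][chunk_id]
        raise KeyError(chunk_id)

# Adding a node grows capacity and spreads placement; each chunk still has
# exactly `replicas` copies.
cluster = SharedNothingCluster(["node1", "node2", "node3", "node4"])
print(cluster.write("chunk-0001", b"payload"))
print(cluster.read("chunk-0001"))
```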

The Architect’s View™

So where does this position virtualisation today?  Model 1, the traditional view, still works well for legacy platforms, especially where there is a need or desire to reuse existing assets.  Where 5-10 years ago this kind of solution could be part of a greenfield design, that is increasingly unlikely today.

Model 2 follows a similar path: it removes the I/O bottleneck but keeps centralised metadata.  This solution is again good for reusing legacy infrastructure, but I’d question whether it provides long-term value in greenfield deployments.

Then we have model 3, the totally shared-nothing option.  This seems the most ideal in terms of pure virtualisation design, but lacks one critical aspect – it throws away all of the accumulated knowledge in traditional storage that has been built up over the past 30 years.  That knowledge covers managing I/O performance, handling recovery scenarios, rebuilding data in the background after failures and implementing rich data services.  Ultimately this transition from dedicated appliances to totally distributed software-based storage is the most difficult one for enterprise customers to make.  They need confidence that software-based solutions will be as reliable as their existing platforms.  This is the challenge for the likes of StorPool and ScaleIO: convincing the market that their products are as reliable as those they seek to replace.

Related Reading

Copyright (c) 2009-2021 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.

Disclaimer:  I was invited to attend VFD6 and SFD7 as a guest of GestaltIT and the Tech Field Day crew.  The company covered my flights, accommodation and meals, however I am not compensated for my time or obliged to blog (good or bad) on any of the event presenters.  The contents of this and other posts are not vetted or influenced by the presenting companies.