Chuck Hollis’ recent post reminded me that I must commit my ideas to paper (or to WordPress for that matter) on what constitutes Software Defined Storage (SDS). Without a doubt, “Software Defined” will be overused as this year’s favourite storage marketing phrase, as we see less and less use of the terms “cloud” and “big data”.
It’s easy to point to server virtualisation and software defined networking as evidence that the storage equivalent is only just around the corner. However, that assumption is flawed: as usual, storage is a special case that doesn’t quite fit the mould of compute and network.
Whether we like it or not, storage is different. In a virtual server environment, the image of the server is held in memory, using a data image on disk as the means of maintaining state. Only changes are committed to disk and these can be asynchronous in nature in order to improve performance. If the physical server is rebooted and the in-memory copy is lost, it is simply reconstituted from the disk image and off we go again. Moving a virtual server around the physical infrastructure is simply managing data in flight.
In networking, data is transient across the network and doesn’t reside in the switch other than temporarily, as it moves between compute and compute, or compute and permanent storage. The data is ephemeral and the network was designed with that in mind: it is capable of losing data during transmission, with higher-level protocols designed to manage that scenario.
Storage arrays (and storage in general) serve a different purpose. Storage is the permanent record of data; it has to be the part of the computing infrastructure that maintains state, even when the power is off. That, as we know, presents special challenges.
For networking, “software defined” means splitting the control plane from the data plane. Simply put, the silicon in the network switch responds to management from an external device, directing packets as it is instructed to do. To carry this analogy into the storage world, we have to look at two pieces: the transmission of data and the storage of data. For transmission, we expect to use Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), iSCSI (SCSI over IP) or perhaps something more bespoke like InfiniBand or more proprietary like FICON. The problem is, technologies such as Fibre Channel weren’t designed to expect a dynamic network. Source and especially target devices were expected to be static devices that joined and stayed in a network (or at worst, very rarely left). A Fibre Channel network change was a disruptive event. Registered State Change Notifications (RSCNs) initially went to all devices in the fabric, which could cause fabric issues if changes occurred too often (over time vendors have minimised the effects of RSCNs). For the networking component, SDS could mean a more flexible routing of data across the SAN.
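To make the control plane and data plane split concrete, here is a minimal sketch in Python. It is purely illustrative: the `Switch` and `Controller` classes, their method names and the addresses are all hypothetical and do not correspond to any real SDN product or protocol.

```python
class Switch:
    """Data plane: forwards packets using whatever rules it has been given."""

    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # destination address -> output port

    def install_rule(self, destination, port):
        # Called by the external controller, never by the traffic itself.
        self.flow_table[destination] = port

    def forward(self, packet):
        # Look up the output port; None means "no rule", so the packet is dropped.
        return self.flow_table.get(packet["dst"])


class Controller:
    """Control plane: decides the routes and pushes them to the switches."""

    def __init__(self, switches):
        self.switches = switches

    def set_route(self, destination, port):
        for switch in self.switches:
            switch.install_rule(destination, port)


# Hypothetical usage: the controller changes the route; the switch just obeys.
switch = Switch("sw1")
controller = Controller([switch])

controller.set_route("10.0.0.5", 3)
port_before = switch.forward({"dst": "10.0.0.5"})

controller.set_route("10.0.0.5", 7)
port_after = switch.forward({"dst": "10.0.0.5"})
```

Note that re-routing here is trivial because the switch holds no data. Nothing has to be copied or protected when the rule changes, which is exactly the property storage does not have.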
However, the second thing we need to consider is the requirement to store data permanently, a concept that doesn’t exist in Ethernet networking. It isn’t that simple to decide that a data volume or LUN now resides at the end of another connection and for traffic to be routed there. What happens to the existing data? How would data be moved and how would data integrity be maintained? Most importantly, what happens in a disaster scenario? This is the hardest part of trying to work out what “software defined” means in a storage context. Some vendors have used the idea of storage virtualisation or running as a virtual machine to represent this part of SDS.
So does Software Defined Storage exist today? In limited ways, I think it does. One example is Hitachi’s Universal Volume Manager feature within the VSP platform, also known as external storage virtualisation. This enables data to be written to an abstract device (which could be an internal disk or an external array) and for the control and data to be treated separately. The array receives and writes data to the target device, but can be directed to write data to another device through the separate control plane. This can even include (with Hitachi Availability Manager) redirecting I/O to a secondary device without requiring host interaction, by spoofing WWN addresses. It can also mean redirection within the array using Tiered Storage Manager. Incidentally, the image presented here shows a diagram I produced for Hitachi over three years ago. It shows how the technology (which has changed names in some instances) can be placed into layers, in a similar fashion to the one Chuck uses in his presentation. Good ideas never go out of fashion, don’t you think?
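The idea of an abstract device whose backing store can be changed through a separate control path can be sketched as follows. This is a toy model in Python under my own assumptions, not how Hitachi or any other vendor implements it: the `Backend` and `VirtualVolume` classes and the `redirect` method are hypothetical names, and a real array would also have to handle in-flight I/O, failures and consistency during the move.

```python
class Backend:
    """A hypothetical storage target: an internal disk or an external array."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # logical block address -> data

    def write(self, lba, data):
        self.blocks[lba] = data

    def read(self, lba):
        return self.blocks.get(lba)


class VirtualVolume:
    """What the host sees. The host never learns which backend holds the data."""

    def __init__(self, backend):
        self._backend = backend

    # Data plane: ordinary reads and writes from the host.
    def write(self, lba, data):
        self._backend.write(lba, data)

    def read(self, lba):
        return self._backend.read(lba)

    # Control plane: issued by a management layer, not by the host.
    def redirect(self, new_backend):
        # Unlike a network route change, existing data must be copied first.
        for lba, data in self._backend.blocks.items():
            new_backend.write(lba, data)
        self._backend = new_backend


internal = Backend("internal-disk")
external = Backend("external-array")

vol = VirtualVolume(internal)
vol.write(0, b"hello")      # lands on the internal disk
vol.redirect(external)      # control operation: copy, then switch
vol.write(1, b"world")      # now lands on the external array
```

The copy loop inside `redirect` is the important part. It is the step that has no equivalent in the network case, and it is where the questions about data movement, integrity and disaster scenarios come in.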
VPLEX is another platform I think meets what SDS could mean. Data can be stored across multiple nodes, rather than statically in one place, with control handled separately from the data path.
There are also other vendors offering products that fit some aspects of SDS. SolidFire, for instance, creates an array of nodes that are managed using REST APIs. The data and the control are separated from each other, with provisioning and management handled separately via API. Other platforms like Nutanix do this too, although they are adding compute into the mix.
The Architect’s View
Software Defined Storage is a difficult term to pin down. Today’s storage protocols, along with the need to ensure persistent storage of data whilst maintaining data integrity, mean that the dynamic nature of SDS is hard to achieve. There’s still a way to go before storage can be considered abstract enough to be termed “software defined”.