VSAN 6.2 – Is it RAID or Erasure Coding?

Chris Evans | Hyper-converged, Storage, Virtualisation

Update: The day after this post was released, VMware issued a clarification post on the Virtual Blocks website.  The post pretty much summarises what is written here, with a little more detail, although it doesn’t look like erasure coding in vSAN will be extended past the 3+1 and  4+2 schemes.  You can find the post here or listed in the Further Reading section at the bottom of this post.

Update 2: I’ve also added in a reference provided by Rob Peglar, which describes an earlier paper by Peter Anvin on RAID-6 calculations.

In case you didn’t notice, VMware has just released version 6.2 (the 4th version) of Virtual SAN, also known as vSAN.  One of the new features is data protection through erasure coding rather than mirroring of data.  Rather confusingly, the regurgitated press releases and many of the official blogs seem to be describing the new data protection regime as both RAID-5/6 and erasure coding.  Many of the product screenshots show this too.  So what is VMware offering here: is it RAID or erasure coding?

RAID

For the sake of completeness, let’s qualify what the two technologies are.  RAID-5 (and by extension RAID-6) are protection methods that distribute data and parity across multiple HDDs or SSDs.  Historically we’ve seen both hardware and software implementations in storage systems.

Both data and parity are stored in stripes that could be anything from 64KB upwards (per disk) in size.  Data components are simply that: the actual data being stored.  Parity (for RAID-5) is calculated using the XOR logical function and allows any single failed data component to be recreated from the remaining data components and the parity.  RAID-5 requires a minimum of 3 drives (two data and one parity), although typical configurations use 3+1 (3 data, one parity) or 7+1 (7 data, one parity).
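To make the XOR relationship concrete, here’s a minimal Python sketch (the block size, contents and 3+1 layout are hypothetical) showing parity being generated for a stripe and a lost data block being rebuilt from the survivors:

```python
# Minimal sketch of RAID-5 style XOR parity, assuming a hypothetical 3+1 stripe
# (three 4KB data blocks plus one parity block).  Illustrative only; real arrays
# add far more machinery around stripe/chunk management.

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

d0, d1, d2 = b"A" * 4096, b"B" * 4096, b"C" * 4096  # data blocks in one stripe

# Parity is simply the XOR of all data blocks in the stripe.
parity = xor_blocks(d0, d1, d2)

# If any single block is lost, XOR of the survivors recreates it.
assert xor_blocks(d0, d2, parity) == d1
```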

RAID-6 calculations are slightly more complex than simple XOR instructions.  This is because a second parity value computed with plain XOR would carry no additional information, so when two failures occur there’s no way to rebuild both missing components (in maths terms, the two XOR equations aren’t independent).  There are various solutions to this, including using Reed-Solomon encoding, or in the case of NetApp, implementing Diagonal Parity (hence RAID-DP).
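As a rough illustration of why double parity needs more than plain XOR, here’s a hedged sketch in the spirit of the P+Q approach described in Anvin’s RAID-6 paper: P is straight XOR, while Q weights each data element by a power of a generator in GF(2^8), so losing a data disk together with the P disk is still recoverable.  The disk count, data values and brute-force inverse are simplifications for illustration only, not how any shipping array (or vSAN) implements it:

```python
# Minimal, byte-at-a-time illustration of RAID-6 style P+Q parity over GF(2^8)
# (generator g = 2, reduction polynomial 0x11d), loosely following Anvin's paper.
# Purely a sketch: four hypothetical data "disks" holding one byte each.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) using the 0x11d polynomial."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D
    return product

def gf_pow(base: int, exp: int) -> int:
    """Raise a field element to a small power by repeated multiplication."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result

data = [0x11, 0xA7, 0x3C, 0xF0]  # hypothetical data bytes, one per disk

# P is plain XOR; Q weights each disk by g^i, making the two parities independent.
p, q = 0, 0
for i, d in enumerate(data):
    p ^= d
    q ^= gf_mul(gf_pow(2, i), d)

# Suppose disk 2 and the P disk both fail.  XOR alone can't help, but Q can:
# strip the surviving disks' contributions from Q, then undo the g^2 weighting.
lost = 2
partial = q
for i, d in enumerate(data):
    if i != lost:
        partial ^= gf_mul(gf_pow(2, i), d)

# Brute-force the field inverse (fine for a demo; real code uses lookup tables).
recovered = next(x for x in range(256) if gf_mul(gf_pow(2, lost), x) == partial)
assert recovered == data[lost]
```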

With RAID, read performance is good; you simply read the data directly.  There is no I/O penalty in reading data from disk.  However, when writing to disk, parity information has to be updated; RAID-5 requires reading the existing data (if the data to be updated is smaller than the stripe size), reading the existing parity, writing the new data and writing the new parity.  So, 4 physical I/Os are required for each logical write I/O.  RAID-6 has an overhead of 6 physical I/Os for each logical write I/O; however, the overhead is implementation-dependent.  For example, RAID-DP has a lower overhead because data is always written to a new location (using WAFL) rather than needing to be read back first.
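The 4-I/O small-write penalty falls out of the parity update maths: new parity = old parity XOR old data XOR new data.  Here’s a toy sketch (hypothetical block values) counting the physical I/Os behind one logical write:

```python
# Sketch of the RAID-5 small-write (read-modify-write) path, counting the
# physical I/Os behind a single logical write.  Block contents are hypothetical.

physical_ios = 0

def read_block(block: bytes) -> bytes:
    global physical_ios
    physical_ios += 1
    return block

def write_block(block: bytes) -> None:
    global physical_ios
    physical_ios += 1

old_data, old_parity = b"\x0f" * 512, b"\x99" * 512
new_data = b"\xf0" * 512

# I/Os 1 and 2: read the existing data and the existing parity.
d_old = read_block(old_data)
p_old = read_block(old_parity)

# New parity = old parity XOR old data XOR new data; this removes the old
# contribution and adds the new one without touching any other disk.
p_new = bytes(p ^ d ^ n for p, d, n in zip(p_old, d_old, new_data))

# I/Os 3 and 4: write the new data and the new parity.
write_block(new_data)
write_block(p_new)

assert physical_ios == 4  # four physical I/Os for one logical write
```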

So in summary, RAID-5/6 results in a capacity overhead based on the amount of parity data relative to usable capacity (3+1 protection = 33% overhead, 7+1 protection = 14.3%).  Read I/O sees no I/O overhead and uses all available drives.  Write I/O sees significant I/O overhead, depending on the implementation.
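For reference, a quick sanity check of those overhead figures (parity expressed relative to usable data capacity):

```python
# Quick sanity check of the capacity overhead figures quoted above,
# expressed as parity relative to usable data capacity.

def parity_overhead(data_disks: int, parity_disks: int) -> float:
    return parity_disks / data_disks * 100

print(f"3+1: {parity_overhead(3, 1):.1f}% overhead")  # 33.3%
print(f"7+1: {parity_overhead(7, 1):.1f}% overhead")  # 14.3%
```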

Erasure Coding

Erasure coding is also a process of creating redundant or parity data from the original source information, in order to facilitate the restoration of any missing components.  However, the process differs slightly: the original data is transformed using a mathematical algorithm that produces a larger set of data with the redundancy built in.  This is typically expressed as dividing the data to be encoded into k components, from which n pieces are generated (n>k), with the property that any k pieces can be used to reconstitute the original data.
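To make the any-k-of-n property tangible, here’s a toy sketch using polynomial interpolation over a small prime field.  Real systems typically use Reed-Solomon codes over GF(2^8), but the reconstruct-from-any-k-pieces property is the same; the field size, piece counts and data values here are purely illustrative:

```python
# Toy demonstration of the k-of-n property: encode k data values into n pieces
# such that ANY k pieces reconstruct the original.  Uses polynomial
# interpolation over the prime field GF(257); values and counts are hypothetical.

P = 257  # a prime just large enough to hold byte-sized values

def interpolate_at(points: list[tuple[int, int]], x0: int) -> int:
    """Evaluate the unique degree < k polynomial through `points` at x0, mod P
    (Lagrange interpolation)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

k, n = 3, 5
data = [42, 7, 200]  # the k original values

# The data values themselves sit at x = 1..k; extra pieces extend the polynomial.
base = list(zip(range(1, k + 1), data))
pieces = base + [(x, interpolate_at(base, x)) for x in range(k + 1, n + 1)]

# Lose any n - k pieces; any k survivors rebuild the missing values.
survivors = [pieces[1], pieces[3], pieces[4]]   # pieces 2, 4 and 5 remain
assert interpolate_at(survivors, 1) == data[0]  # piece 1 (a data value) recovered
```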

Erasure coding is more computationally expensive than simple RAID and therefore has a potential impact on system performance.  Depending on the coding scheme and the specific pieces of data read, both read and write I/O can incur a compute penalty compared to traditional RAID-5/6.  However, there are some special cases.  When n-k=1 (i.e. a single parity disk, as in RAID-5), the transformation can be achieved using the simple XOR instructions available in today’s Intel processors.  Something similar applies for n-k=2, or two parity disks (what VMware calls RAID-6); in this instance additional calculations are required, but these can be catered for with Intel instruction set extensions such as SSE and AVX.  For more background information on this, check out James Plank’s paper listed at the end of this post in Further Reading.

vSAN Implementation

So where does that leave us?  Well, my assumption is that VMware are implementing the two simplified special cases of erasure coding just discussed.  This allows them to avoid most of the performance penalties that would be associated with more general erasure coding.  However, calling them RAID-5 & RAID-6 may cause confusion for traditional storage administrators, even if it helps those not 100% familiar with array-based data protection.  Remember that these implementations are also network RAID (the data is distributed across nodes for protection) and that they are only available with all-flash configurations, presumably for performance reasons.

The Architect’s View

Note also that erasure coding as a protection mechanism is only available in this release of vSAN for FTT=1 & FTT=2 (the n-k = 1, 2 cases); above that you’re back to mirroring (see Chad’s blog for confirmation).  This raises the question of how RAID-5/6 will be implemented with uneven node counts (e.g. 5, 7 and upwards).  Will the data be cycled round or evenly distributed (partially) across those nodes?  Will n-k>2 be supported in the future?  If so, what impact on performance will there be?  The benefit of erasure coding is the ability to protect against many failure scenarios and to allow protection with many varied configurations (e.g. both disk and node).  However, if there’s a performance issue going above two parity drives and this won’t ever be supported, is there any point calling it erasure coding?

Once the dust settles on the announcements, it will be interesting to get into the detail of exactly how RAID-5/6 (erasure coding) has been implemented by VMware, to see whether the restrictions I’ve highlighted are, in fact, true, and of course to understand exactly how the protection scheme will be extended in the future.

Further Reading

Comments are always welcome; please read our Comments Policy first.  If you have any related links of interest, please feel free to add them as a comment for consideration.  

Copyright (c) 2007-2020 – Post #C643 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.