I had a small discussion today with a few folks on Twitter that was a forerunner to a Wikibon “Peer Incite” discussion on whether RAID is still relevant (hence the title of this post). It seems to me that perhaps the Wikibon discussion was a good way to introduce a “RAID beating” technology to the market (and that’s a subject for another post), but regardless it is worth reviewing where we are with RAID and its relevance in the future.
The concept of RAID for disk devices was first discussed in a 1987 technical paper called “A Case for Redundant Arrays of Inexpensive Disks (RAID)” by David A Patterson, Garth Gibson and Randy H Katz of the University of California at Berkeley. The premise of the paper was that arrays of cheaper disks in a redundant configuration could beat SLEDs (Single Large Expensive Disks) on performance, scalability and reliability, all at a lower cost. The paper compares the price of inexpensive disks to the IBM 3380, one of the first disks I ever used. These were monster devices the size of fridges, as you can see from this image I’ve borrowed from IBM’s archives (full link here).
I spent many hours dealing with failed devices, having to recover the lost files from backups. I’m not saying these devices were unreliable, but individual file restores aren’t a scalable solution to managing continually increasing capacities. To give you an idea of scale, I managed a mere 300GB of IBM 3380 & 3390 storage in one installation; what seems an incredibly small amount today.
So, RAID promised to increase reliability significantly by providing a means to recover lost data. RAID-1 is probably the simplest of the RAID levels to understand: write the data to two separate drives, and if one fails, you can read/write from the other. RAID-5 and RAID-6 offer distributed-parity data recovery, with single and dual parity respectively. This reduces the RAID overhead (RAID-1 costs 100% more than no RAID on disk costs alone) while retaining the ability to recover. However, RAID is not infallible. Double disk failures (where a second or third drive fails during recovery of the first failed drive) can occur, however small the chances seem. This means RAID should be part of a portfolio of protection measures, not your only one.
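To make the parity mechanism concrete, here’s a minimal sketch of single-parity recovery using XOR, the principle behind RAID-5. The buffers and drive counts are illustrative only; real controllers work on striped blocks across rotating parity positions, not whole drives.

```python
# Minimal sketch of single-parity (RAID-5 style) recovery using XOR.
# Illustrative only: real controllers stripe blocks and rotate parity.

def parity(blocks):
    """XOR all blocks together to produce the parity block."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Three data "drives" plus one parity drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(data)

# Simulate losing drive 1: XOR the survivors with parity to rebuild it.
survivors = [data[0], data[2], p]
rebuilt = parity(survivors)
assert rebuilt == data[1]  # the lost block is recovered
```

The same property is why rebuilds are expensive: recovering any one block requires reading the corresponding block from every surviving drive in the group.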
Today we see RAID in even the smallest home storage devices, embedded as a software option in operating systems and, of course, a mandatory component of enterprise storage arrays. However, RAID suffers from potential scalability issues.
The Scalability Problem
Whilst RAID was good for the hard drives of 20 years ago, we are now starting to see issues with the RAID architecture in a number of areas. These issues are driven by the sheer increase in the capacity of modern disk drives; an increase that hasn’t been matched by an equivalent improvement in performance and I/O throughput. Here are some of the challenges:
- Capacity has increased 100-fold between the IBM 3380 and the latest range of Savvio 15K drives (1.26GB to 146GB)
- Performance has only increased 50-fold in the same period
- Density (volume occupied by drives) has increased 2 million-fold in the same period
- Price has dropped to 1/27,000th of the cost
With storage arrays that can hold up to 2,000 drives, we can see that I/O isn’t keeping up with capacity, even for the fastest hard drives on the market today. The problem is exacerbated when we look at high-capacity SATA drives, currently pushing 3TB and set to go higher very quickly. These drives have slower spin speeds and lower throughput, meaning RAID rebuilds have to be counted in hours (and potentially days) rather than minutes, as the back-of-envelope sketch after this list illustrates. This extended rebuild time has a number of implications:
- There is a greater risk of data loss, as rebuilds are taking substantially longer
- There is a greater performance impact as rebuilds impact host I/O capacity
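Here’s that back-of-envelope calculation. The sustained transfer rates below are my own rough assumptions, not vendor figures, and real rebuilds are slower still because host I/O competes for the drives.

```python
# Back-of-envelope rebuild time for a whole-disk RAID rebuild.
# Sustained rates are illustrative assumptions, not vendor figures.

def rebuild_hours(capacity_tb, sustained_mb_per_s):
    """Hours to write a drive end to end at a given sustained rate."""
    capacity_mb = capacity_tb * 1_000_000
    return capacity_mb / sustained_mb_per_s / 3600

# A small 15K SAS drive vs a high-capacity 7.2K SATA drive.
print(f"146GB @ 120MB/s: {rebuild_hours(0.146, 120):.1f} hours")  # ~0.3 hours
print(f"3TB   @  60MB/s: {rebuild_hours(3, 60):.1f} hours")       # ~14 hours
```

Even under these idealised conditions, a 3TB drive takes the better part of a day; throttle the rebuild to protect host I/O and days become realistic.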
There is one other problem that can’t be ignored, and that’s the chance of a non-recoverable read error – that is, the chance that your data cannot be re-read from disk. RAID can normally recover from this error, but not while a rebuild is already in progress, as the failed read may be essential to reconstructing the missing data. As an example, the latest Seagate Constellation drives have a non-recoverable read error rate of 1 bit in 10^15 bits read (in fact the failure loses a whole sector). A 1TB drive holds roughly 10^13 bits. If we have to recover a disk by reading the whole RAID stripe (imagine it consists of 10 disks), then we have roughly a 1 in 10 chance that we will be unable to recover some of that data. This risk doubles to around 1 in 5 with 2TB drives, and so on. By the time we reach 8-10TB drives, rebuilding an entire RAID group of this size means we are more likely than not to fail to recover some data. These values will change with different RAID group sizes and disk capacities, but we’re pushing the technology close to the edge at this point.
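Here’s the same estimate worked through in code, assuming one unrecoverable error per 10^15 bits read and roughly 10^13 bits per 1TB drive, as above:

```python
# Worked version of the unrecoverable-read-error estimate in the text.
# Assumes 1 error per 10^15 bits read and ~10^13 bits per 1TB drive.

URE_RATE = 1e-15      # probability of an unrecoverable error per bit read
BITS_PER_TB = 1e13    # ~10^13 bits in a 1TB drive (as used above)

def p_rebuild_failure(drive_tb, drives_read):
    """Probability of at least one error while reading the surviving drives."""
    bits_read = drive_tb * BITS_PER_TB * drives_read
    return 1 - (1 - URE_RATE) ** bits_read

# Reading 10 surviving 1TB drives: roughly the 1-in-10 chance quoted above.
print(f"10 x 1TB:  {p_rebuild_failure(1, 10):.2f}")   # ~0.10
print(f"10 x 2TB:  {p_rebuild_failure(2, 10):.2f}")   # ~0.18 (about 1 in 5)
print(f"10 x 10TB: {p_rebuild_failure(10, 10):.2f}")  # ~0.63
```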
We’re already seeing some workarounds to the RAID scalability problems.
- Block-based RAID – these architectures implement RAID at the block rather than the whole-disk level. So, if a drive suffers a partial failure, only the failed areas of the disk are rebuilt. In addition, if the RAID group isn’t full, the system doesn’t spend time rebuilding white space.
- Failure Prediction – storage arrays like Hitachi’s USP & VSP use SMART data to predict when a drive looks likely to fail. That drive is then “soft failed” and its data copied off while it is still working. This means data can simply be moved to another drive rather than rebuilt from the other disks in the parity group, which significantly improves recovery time and reduces the performance impact, but can be more expensive in drive costs and maintenance.
- Data Distribution – the IBM XIV storage array distributes data across many drives. This dramatically improves the time taken to recover failed disks, as all drives participate in recovery (and protection is only RAID-1); the sketch below shows why. Of course, the trade-offs are the increased cost and the still-possible risk of a double disk failure, which would impact every single LUN on the system.
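A quick calculation shows why wide distribution shortens rebuilds so dramatically. The drive count and per-drive throughput below are illustrative assumptions, not XIV measurements.

```python
# Rough sketch of why wide data distribution speeds up rebuilds.
# Numbers are illustrative assumptions, not measurements of any product.

def distributed_rebuild_hours(drive_tb, writers, mb_per_s_per_drive):
    """Hours to re-create one drive's data when `writers` drives share the work."""
    total_mb = drive_tb * 1_000_000
    return total_mb / (writers * mb_per_s_per_drive) / 3600

# Classic RAID: one hot spare absorbs the whole rebuild.
print(f"1 writer:    {distributed_rebuild_hours(2, 1, 60):.1f} hours")    # ~9.3 hours
# Distributed: 180 drives each rebuild a small slice in parallel.
print(f"180 writers: {distributed_rebuild_hours(2, 180, 60):.2f} hours")  # ~3 minutes
```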
Whilst the above solutions are good, I believe we need to see more from the drive manufacturers themselves. Ultimately, RAID has become a problem because of drive reliability and rebuild times, so we need a new approach to the way host I/O and rebuild I/O are prioritised and managed by a drive. For instance, with the ability to put large amounts of flash into a drive, flash could be used as a repository for RAID reads: as a drive executes normal read/write operations, it caches any data needed for a requested RAID rebuild into flash, which is then made available to the RAID controller via a separate channel to perform the rebuild. Effectively, data is rebuilt on the replacement drive as that data is read/written on the surviving drives; any unaccessed data is rebuilt as a low-priority task. If this approach were coupled with the ability to highlight a failing sector (and so recover that first), reliability would improve. This idea is only one thought; I expect extending RAID’s useful life will follow a similar path to the increase in drive capacities: lots of incremental improvements that over time move things forward.
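To make the idea a little more concrete, here’s a thought-experiment sketch of on-access rebuild scheduling; the class and block counts are entirely hypothetical, not any drive’s or array’s actual firmware.

```python
# One way to picture the on-access rebuild idea above: blocks touched by host
# I/O are rebuilt immediately, everything else trickles in at low priority.
# A hypothetical sketch, not any vendor's implementation.

class OnAccessRebuild:
    def __init__(self, total_blocks):
        self.pending = set(range(total_blocks))  # blocks still to rebuild

    def host_read(self, block):
        """Host I/O drives the rebuild: reconstruct a block on first access."""
        if block in self.pending:
            self.rebuild(block)

    def background_step(self):
        """Low-priority task: rebuild one pending block when the drive is idle."""
        if self.pending:
            self.rebuild(next(iter(self.pending)))

    def rebuild(self, block):
        # Reconstruct from parity/mirror and write to the replacement drive.
        self.pending.discard(block)

r = OnAccessRebuild(total_blocks=8)
r.host_read(3)        # hot data is rebuilt as soon as it is touched
r.background_step()   # cold data follows at low priority
print(len(r.pending), "blocks still pending")  # 6
```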
RAID isn’t dead, merely evolving to meet new challenges. In 20 years’ time I suspect RAID will still exist but will be barely recognisable from the original Berkeley paper.