Today I managed to write about 90% of a blog post about value and traditional storage arrays – train journeys are pretty good for that sort of thing (and for taking your mind off the other passengers). However, having read some of the comments in Backblaze’s latest blog post on drive reliability, I decided a small rewrite was in order, although the main substance of the article stays the same.
In a few recent posts I’ve talked about ensuring we don’t throw away the valuable knowledge we have built up in centralised storage arrays over the last 20+ years. SAN solutions have evolved to deliver high resiliency and advanced features including CRC checking (e.g. NetApp SATA disks), read on write checking (e.g. Hitachi HUS), pre-emptive disk sparing (most enterprise arrays), scalability, dynamic tiering, QoS, multi-tenancy and replication, to name but a few. Array vendors are also doing clever things with their management of physical media; the all-flash vendors are at pains to stress the benefits of their relationships with NAND manufacturers and how these give them greater reliability and more consistent latency.
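To make the CRC checking idea concrete, here is a minimal Python sketch that stores a CRC32 for each fixed-size block when it is written and verifies it when the block is read back, so silent corruption is reported rather than returned to the application. It is an illustration under stated assumptions only: the 4 KiB block size, the in-memory `ChecksummedStore` class and its `corrupt()` helper are hypothetical, and real arrays implement this in controller firmware and on-disk formats, not in application code.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes


class ChecksumError(Exception):
    """Raised when a block's stored CRC does not match its contents."""


class ChecksummedStore:
    """Hypothetical in-memory stand-in for a disk that keeps a CRC32 per block."""

    def __init__(self):
        self._blocks = {}     # block number -> block contents
        self._checksums = {}  # block number -> CRC32 computed at write time

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self._blocks[lba] = bytes(data)
        self._checksums[lba] = zlib.crc32(data)

    def read(self, lba: int) -> bytes:
        data = self._blocks[lba]
        # Recompute on every read and compare with the CRC stored at write time.
        if zlib.crc32(data) != self._checksums[lba]:
            raise ChecksumError(f"CRC mismatch on block {lba}")
        return data

    def corrupt(self, lba: int, offset: int) -> None:
        """Simulate silent media corruption by flipping one bit in a block."""
        block = bytearray(self._blocks[lba])
        block[offset] ^= 0x01
        self._blocks[lba] = bytes(block)
```

In use, you write a block, flip one bit with `corrupt()`, and the next `read()` raises `ChecksumError` instead of handing back bad data. CRC32 is guaranteed to catch any single-bit error, which is why a one-bit flip makes a reliable test case.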
The same also applies to spinning media. X-IO, an array vendor spun out of Seagate, uses proprietary information about Seagate drives to prolong their lifetime by managing disk read and write failures at the sector level. As a result, they have devices in production that have worked consistently for over 6 years – these arrays are sold as “black boxes” and don’t have swappable disks. Another company that briefed me last week is American Megatrends, which I admit I’d only heard of as a BIOS software company. They also make storage arrays under the StorTrends brand and are doing clever things with flash, using it dynamically as a cache and/or a tier of storage. This allows them to deliver high throughput and low latency (more about that in another post) while also prolonging flash lifetime.
Now, let’s look at some of the comments in Backblaze’s recent blog post:
“These drives are designed to be energy-efficient, and spin down aggressively when not in use. In the Backblaze environment, they spin down frequently, and then spin right back up. We think that this causes a lot of wear on the drive.”
My question here is: why let the drives spin down? Couldn’t they be kept spinning programmatically, or is the Backblaze software not capable of this?
“When one drive goes bad, it takes a lot of work to get the RAID back on-line if the whole RAID is made up of unreliable drives. It’s just not worth the trouble.”
Of course, the obvious question here is why Backblaze wait until a drive fails before doing a rebuild. Recovering from a RAID failure where data has to be rebuilt is time-consuming and error-prone, especially if other drives in the group also exhibit problems. There’s also some logic in mixing drives from different vendors, so that a batch of less reliable drives can’t take down an entire RAID group.
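The point about mixing vendors can be sketched as a simple placement rule. The Python below is a hypothetical illustration, not how Backblaze or any array vendor actually assigns drives: it orders an inventory of `(serial, vendor)` pairs by vendor and deals them round-robin into RAID groups, so every group gets an even share of each vendor and a bad batch from one manufacturer is spread thinly rather than concentrated in a single group. The inventory format and the function names are assumptions.

```python
from collections import Counter
from itertools import cycle


def spread_by_vendor(drives, num_groups):
    """Deal drives into RAID groups so each vendor's drives are spread evenly.

    drives: list of (serial, vendor) tuples (hypothetical inventory format).
    Sorting by vendor and dealing round-robin gives each group either
    floor(n / num_groups) or ceil(n / num_groups) of a vendor's n drives.
    """
    groups = [[] for _ in range(num_groups)]
    # sorted() is stable, so drives keep their original order within a vendor.
    ordered = sorted(drives, key=lambda drive: drive[1])
    for group, drive in zip(cycle(groups), ordered):
        group.append(drive)
    return groups


def vendor_mix(group):
    """Count how many drives from each vendor ended up in one RAID group."""
    return Counter(vendor for _serial, vendor in group)
```

With 8 drives from each of two vendors dealt into 4 groups, every group ends up with 2 drives from each vendor, so losing a whole batch from one vendor hits each group equally instead of wiping out one of them.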
“Another issue when running a big data center is how much personal attention each drive needs. When a drive has a problem, but doesn’t fail completely, it still creates work. Sometimes automated recovery can fix this, but sometimes a RAID array needs that personal touch to get it running again.”
Again, predictive sparing, which allows a drive to be taken offline and removed for reformatting or recovery before it fails outright, would help here, as would the ability to logically disable failing sectors of a drive. Enterprise arrays don’t need the “personal touch”, but of course you get what you pay for.
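Predictive sparing boils down to acting on early-warning counters before the drive dies. The sketch below assumes the SMART counters for reallocated sectors (attribute 5), pending sectors (attribute 197) and reported uncorrectable errors (attribute 187) have already been collected, for example with `smartctl`; the `DriveHealth` record, the function names and especially the threshold values are hypothetical and would need tuning against real failure data. A drive that trips a threshold is flagged so its data can be copied to a spare while it is still readable, rather than rebuilt from parity after it fails.

```python
from dataclasses import dataclass

# Hypothetical thresholds: a drive reaching any of these is spared
# proactively instead of being left in service until it fails outright.
THRESHOLDS = {
    "reallocated_sectors": 50,   # SMART attribute 5
    "pending_sectors": 10,       # SMART attribute 197
    "uncorrectable_errors": 1,   # SMART attribute 187
}


@dataclass
class DriveHealth:
    """Hypothetical snapshot of one drive's SMART counters."""
    serial: str
    reallocated_sectors: int
    pending_sectors: int
    uncorrectable_errors: int


def breached_thresholds(drive: DriveHealth) -> list[str]:
    """Names of the counters that have reached their threshold (empty if healthy)."""
    return [name for name, limit in THRESHOLDS.items() if getattr(drive, name) >= limit]


def plan_sparing(fleet: list[DriveHealth]) -> list[str]:
    """Serial numbers of drives to copy to a hot spare and retire before they fail."""
    return [drive.serial for drive in fleet if breached_thresholds(drive)]
```

The design choice here is to return the breached counter names rather than a bare yes/no, so an operator (or an automated workflow) can see why a drive was pulled without the “personal touch” of investigating it by hand.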
What Backblaze’s experience demonstrates is how much work the existing array vendors have put into delivering consistent, reliable I/O from persistent media. Backblaze are rediscovering all of these issues and having to reinvent the wheel to fix their reliability problems when dealing with very large numbers of disk devices.
The Architect’s View®
As storage evolves, we can’t simply ignore the issues involved in dealing with persistent storage media. The IP developed by the array vendors provides real added value, and that value needs to be built into hyper-converged and distributed storage solutions. Webscale may work for applications, but failing an entire “storage server” to fix one faulty disk isn’t a scalable process. As more distributed storage solutions appear, one factor that separates good implementations from excellent ones will be the ability to deal seamlessly with disk and flash hardware failures.
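A back-of-the-envelope calculation shows why failing a whole server to fix one disk is costly. The sketch below compares the data that must be rebuilt or re-replicated when one drive fails with the data involved when an entire server is taken out of service. The figures in it (45 drives of 4 TB each, 100 MB/s of sustained rebuild throughput) are hypothetical round numbers, not Backblaze’s actual configuration, and the calculation assumes full drives and a fixed throughput.

```python
def rebuild_tb(drives_affected: int, drive_capacity_tb: float, fill_ratio: float = 1.0) -> float:
    """Terabytes that must be rebuilt or re-replicated after taking drives out of service."""
    return drives_affected * drive_capacity_tb * fill_ratio


def rebuild_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to move data_tb terabytes at a sustained throughput (decimal units)."""
    seconds = (data_tb * 1e12) / (throughput_mb_s * 1e6)
    return seconds / 3600


# Hypothetical example values, not a real Backblaze configuration.
DRIVES_PER_SERVER = 45
DRIVE_TB = 4.0
REBUILD_MB_S = 100.0

single_drive = rebuild_tb(1, DRIVE_TB)
whole_server = rebuild_tb(DRIVES_PER_SERVER, DRIVE_TB)
print(f"one drive:    {single_drive:.0f} TB, ~{rebuild_hours(single_drive, REBUILD_MB_S):.0f} h")
print(f"whole server: {whole_server:.0f} TB ({whole_server / single_drive:.0f}x the data)")
```

Under those assumptions, one failed drive means about 4 TB and roughly 11 hours of rebuild, while failing the whole server means 180 TB, or 45 times as much data to move for the same single fault.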
One final thought: at some stage Backblaze may realise that distributed storage based on erasure coding could be much more flexible for them than massive numbers of RAID sets. However, I’m not sure that any open source distributed storage solutions using erasure codes actually exist.
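For readers less familiar with erasure coding, here is the simplest possible instance in Python: data is split into `k` chunks plus one XOR parity chunk, each of which could live on a different server, and any single lost chunk can be rebuilt from the survivors. This is a hypothetical teaching sketch, effectively RAID-5 parity spread across nodes; production systems generally use Reed-Solomon codes with several parity chunks (`m > 1`) so they can tolerate multiple simultaneous failures. The flexibility comes from choosing `k` and `m` per data set, with a raw storage overhead of `(k + m) / k`. The function names are assumptions.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    chunk_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(chunks: list, lost: int) -> bytes:
    """Rebuild the single missing chunk at index `lost` by XOR-ing all survivors."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return reduce(xor_bytes, survivors)


def overhead(k: int, m: int) -> float:
    """Raw capacity consumed per unit of user data for a k+m layout."""
    return (k + m) / k
```

For example, `encode(data, 4)` produces five chunks; setting any one of them to `None` and calling `recover()` with its index returns the missing chunk. A 10+4 Reed-Solomon layout, by comparison, would cost 1.4 times the raw capacity while surviving any four lost chunks.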
- Re-inventing the Storage Array and Learning From Backblaze
- What Hard Drive Should I Buy (Backblaze Blog, 21 January 2014)
Copyright (c) 2007-2022 – Post #4987 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission. Photo credit iStock.