In a widely reported article, data recovery firm Secure Data Recovery highlighted the possibility that increasing complexity is shortening the life of hard disk drives. Does this make the HDD less sustainable than newer media such as SSDs and persistent memory?
The Secure Data Recovery blog, which can be found here, looked at a large sample of hard disk drives. Drives that had failed for “non-predictable” reasons (poor handling, electrical issues, and natural disasters) were excluded. Just over 2,000 drives remained in the sample, with power-on hours and current pending sector counts (sectors flagged by SMART as likely to fail) tallied for each drive vendor.
The results indicate that drives manufactured before 2015 appeared to be more reliable than those manufactured after that date. A possible explanation for this outcome is that modern drives using techniques such as SMR are less reliable. However, we don’t know the types of drives in this survey. Theoretically, we’d expect enterprise drives to be more resilient than those in other devices. Additionally, enterprise systems should have been maintained in more stable environmental conditions. So, we need to take the results of this analysis with a pinch of salt.
The reason I found this article interesting is that back in 2016, I wrote about rate limits on HDDs. At that time, vendors had started introducing write capacity limits similar to those seen on SSDs. We calculated equivalent DWPD figures and postulated whether HDDs could be less reliable when pushed past the warrantied write capacities.
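To recap that arithmetic: an HDD’s warrantied annual write workload can be converted into the DWPD (drive writes per day) figure normally quoted for SSDs. A minimal sketch, using illustrative figures rather than the exact ones from the 2016 piece:

```python
# Convert an HDD's warrantied annual write workload into an
# equivalent DWPD (drive writes per day) figure, the endurance
# metric normally quoted for SSDs. Figures below are illustrative.

def hdd_equivalent_dwpd(capacity_tb: float, warrantied_tb_per_year: float) -> float:
    """DWPD = (warrantied terabytes written per day) / (drive capacity in TB)."""
    writes_per_day_tb = warrantied_tb_per_year / 365
    return writes_per_day_tb / capacity_tb

# Example: a 16TB drive with a 550TB/year workload rating
# works out to less than 0.1 DWPD - far below typical SSD ratings.
print(f"Equivalent DWPD: {hdd_equivalent_dwpd(16, 550):.3f}")
```

The striking result is how low the figure is compared to even read-intensive SSDs, which are typically rated at 0.5 DWPD or more.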
As far as we are aware, there are no storage systems or server capabilities that monitor the quantity of data written to hard drives over their lifetime. It’s conceivable that many of the drives in the Secure Data Recovery survey are drives that have seen heavy write activity. This could be another reason for increased failures compared to pre-2015 drives, where warrantied capacity didn’t apply (drives typically using the CMR recording technique).
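For anyone wanting to track this themselves, SMART attribute 241 (Total_LBAs_Written) is one way to estimate lifetime writes where the drive reports it. A rough sketch, assuming smartctl-style attribute output (support for this attribute varies widely by vendor and model):

```python
# Sketch: estimate lifetime terabytes written to a drive from SMART
# attribute 241 (Total_LBAs_Written). Multiplying the raw LBA count by
# the logical sector size gives bytes written. Not all drives report
# this attribute, so treat the approach as illustrative, not universal.

def lifetime_tb_written(smartctl_output: str, sector_bytes: int = 512) -> float:
    """Parse `smartctl -A`-style text; return TB written, or 0.0 if
    Total_LBAs_Written is not reported."""
    for line in smartctl_output.splitlines():
        if "Total_LBAs_Written" in line:
            raw_value = int(line.split()[-1])  # raw value is the last column
            return raw_value * sector_bytes / 1e12
    return 0.0

# Example with a captured attribute line:
sample = "241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 48828125000"
print(f"{lifetime_tb_written(sample):.1f} TB written")
```

A system vendor could periodically compare this figure against the drive’s warrantied annual workload and flag drives being pushed past their limits.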
What does this have to do with sustainability? We recently wrote about the ability to reach 300TB in a single SSD, an aspiration that Pure Storage is looking to achieve within the next three years (it’s a custom SSD, admittedly, but single “device” is probably a fairer description).
One aspect of this challenge is to make these drives reliable enough to last in the enterprise for many years. If a drive fails, it needs to be repairable. That means creating a design that’s modular and with components that are easy to replace.
To my knowledge, HDDs have never been repairable. A failed controller module could, conceivably, be replaced. In most modern HDDs, these components are external to the drive itself on the underside (see figure 1). However, anything internal to the drive is likely to see the HDD being scrapped and recycled.
Back in the 1990s and 2000s, when HDD-based arrays were the norm, most financial organisations wouldn’t return failed drives but chose to shred or otherwise destroy them (and absorb the cost). Much of that material would have ended up in landfill. Back in February 2019, we recorded a podcast with one company looking to recycle old drives through a secure destruction process.
Drive recycling does reduce or eliminate a lot of waste heading for landfill, but it doesn’t solve the issue of the energy used to manufacture new drives to replace those that have failed – or the energy required to extract the valuable metals from shredded units.
If HDDs aren’t cost-effective to repair, then perhaps we need to look after our HDDs with more care as the complexity of drives increases. The evidence for this, however, is conflicting. Back in 2014, Backblaze produced a report implying that temperature had no impact on failure rate, echoing a previous survey from Google but conflicting with findings from Microsoft. These reports all pre-date 2016, when SMR drives started to appear in volume, so the data may no longer be relevant.
However, the most recent Backblaze report looking at drive failure rates seems to imply that drives should last past the three years indicated by Secure Data Recovery – 88% reaching the six-year mark in the Backblaze study. Backblaze believes that efficient design and drive husbandry are responsible for increasing longevity.
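As a crude sanity check on that figure: a constant failure rate that leaves 88% of drives alive after six years implies an annualized failure rate of roughly 2%. Real drives follow a bathtub curve rather than a constant rate, so this is strictly a back-of-envelope number:

```python
# Back-of-envelope: what constant annualized failure rate (AFR) would
# produce a given survival fraction after N years? This assumes a
# uniform failure rate, which real drives (bathtub curve) don't follow.

def implied_afr(survival_fraction: float, years: float) -> float:
    """Solve (1 - AFR)^years = survival_fraction for AFR."""
    return 1 - survival_fraction ** (1 / years)

# 88% of drives surviving six years, per the Backblaze study:
print(f"Implied AFR: {implied_afr(0.88, 6):.1%}")
```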
The Architect’s View®
The greatest challenge for HDD vendors looking to increase sustainability is the unit cost of a single HDD. At around $600-700 each, drives aren’t cost-effective to repair, especially those that are helium-filled. The security issue that results in drive shredding can be addressed by SEDs (self-encrypting drives), a feature that’s been used in enterprise arrays for many years. However, if a drive fails, it’s probably headed for the scrap heap (or the recycling heap).
The alternative option is to try to increase drive lifetime. Rate limits introduced by vendors provide warrantied capabilities, but do any array or system vendors bother to take this data into account? For businesses like Backblaze, extending drive lifetime has a direct impact on costs and the success of the company.
We believe that end users should be challenging their array and disk drive vendors to supply more information on failure rates, as well as recycling and reuse rates for raw materials. Additionally, system vendors should be focused on ensuring drives are monitored and not pushed past warrantied limits.
The rate limiting of HDDs is another reason the technology is headed towards the archive market. We’re not going to see the “death of the HDD”, but as discussed in a recent podcast (and many articles before), the transition away from HDDs in the enterprise data centre is well underway for everything but long-term archive retention.
One final question – would the sustainability of HDDs be part of your TCO calculation when moving to an all-flash system? Although there’s no directly attributable cost, the extended lifetime of SSDs, along with much lower power draw, may make all-flash much more attractive than HDDs for systems that are retained for five or six years. Might we see vendors introducing much longer lease terms as a result? We will be watching to find out.
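To make the TCO question concrete, here is a minimal model that folds acquisition cost, power draw, and failure-driven replacements into one figure over a retention period. Every number below is a hypothetical assumption for illustration, not vendor pricing; the point is the structure of the calculation, and which side wins depends heavily on the unit prices you plug in:

```python
# Hypothetical TCO sketch: capital cost + energy cost + replacement
# cost over a retention period. All figures are illustrative
# assumptions, not vendor pricing.

def tco(unit_cost: float, units: int, watts_per_unit: float,
        years: float, afr: float, kwh_cost: float = 0.15) -> float:
    """Total cost of ownership for a pool of drives."""
    # Energy: watts -> kWh over the whole period, priced per kWh
    energy = units * watts_per_unit * 24 * 365 * years / 1000 * kwh_cost
    # Replacements: expected failures (units x AFR x years) at unit cost
    replacements = units * afr * years * unit_cost
    return units * unit_cost + energy + replacements

# Roughly 1PB of capacity over six years (all figures hypothetical):
hdd = tco(unit_cost=400, units=56, watts_per_unit=8, years=6, afr=0.02)
ssd = tco(unit_cost=3000, units=34, watts_per_unit=5, years=6, afr=0.005)
print(f"HDD: ${hdd:,.0f}  SSD: ${ssd:,.0f}")
```

With these particular assumptions, flash capital cost still dominates; the model shows why longer retention periods and falling $/TB for flash shift the balance, and why extended lease terms would matter.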
Copyright (c) 2007-2023 – Post #7898 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.