Backblaze, a data protection and cloud storage company, has announced they are storing more than 1 exabyte of customer data. That’s an achievement in itself, but with 125,000 hard drives under management, does this now justify some active data optimisation?
An exabyte certainly sounds like a lot of data. There’s a handy visualisation blog post on the Backblaze website that helps put the number into context, but as a simple comparison, one exabyte = 1,000 petabytes = 1,000,000 terabytes. That’s about 50,000 of today’s highest capacity (20TB) hard drives. As a guide, Western Digital shipped 383 exabytes in 2019.
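The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration: it assumes decimal (SI) units and a 20TB drive, and the function names are made up for the example. The drive count is raw capacity only, with no allowance for redundancy, spares or free space.

```python
import math

# Decimal (SI) storage units; these constants are the only inputs assumed here.
TB_PER_PB = 1_000
PB_PER_EB = 1_000


def exabytes_to_terabytes(exabytes: float) -> float:
    """Convert exabytes to terabytes using decimal units."""
    return exabytes * PB_PER_EB * TB_PER_PB


def drives_needed(exabytes: float, drive_capacity_tb: float) -> int:
    """Raw drive count for a given capacity, ignoring redundancy and spares."""
    return math.ceil(exabytes_to_terabytes(exabytes) / drive_capacity_tb)


def average_tb_per_drive(exabytes: float, drive_count: int) -> float:
    """Average customer data per drive across a fleet of drive_count drives."""
    return exabytes_to_terabytes(exabytes) / drive_count


if __name__ == "__main__":
    print(exabytes_to_terabytes(1))          # terabytes in one exabyte
    print(drives_needed(1, 20))              # 20TB drives needed to hold 1 EB raw
    print(average_tb_per_drive(1, 125_000))  # customer data per drive in the quoted fleet
```

Spread across the 125,000 drives quoted above, one exabyte works out at 8TB of customer data per drive on average. That figure says nothing about the actual drive sizes in use or the overhead of data protection.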
Many large enterprises and other companies that store vast quantities of data are already familiar with exabyte-scale volumes of information. Often, that figure covers only the data in primary storage and doesn’t include any backup.
However, for any organisation, managing growth to the exabyte level introduces challenges in optimisation and making sure resources aren’t being wasted.
The Backblaze model assumes customers may want to restore any historical backup data at any time. In the early days of the platform, this made sense. Much of the data being stored would be only weeks or months old. Over time, the chances of a restore from 5-10 years ago will diminish rapidly. I’d go as far as to suggest that most consumer customers can’t even remember what data they had over five years ago.
Does it make sense to continually store archive data on disk? You could look at this question in two ways.
Firstly, as HDD capacities continue to increase, older drives will be phased out and replaced with newer, higher capacity models. Western Digital are introducing 20TB HDDs, with the promise of 100TB capacities not far into the future.
- HDD Capacity Threshold Reaches 20TB
- Dude, Where’s my 100TB Hard Drive?
- Dude, Here’s Your 100TB Flash Drive!
However, looking at the growth rates quoted in this Backblaze blog post, new customer data is arriving faster than vendors are releasing higher capacity drives. Backblaze will have to continue to expand their footprint to address demand.
At what point does a medium like tape become financially viable? The idea of having any customer backup available online for an instant restore is appealing. However, power, cooling and space costs dictate that scaling on disk can’t go on forever. Then there’s the question of the environment. Can IT organisations (and social media platforms) continue to justify keeping data online forever, just in case that 0.0001% of the customer base requires a restore?
At 30TB compressed capacity (12TB native), LTO-8 would offer a better TCO than disk. Tapes don’t need constant power and will last 30+ years when treated correctly.
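To put the tape option into scale, here is a rough sketch of how many LTO-8 cartridges one exabyte would occupy. It uses the published LTO-8 capacities of 12TB native and 30TB at the nominal 2.5:1 compression ratio. The function name is hypothetical, and the calculation ignores extra copies, libraries and drives, so it is a capacity estimate and not a TCO model.

```python
import math

# Published LTO-8 cartridge capacities in terabytes.
LTO8_NATIVE_TB = 12
LTO8_COMPRESSED_TB = 30  # assumes the nominal 2.5:1 compression ratio is achieved

ONE_EXABYTE_TB = 1_000_000  # decimal units


def cartridges_needed(total_tb: float, cartridge_tb: float) -> int:
    """Cartridges required to hold total_tb, one copy, no spare capacity."""
    return math.ceil(total_tb / cartridge_tb)


if __name__ == "__main__":
    print(cartridges_needed(ONE_EXABYTE_TB, LTO8_COMPRESSED_TB))  # best case
    print(cartridges_needed(ONE_EXABYTE_TB, LTO8_NATIVE_TB))      # no compression
```

That gives a range of roughly 33,000 to 83,000 cartridges. Data that is already compressed or encrypted won’t reach the 2.5:1 ratio, so the native figure is the safer planning number.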
The Architect’s View
I’m sure Backblaze has statistics on data access profiles and can identify exactly which data remains unused year after year. Now that such an impressive milestone has been reached, would this not be the time to demonstrate that a trade-off between instant restore and efficiency can be made?
Copyright (c) 2007-2020 Brookend Limited. No reproduction without permission in part or whole. Post #1e0f.