Sometimes it’s hard to believe that there’s anything more to achieve in data storage. Surely things are pretty much complete; we have the ability to store vast quantities of data reliably and each year the per unit cost of storage continues to decline. So as 2015 rolls in, is there anything left to achieve? What are the challenges of the next 12 months and beyond?
From my perspective I see two main themes that have developed within storage – (a) the need to reduce the latency of writing from compute to some form of permanent storage medium and (b) the need to store vast quantities of data reliably and for a long time.
We’ve heard lots of talk about “moving compute closer to storage”. That’s what hyper-converged solutions give us (a big thing for 2014) and it’s also what NVDIMMs promise for this year and beyond. Depending on exactly how you need to process data, technologies like Hadoop can move the compute to the storage; HP’s The Machine (and other rack-scale solutions) will disaggregate the compute/storage/networking functionality and deliver performance at higher scale. All-flash arrays have made sub-millisecond latency an affordable reality for tier 1 applications, and over the next 24 months expect lower flash costs to extend that performance fix to tier 2 data too.
At a micro level, vendors such as Samsung continue to make incremental improvements with technologies like 3D NAND, overcoming the technical difficulties of increasing capacity and shrinking the production process. Somehow roadblocks are never insurmountable, which means that at a macro level, prices (both per GB and per I/O) continue to fall at a steady rate and device capacities grow with increased reliability (by some measures, flash is already more reliable than hard drives).
2015 will see flash arrays become the de facto standard for tier 1 deployments, with fewer and fewer tier 1 HDD deployments being made. There are two reasons for this: firstly, flash performance continues to increase (and cost to drop, as already mentioned) while 15K HDDs have plateaued as a technology, so increased performance will only come from flash; secondly, I think the days of dynamic tiering are over. It will simply become easier to place applications entirely in flash as the cost differential between flash and high-end HDDs continues to erode.
Latency reduction and flash present a serious potential use case for technology like containers. Imagine using containers to execute packaged transactions that are spun up and run referencing data in NVDIMM and DRAM, removing the need to worry about paging, external I/O and other overheads. This is where the challenge for 2015 could be – writing software that can fully exploit ultra-low-latency persistent storage (something already started with in-memory databases).
Managing Data Lakes
At the other end of the spectrum there’s the problem of storing and curating vast quantities of data. We’re already into storing mythical zettabytes of data per year and, despite the hard drive manufacturers’ best efforts (with new techniques like shingled media), we’re apparently creating more data than we have the ability to store. Of course as an expression that’s totally meaningless; how can we not store what we create? There’s no magical ether where data sits before being committed to disk (unless you think of the network that way). What’s actually happening is that data is continually being filtered and deleted, sometimes with good judgement and sometimes, probably, with bad.
So what are we to do? Unfortunately, the more data we create, the lower the signal-to-noise ratio – that is, the less perceived (and possibly actual) value there is in the data being created. Even as costs decline, decisions still have to be made on what to throw away and what to keep. Data reduction technologies can help here; de-duplication needs to become ubiquitous and federated, so data can be shipped around and stored based on metadata rather than the actual data.
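To make the metadata idea concrete, here’s a minimal sketch – not any vendor’s implementation – of content-addressed de-duplication: data is split into chunks, each chunk is named by its SHA-256 digest, and a file becomes a list of digests. That digest list is metadata that can be shipped around and compared independently of the chunks themselves. The fixed chunk size and in-memory dictionary store are illustrative assumptions; real systems typically use variable-length chunking and persistent stores.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size; production systems often chunk on content boundaries

def dedupe_store(data: bytes, store: dict) -> list:
    """Store only previously unseen chunks of `data`; return the digest list."""
    digests = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:      # only new content consumes space
            store[digest] = chunk
        digests.append(digest)
    return digests

def rebuild(digests: list, store: dict) -> bytes:
    """Reassemble the original data from its digest list."""
    return b"".join(store[d] for d in digests)
```

Two files that share chunks cost the capacity of the shared chunk only once; moving a file between federated systems can then mean moving a short digest list and only the chunks the destination doesn’t already hold.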
The Architect’s View™
So here’s a list of challenges and opportunities to work on for 2015:
- Data packaging; storing data in units or forms that can be executed against and transported easily. We already see this with object stores and new database technologies. Some data (like audio/video media) lends itself to this already – structured data is the challenge.
- Data mining; we haven’t seen an end to this yet. Increased compute power and better processing algorithms mean we can revisit old data we previously couldn’t derive value from – think of how the police are starting to solve cold cases by re-processing DNA evidence that was previously too difficult to analyse. Part of the data mining challenge will be having data in a more usable format.
- Data indexing; looking at better ways to create more global standardisation for content (something already started with objects) that allows data to be read by multiple disparate applications.
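As a toy illustration of the indexing point – purely hypothetical, not modelled on any specific object store – consider objects tagged with free-form key/value metadata that any application can query without understanding the payload format. The class and method names below are invented for the sketch:

```python
class ObjectStore:
    """A minimal in-memory object store with a metadata index."""

    def __init__(self):
        self.objects = {}   # object_id -> payload bytes
        self.metadata = {}  # object_id -> {key: value}

    def put(self, object_id: str, payload: bytes, meta: dict):
        """Store an object together with its descriptive metadata."""
        self.objects[object_id] = payload
        self.metadata[object_id] = meta

    def find(self, **criteria):
        """Return IDs of objects whose metadata matches all criteria."""
        return [oid for oid, meta in self.metadata.items()
                if all(meta.get(k) == v for k, v in criteria.items())]
```

The point is that disparate applications agree only on the metadata vocabulary (the keys and values), not on each other’s payload formats – which is exactly the kind of standardisation that object storage has started to push.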
The next few years will be all about managing the content, rather than storing the data.
Copyright (c) 2009-2021 – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.