This is the fourth post in a series looking at predictions for the storage industry in 2021. The first three posts are here:
- Storage Predictions for 2021 and Beyond (Part I – Media)
- Storage Predictions for 2021 and Beyond (Part II – Systems)
- Storage Predictions for 2021 and Beyond (Part III – SDS)
Container-attached storage (or CAS) offers a new paradigm in the way data is mapped to applications by using the container environment itself to deliver persistent storage. How widely can we expect CAS to be adopted as containers, and Kubernetes in particular, become a significant application delivery platform?
The containerisation of applications has changed rapidly in the last five years, with Kubernetes emerging as the leading container orchestration platform. When containers were first introduced, the industry assumed that persistent storage wasn’t required. Persistence was gained through application-based resiliency, including data replication and mirroring at the application layer.
As the container environment has evolved, traditional database software and other applications have been containerised, driving a demand for data that persists past the lifetime of a single container. This change was inevitable for many reasons.
First, application-based resiliency carries a considerable overhead, because data has to be copied around the container infrastructure, which takes bandwidth away from host-based I/O.
Second, many application platforms don’t have native replication capabilities, putting data at risk if only a single image copy is maintained.
Third, enterprises need data persistence as part of compliance and audit requirements. Persistence at the storage layer provides the capability to implement data protection and security controls.
Initially, persistent storage was mapped to a container through volumes, LUNs or directories tied to the server running the container. This method was inefficient and inflexible, because the data could not easily follow a container to another server. Over time, the Container Storage Interface (CSI) has emerged as a standard approach that enables storage vendors to develop plugins for mapping storage to containers. It allows the container ecosystem itself to dynamically request storage through a process that hides the platform-specific steps needed to provision a persistent volume.
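The idea of a common interface that hides vendor-specific provisioning can be sketched in a few lines of Python. This is only an illustration of the pattern, not the real CSI gRPC API: the class names (`VolumeRequest`, `StorageDriver`, `Provisioner`), the two example drivers and the returned handle strings are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass
class VolumeRequest:
    """Platform-neutral request, loosely modelled on a persistent volume claim."""
    name: str
    size_gib: int
    storage_class: str


class StorageDriver(ABC):
    """Hypothetical stand-in for a vendor plugin (not the real CSI interface)."""

    @abstractmethod
    def create_volume(self, name: str, size_gib: int) -> str:
        """Provision a volume and return a backend-specific handle."""


class LocalDiskDriver(StorageDriver):
    def create_volume(self, name: str, size_gib: int) -> str:
        # A real plugin would carve space out of a local device here.
        return f"local://{name}?size={size_gib}Gi"


class ArrayDriver(StorageDriver):
    def create_volume(self, name: str, size_gib: int) -> str:
        # A real plugin would call the array's management API here.
        return f"array://lun/{name}?size={size_gib}Gi"


class Provisioner:
    """Routes each request to the driver registered for its storage class."""

    def __init__(self) -> None:
        self._drivers: Dict[str, StorageDriver] = {}

    def register(self, storage_class: str, driver: StorageDriver) -> None:
        self._drivers[storage_class] = driver

    def provision(self, request: VolumeRequest) -> str:
        driver = self._drivers.get(request.storage_class)
        if driver is None:
            raise KeyError(f"no driver for storage class {request.storage_class!r}")
        return driver.create_volume(request.name, request.size_gib)


provisioner = Provisioner()
provisioner.register("local", LocalDiskDriver())
provisioner.register("san", ArrayDriver())

handle = provisioner.provision(VolumeRequest("db-data", 20, "local"))
print(handle)
```

The application only ever states a name, a size and a storage class; which backend satisfies the request, and how, stays inside the driver.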
Container-attached storage, or CAS, is a software platform that provides storage for containers using the container ecosystem. An easy analogy that helps to explain CAS is to look at hyper-converged infrastructure or HCI. In an HCI environment, each server node runs either a dedicated virtual machine for storage or implements a scale-out storage layer within the hypervisor running on the node.
In both models, each server has local storage that is mapped to a virtual machine, providing data protection (RAID or erasure coding), abstraction from the hardware, and self-management. CAS operates like HCI storage by using local storage resources on each server in a container cluster to deliver storage as a set of containerised processes or micro-services.
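To make the data protection point concrete, here is a minimal sketch of single-parity protection, the simplest relative of RAID and erasure coding. It assumes three equal-length data blocks, each imagined to sit on a different node's local disk; production CAS and HCI platforms use more sophisticated schemes, and the helper name `xor_blocks` is made up for this example.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR a list of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, each imagined to live on a different node's local disk.
data = [b"node", b"disk", b"data"]

# The parity block would be stored on a fourth node.
parity = xor_blocks(data)

# Simulate losing the second block and rebuilding it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields the missing block, so the loss of any one node can be tolerated.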
Like HCI before it, CAS removes the need for a dedicated SAN, or at least the current incarnation of what we think of as shared storage. This works well when container platforms are delivered through virtual machines, as each VM can use attached storage (whether ultimately provided from a SAN or not). This storage is abstracted and divided into separate volumes by the CAS data plane.
In bare-metal environments, local disk resources are abstracted into container volumes, with the CAS software maintaining metadata and state information on how the physical storage capacity is divided up. At this point, the metadata store (typically an etcd or other key/value cluster) becomes critical. Most vendors recommend keeping the metadata store separate from the container cluster running applications.
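A rough sketch of the kind of state such a metadata store holds is shown below. A plain Python dict stands in for the external key/value cluster, and the key layout (`/nodes/...`, `/volumes/...`), the class name `VolumeMetadataStore` and the "most free space" placement rule are all assumptions made for illustration rather than any vendor's design.

```python
import json
from typing import Dict, Optional


class VolumeMetadataStore:
    """Tracks how each node's raw capacity is carved into volumes.

    A dict stands in for the external key/value cluster (etcd or similar).
    """

    def __init__(self, node_capacity_gib: Dict[str, int]) -> None:
        self._kv: Dict[str, str] = {}
        for node, capacity in node_capacity_gib.items():
            self._kv[f"/nodes/{node}"] = json.dumps({"capacity": capacity, "used": 0})

    def free_gib(self, node: str) -> int:
        record = json.loads(self._kv[f"/nodes/{node}"])
        return record["capacity"] - record["used"]

    def allocate(self, volume: str, size_gib: int) -> Optional[str]:
        """Place a volume on the node with the most free space, or return None."""
        nodes = [key.split("/")[-1] for key in self._kv if key.startswith("/nodes/")]
        best = max(nodes, key=self.free_gib)
        if self.free_gib(best) < size_gib:
            return None
        record = json.loads(self._kv[f"/nodes/{best}"])
        record["used"] += size_gib
        self._kv[f"/nodes/{best}"] = json.dumps(record)
        self._kv[f"/volumes/{volume}"] = json.dumps({"node": best, "size": size_gib})
        return best

    def locate(self, volume: str) -> Optional[str]:
        """Return the node holding a volume, or None if it is unknown."""
        raw = self._kv.get(f"/volumes/{volume}")
        return json.loads(raw)["node"] if raw else None


store = VolumeMetadataStore({"node-a": 100, "node-b": 200})
first = store.allocate("db-data", 150)   # lands on the emptiest node
second = store.allocate("logs", 80)
third = store.allocate("big", 60)        # no single node has 60 GiB free
print(first, second, third, store.locate("db-data"))
```

If this mapping is lost, the raw capacity on each node is still there but nothing records which volume lives where, which is one reason to keep the store outside the cluster whose data it describes.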
What can we predict for CAS over the coming decade?
- Maturity – probably the most apparent evolution will be the development of new features and functionality. CAS solutions have a long way to go to come close to existing mature storage solutions, with many having gaps around data protection and other data services. CAS offerings also need to start taking advantage of new media like Persistent Memory.
- Data Mobility – the current crop of CAS solutions has yet to fully address the data mobility challenges of hybrid storage. For example, applications currently running on virtual machines may snapshot data to be analysed by container-based code. Today, building this kind of workflow is generally a manual process.
- Security – CAS solutions haven’t addressed the long-term challenges of security. As with Fibre Channel and iSCSI before them, security controls are weak or non-existent, with no real validation or audit. This is because the design of these protocols was based on emulating local disks across a secure network.
- Performance Management – CAS solutions need to offer more real-time performance analysis capabilities. Many offer dashboards to visualise volumes and recommend the use of Prometheus for performance metrics. Enterprises will expect features similar to what’s available with mature shared storage today.
Many of the challenges for CAS stem from the use of CSI, which is essentially emulating the storage attachment functions of Fibre Channel and iSCSI networks. The current design even echoes some of the mainframe SMS (Storage Management Subsystem) features first developed over 30 years ago. A big rethink in the way we map application data into containers is definitely needed. One solution, for example, could be the use of shared file systems or object stores rather than block devices (CSI already supports some of this today).
Unlike HCI clusters, container clusters such as those managed by Kubernetes can be short-lived. This property represents a unique challenge in how long-term data retention is managed. One solution could be to merge traditional SAN with CAS. The SAN components provide resilience across multiple clusters, plus a permanent location for the metadata store.
SmartNICs and the disaggregation of storage are another area where CAS could find synergy. Instead of physical storage being installed within a server, it could be dynamically assigned across the network. This would make it easy to build dynamic bare-metal clusters with access to persistent application data stored elsewhere in the network. The SmartNIC software would provide the security model that validates the use of data against a specific container cluster.
The Architect’s View
CAS is essentially a form of SDS, which, as we discussed last time, is coming to dominate the storage industry. The long-term future for CAS appears to be as an abstraction and mapping layer into a container ecosystem. Future success is likely to depend on providing data awareness rather than simply another attachment protocol. Container-attached storage is definitely one area to keep watching over the next decade.
Copyright (c) 2007-2021 – Post #245b – Brookend Ltd, first published on https://www.architecting.it/blog, do not reproduce without permission.