Kubernetes data protection – container-integrated or separate backup?

As part of my work reviewing the small crop of container-based backup solutions, I’ve concluded that backup integrated within the Kubernetes cluster may not be the optimum design. Here’s why.

Background

The decision to apply data protection to containers has been an ongoing discussion for many years. As persistent storage within containers (and Kubernetes) has developed, it’s fair to say that for relatively long-running applications, persistent storage and data protection are essential.

In a perfect containerised world, every piece of code would be stored in a repository and version-controlled. Of course, the ideal world doesn’t exist, and changes occasionally get made dynamically. Therefore, protecting application definitions makes sense as a backup strategy, just in case the deployed code doesn’t match the repository. The application data itself is even more critical and should be protected on any platform, containerised or not.

Vendors including Portworx, Kasten and Trilio have spotted the requirement to provide container backups and have created solutions that run data protection as containers within a Kubernetes cluster. There’s also the open source Velero, previously known as Ark and developed by Heptio (acquired by VMware).

Basic

Most of the products above take a simple approach to data protection. The application data itself is protected through snapshots, taken by either a native CSI provider, a vendor-specific platform provider or a snapshot plugin from solutions such as Portworx or Longhorn. The configuration of the application is protected by simply copying the application’s YAML definitions. Kasten also provides the capability to use sidecar containers called Kanisters that enable application-specific backup.
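As a rough illustration of the two halves of this approach, the sketch below builds a CSI VolumeSnapshot request for a single volume claim and prepares a copy of an application definition for safekeeping. It is a minimal sketch rather than how any of the products above actually work: the function names, the claim name and the snapshot class are assumptions for illustration, and nothing is submitted to a cluster.

```python
import json


def volume_snapshot_manifest(pvc_name, namespace, snapshot_class, snapshot_name):
    """Build a CSI VolumeSnapshot manifest for one PersistentVolumeClaim."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": snapshot_name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def strip_runtime_fields(resource):
    """Return a copy of a resource definition without cluster-generated fields.

    Hypothetical helper: the list of fields removed is illustrative, not exhaustive.
    """
    cleaned = {key: value for key, value in resource.items() if key != "status"}
    metadata = dict(cleaned.get("metadata", {}))
    for generated in ("uid", "resourceVersion", "creationTimestamp", "managedFields"):
        metadata.pop(generated, None)
    cleaned["metadata"] = metadata
    return cleaned


if __name__ == "__main__":
    # All names below are made up for the example.
    snapshot = volume_snapshot_manifest("db-data", "shop", "csi-snapclass", "db-data-snap-1")
    print(json.dumps(snapshot, indent=2))
```

Note that neither step knows anything about the application itself, which is why the result is only crash-consistent.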

The concept of crash-consistent snapshots is rather basic and harks back to a data protection era from more than two decades ago. Across the industry, data protection has matured considerably since then, with application-specific data protection now arguably table stakes.

Application-Focused

An application-focused backup means protecting an application based on the data, not the storage. Snapshots are an infrastructure and storage concept that has its place but is no substitute for proper application-based backup. Snapshots put limits on volume sizes, are generally focused on block-type devices and, most obviously, lock the protection mechanism into the underlying storage layer.

As a result, backups that take advantage of native protection within a cloud provider, for example, would be tricky to move on-premises or to another cloud provider. We need portability for backups that isn’t dependent on the platform running the application.

Lightweight

There’s also a question to ask regarding the best strategy for using Kubernetes clusters. The move to micro-services was meant to create an environment that moved application deployment away from pets (virtualisation) to cattle (containers). Now we risk making the Kubernetes cluster the pet.

In my experience, the benefits of Kubernetes arise from the ability to deploy applications rapidly into a lightweight cluster. As we add more overhead to the cluster in the form of data protection, storage, monitoring and visibility, quickly rebuilding a cluster on demand becomes increasingly complex. Kubernetes doesn’t even come close to the level of maturity and stability seen in the virtualisation sector, where VMware reigns supreme. So, cluster rebuilds are a fact of life and are more likely to occur where developers demand their own independent clusters for development.

Ask this simple question – if I rebuild my cluster, can I access the existing storage and backup resources in precisely the same way as before?

DPaaS

So, we have a choice: build data protection within the container ecosystem, or have it sit outside and protect the contents of a cluster. If we use a dedicated (and separate) data protection platform, then:

Backups are centrally managed and monitored, using a solution that has the choice of deployment as a platform/product or service (SaaS).
Data mobility is more straightforward, as the retention format is based on the backup platform, not the storage or application infrastructure.
Solutions are already significantly more mature than container-based backup, with application-focused capabilities.
Clusters can more easily be deployed and torn down without having to consider the implications of losing the data protection environment.

The last point on this list is crucial. From my research so far, the greatest challenge to the data protection of containers is managing the metadata associated with backups.

If losing a cluster means saying goodbye to existing metadata, then the data protection solution is of no use at all.
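One way to picture the requirement is a backup catalogue that lives outside any cluster and is keyed by application identity rather than by the cluster that produced the backup. The following is only a sketch under that assumption: the class names, fields and storage locations are hypothetical, and a real platform would hold this in a durable store rather than in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BackupRecord:
    application: str
    namespace: str
    source_cluster: str  # informational only; deliberately not part of the lookup key
    location: str        # where the backup data lives, outside the cluster
    taken_at: datetime


class ExternalCatalogue:
    """In-memory stand-in for backup metadata held outside any cluster."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        key = (record.application, record.namespace)
        self._records.setdefault(key, []).append(record)

    def latest(self, application, namespace):
        """Return the most recent backup for an application, or None."""
        records = self._records.get((application, namespace), [])
        return max(records, key=lambda r: r.taken_at, default=None)


if __name__ == "__main__":
    catalogue = ExternalCatalogue()
    catalogue.register(BackupRecord(
        "orders-db", "shop", "cluster-a", "s3://example-bucket/orders-db/1",
        datetime(2021, 3, 1, 2, 0, tzinfo=timezone.utc),
    ))
    # cluster-a is torn down; a rebuilt cluster can still find the backup.
    print(catalogue.latest("orders-db", "shop"))
```

Because the source cluster is recorded as an attribute and not used as the key, a rebuilt or replacement cluster can locate earlier backups without any state of its own.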

Alternatively, we push ahead with data protection built from containers. Now the DevOps team has yet another skill to learn: data protection and recovering from crash-consistent application snapshots. In small-scale testing, a solution built on Velero, for example, could be practical to protect development data but wouldn’t scale to the requirements of large enterprises. The management overhead of operating and supporting hundreds of individual backup deployments would be impossible. We fixed the storage sprawl with SANs in the late 1990s and built consolidated backup 30 years ago.

Hybrid

There is, perhaps, though, a middle ground here. Looking at another storage feature for a moment, we see that one of the benefits of container-attached storage is a greater awareness of the storage needs of an application. The same logic applies to data protection, where the specific application requirements could be delivered as code and implemented by an external backup solution. This configuration would create a hybrid architecture, which runs some functionality within the container ecosystem and some outside.

To see how this works, let’s consider the use of agents.

Agents

One aspect of “legacy” data protection that’s always been an issue is the use of agents. An agent runs on a host to provide additional capabilities and ensure the data protection process runs smoothly. Useful backup agents identify new data and add functionality like client-side de-duplication. In the virtualisation and container world, agents are seen as bad but were introduced into infrastructure under the guise of proxies.

In a container environment, agents or proxies could bridge traditional backup and container-based applications, enabling application discovery, application-focused data protection and user-defined metrics like RPO and RTO.

Today’s CSI development has given us storage classes and volume snapshot classes, but these aren’t mature enough for an application to specify data protection requirements. Instead, we need to have attributes applied to applications that specify the RTO/RPO requirements, the longevity of backup and the technique to take that backup.
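To make the idea of attributes concrete, the sketch below reads protection requirements from annotations attached to an application and turns them into a typed policy. No such standard exists today as far as this article is concerned, so the annotation prefix, key names and allowed techniques are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical annotation prefix and technique names; not an existing standard.
ANNOTATION_PREFIX = "backup.example.com/"
ALLOWED_TECHNIQUES = {"snapshot", "application-dump"}


@dataclass(frozen=True)
class ProtectionPolicy:
    rpo_minutes: int     # maximum tolerable data loss
    rto_minutes: int     # maximum tolerable time to restore
    retention_days: int  # how long each backup is kept
    technique: str       # how the backup is taken


def policy_from_annotations(annotations):
    """Parse a ProtectionPolicy from a mapping of annotation keys to strings."""

    def required(name):
        key = ANNOTATION_PREFIX + name
        if key not in annotations:
            raise ValueError(f"missing annotation: {key}")
        return annotations[key]

    technique = required("technique")
    if technique not in ALLOWED_TECHNIQUES:
        raise ValueError(f"unsupported technique: {technique}")

    return ProtectionPolicy(
        rpo_minutes=int(required("rpo-minutes")),
        rto_minutes=int(required("rto-minutes")),
        retention_days=int(required("retention-days")),
        technique=technique,
    )


if __name__ == "__main__":
    example = {
        "backup.example.com/rpo-minutes": "15",
        "backup.example.com/rto-minutes": "60",
        "backup.example.com/retention-days": "30",
        "backup.example.com/technique": "application-dump",
    }
    print(policy_from_annotations(example))
```

Expressing the requirements this way keeps them with the application definition, so they travel with the application from one cluster to another.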

Identifying the specific metrics and requirements of an application becomes the responsibility of a backup operator deployed into Kubernetes, with sidecar apps used to take application-specific backups. All of this data is then stored and managed by a long-living backup solution that could be deployed on-premises or (more likely) delivered as a service (SaaS) via the public cloud.
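One small decision such an operator would need to make is when a backup is due for a given application. The sketch below shows one possible rule, assuming the operator knows the time of the last successful backup and the application's RPO; the function name and the allowance for backup duration are illustrative assumptions, not a description of any existing operator.

```python
from datetime import datetime, timedelta, timezone


def backup_due(last_backup, rpo, now, expected_duration=timedelta(0)):
    """Decide whether a new backup should start now.

    A backup is due if none exists yet, or if the time since the last
    successful backup, plus the time the next one is expected to take,
    has reached the RPO.
    """
    if last_backup is None:
        return True
    return (now - last_backup) + expected_duration >= rpo


if __name__ == "__main__":
    last = datetime(2021, 3, 1, 10, 0, tzinfo=timezone.utc)
    now = datetime(2021, 3, 1, 10, 55, tzinfo=timezone.utc)
    print(backup_due(last, timedelta(minutes=60), now, timedelta(minutes=10)))
```

The decision logic can run inside the cluster while the record of the last successful backup is held by the long-living external platform, which matches the split of responsibilities described above.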

The Architect’s View™

Kubernetes and, previously, Docker are application deployment frameworks, just as server virtualisation was before. However, the transition from physical servers to x86 virtualisation, then containers and eventually serverless has created ecosystems that are increasingly lightweight and much more short-lived. When a new application architecture gains popularity, we’re not required to move every piece of the previous solutions into the new one. It would make no sense, for example, to create a data protection solution out of serverless functions, although arguably it could be done.

Instead, we need to look at the most suitable platform for infrastructure and application components. Today, data protection works best as a solution independent from the framework or ecosystem in which it is deployed. I expect that in our multi-cloud world, SaaS-based backup will eventually be the dominant solution. Some of the strategies and reasoning for this are discussed in this post from 2019.

As that transition happens, Kubernetes data protection will evolve to be the proxy or agent for existing mature SaaS-based solutions such as those from HYCU and Commvault/Metallic, where support for multi-cloud and on-premises is baked in.

One last thought: if you have ever wondered why I believe that block-based storage is not the long-term solution for container-based applications (as I wrote in this post), you can start to see the challenges increasing as we talk about data protection. Snapshots and volumes lock applications into specific platforms. True portability will be achieved with storage independence and abstraction, which we’re still far away from achieving. There’s still quite a way to go in this market.