Cloud-Native Data Protection

Chris Evans
Cloud, Cloud Storage, Data Protection

In my career, I’ve been involved in data protection across a range of technologies and platforms.  In larger environments, protecting data was a full-time job.  Many hours were dedicated to designing and implementing solutions, then managing the process of ensuring backups were being taken successfully to meet service level objectives. 

Moving data protection to the public cloud provides an opportunity to eliminate much of the effort required to keep backup infrastructure running.  How can IT organisations adopt cloud-native data protection and what benefits does it bring?

Growth, Costs & SLOs

In the storage discipline of IT, the standard mantras that define the challenges of managing physical storage infrastructure are dealing with growth and cutting or managing costs, all while staying within expected service levels.

Data volumes are always on an upward trajectory and, as a result, so is the volume of backup data.  Of course, it’s not just a case of managing capacity growth in a backup environment, but also ensuring data can be moved across the network to the backup platform.

When data, systems or an application need to be recovered, the process needs to meet RTO and RPO service levels.  In my experience, keeping on top of infrastructure issues consumes as much time as administering backup and recovery, if not more.

Architectures

Why is scaling backup infrastructure so hard?  Early data protection solutions ran directly on the same physical servers running the application.  Scaling meant managing many software installations, which became untenable.  Enterprise-class solutions of the last 20 years have used scale-out designs that incorporate a master/metadata server and one or more media servers or data movers.  Scalability is achieved by adding data movers and/or increasing the performance of the master server.

Virtualisation

The move to virtualisation vastly simplified the backup process.  Instead of managing connectivity to each physical server, data could be taken from the hypervisor backup API.  Backup solutions could implement data protection much more easily and in a more automated fashion, using policies that mapped to virtual machine tags or naming standards.  However, IT environments have evolved again.
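As a rough illustration of policy-based protection, the sketch below assigns a backup policy to a virtual machine from its tags or its name.  The class names, tag values and naming patterns are all hypothetical and are not taken from any particular hypervisor or backup product.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional, Set


@dataclass
class VirtualMachine:
    name: str
    tags: Set[str] = field(default_factory=set)


@dataclass
class BackupPolicy:
    name: str
    # A VM matches if it carries at least one of these tags (hypothetical values).
    match_tags: Set[str] = field(default_factory=set)
    # Optional shell-style pattern applied to the VM name, e.g. "sql-*".
    name_pattern: Optional[str] = None

    def matches(self, vm: VirtualMachine) -> bool:
        if self.match_tags & vm.tags:
            return True
        return self.name_pattern is not None and fnmatchcase(vm.name, self.name_pattern)


def assign_policy(vm: VirtualMachine, policies: List[BackupPolicy]) -> Optional[BackupPolicy]:
    """Return the first policy that matches the VM, or None if nothing matches."""
    for policy in policies:
        if policy.matches(vm):
            return policy
    return None


if __name__ == "__main__":
    policies = [
        BackupPolicy("gold", match_tags={"tier:gold"}),
        BackupPolicy("sql-daily", name_pattern="sql-*"),
    ]
    for vm in (VirtualMachine("web-01", {"tier:gold"}), VirtualMachine("sql-prod-02")):
        chosen = assign_policy(vm, policies)
        print(vm.name, "->", chosen.name if chosen else "unprotected")
```

Because the first matching policy wins, the order of the list acts as a priority.  A new VM that follows the tagging or naming standard is protected without any per-server configuration.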

A Hybrid World

One of the challenges for the enterprise is the move to a much more hybrid IT consumption model.  The walled garden of the data centre has gone, as companies move data to SaaS applications, use multiple endpoint and mobile devices and split infrastructure across multiple on-premises locations and public clouds.

Data is on the move and increasingly dispersed, so running backup simply from within the data centre no longer works.

Cloud as a Backup Destination

Why choose public cloud as the basis for data protection?  There are some pretty compelling reasons:

  • (Almost) infinite scalability – to all intents and purposes, public cloud is infinitely scalable.  There is zero effort involved from the consumer in ensuring services are continuously available.
  • Demand-based model – cloud resources are delivered on-demand.  Crucially, this means they can be shut down when not required and the charging stops.  You don’t pay for what you’re not using.  In contrast, on-premises backup infrastructure needs to be scaled for the “high watermark” of demand and can sit idle for many hours a day.
  • Implicit Geo-capability – public cloud is deployed on every continent except Antarctica.  Wherever your business is based, there’s a cloud endpoint near you.  CSPs network their cloud data centres to provide a mesh that spans the globe.  This makes backup services and backup data universally available.
  • Value-Add – with access to many cloud services, data protection delivered out of the public cloud can take advantage of native services, either to improve service delivery or to deliver value-add options like analytics, ransomware detection and disaster recovery.  These kinds of services are continuously being developed and improved.
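The demand-based point in the list above can be made concrete with some simple arithmetic.  The sketch below compares infrastructure sized for the “high watermark” and paid for around the clock against capacity paid for only while in use.  The demand profile and both unit prices are made-up figures chosen purely for illustration.

```python
from typing import Sequence


def fixed_daily_cost(hourly_demand: Sequence[float], cost_per_unit_hour: float) -> float:
    """Capacity is sized for the busiest hour and paid for in every hour."""
    return max(hourly_demand) * cost_per_unit_hour * len(hourly_demand)


def on_demand_daily_cost(hourly_demand: Sequence[float], cost_per_unit_hour: float) -> float:
    """Capacity is paid for only in the hours where it is actually used."""
    return sum(hourly_demand) * cost_per_unit_hour


if __name__ == "__main__":
    # Hypothetical profile: a six-hour backup window needing 10 data movers, idle otherwise.
    demand = [10] * 6 + [0] * 18
    # Hypothetical prices: on-demand capacity is assumed to cost more per hour.
    print("fixed:    ", fixed_daily_cost(demand, 1.0))
    print("on-demand:", on_demand_daily_cost(demand, 1.5))
```

Even with a higher assumed hourly rate, the on-demand figure is lower here because the capacity is idle for most of the day.  With a flatter demand profile the comparison would shift the other way.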

Cloud-Native

When we say cloud-native data protection, exactly what do we mean?  It’s possible to take existing backup software and run it as a virtual instance in the public cloud.  Many vendors have taken that approach with primary storage.  However, that path risks not making the best use of cloud resources. 

The “most native” of the data protection vendors are those either intrinsically in the public cloud or using public cloud-native features to deliver their services.  This has much more value than simply spinning up a cloud instance with the backup software in it.  The backup provider’s costs can be much more closely aligned to the cost of delivering the service (at least from an infrastructure perspective) because components like storage, databases and on-demand VMs as data movers are all scaled on a demand basis.

Challenges

Of course, being totally cloud-native does present some issues.

First and most obvious is the cost of restoring data out of the cloud back to on-premises.  Cloud providers charge for data egress.  So, the question is, does your provider incorporate that cost into their charges or is this cost going to be absorbed elsewhere (such as by the business owner)?
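To get a feel for the size of that charge, a back-of-the-envelope estimate is enough.  The function below is a minimal sketch; the per-GB price and free allowance are placeholder values, not any provider’s actual tariff, so substitute the figures from your own contract.

```python
def egress_cost(restore_gb: float, price_per_gb: float, free_gb: float = 0.0) -> float:
    """Estimate the egress charge for restoring restore_gb out of the cloud.

    price_per_gb and free_gb are assumptions supplied by the caller.
    """
    if restore_gb < 0 or price_per_gb < 0 or free_gb < 0:
        raise ValueError("inputs must be non-negative")
    billable_gb = max(restore_gb - free_gb, 0.0)
    return billable_gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical: a 5 TB restore at 0.09 per GB with the first 100 GB free.
    print(round(egress_cost(5000, 0.09, free_gb=100), 2))
```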

Then there’s the question of cross-cloud support.  If a cloud-native backup vendor only supports a single cloud, some kind of workaround is necessary to support other cloud platforms.  Again, this can introduce cost, as the backup data would be treated as egress from the source application.  Cross cloud may also introduce a need for agents or proxy virtual instances, in order to extract incremental data and/or manage snapshots.

Then there’s the ever-thorny question of throughput.  Getting data into the public cloud platform will typically be via a full backup and forever incrementals.  Restoring entire applications, however, will mean a full VM restore, so the whole image has to cross the network.  Similarly, moving data between clouds will be just as complex, and can’t be mitigated by solutions like shipping backup images on portable physical appliances.
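The throughput question comes down to whether a full restore fits inside the recovery time objective.  The sketch below estimates transfer time from data size and link speed.  The 80% link efficiency is an assumed allowance for protocol overhead and contention, and the example figures are hypothetical.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_tb (decimal terabytes) over a link of link_mbps.

    efficiency is the assumed fraction of the link that is usable in practice.
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 8e12                       # 1 TB = 10^12 bytes = 8 x 10^12 bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


def meets_rto(data_tb: float, link_mbps: float, rto_hours: float, efficiency: float = 0.8) -> bool:
    """True if the transfer alone completes within the recovery time objective."""
    return transfer_hours(data_tb, link_mbps, efficiency) <= rto_hours


if __name__ == "__main__":
    # Hypothetical: restoring 10 TB over a 1 Gbps link against an 8-hour RTO.
    print(round(transfer_hours(10, 1000), 1), "hours")
    print("meets RTO:", meets_rto(10, 1000, rto_hours=8))
```

This counts only the time on the wire.  Rehydrating deduplicated data, rebuilding the VM and validating the application all add to the real recovery time.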

Finally, IT organisations that run from a small number of data centres may find it difficult to benefit fully from cloud-native protection, whereas companies with a more dispersed set of applications and locations will find it easier to bring data into the cloud in parallel.

Vendors

Which vendors are doing cloud-native?  Druva has a comprehensive cloud-native solution that protects a range of applications. 

  • inSync – endpoint protection (laptops, desktops, smartphones, tablets) and SaaS applications
  • Phoenix – Data centre applications
  • CloudRanger – AWS resources (virtual instances, databases)

Druva takes advantage of native AWS services such as S3, DynamoDB and EC2 to deliver a totally SaaS-based offering whose pricing incorporates additional costs like network egress charges.  You can find some follow-up links here, plus Tech Field Day videos that discuss the Druva solutions.

NetApp Data Availability Services (NDAS) provides data protection for NetApp ONTAP platforms, either in public cloud or on-premises.  In a slightly different delivery model, NDAS runs within the cloud account of the customer, rather than being managed and run separately as a full SaaS offering.  You can find out more about NDAS from the following links (including Tech Field Day videos).

The Architect’s View

Backup feels like a logical process to be delivered as a service through the public cloud.  Everything we’ve said about scalability mitigates the classic issues of building a custom data protection solution using on-premises hardware. 

We’re not yet at the level of maturity where backup can be thought of as a true service with the kind of ubiquity the S3 API has achieved.  Standards in this area would be useful and we still haven’t figured out a way to make backup data truly portable between software solutions (or the vendors have chosen not to encourage this).

However, I predict that an increasing number of data protection vendors will push their solutions to work (almost) exclusively within the public cloud and then use that as a stepping stone to additional data management capabilities.  In ten years’ time, the idea of designing and installing backup software may seem very legacy indeed.


No reproduction without permission. Post #70DB.