Rethinking Data Protection in Public Cloud

This is the first of a series of posts that will start to dig down into the issues of data protection within public clouds. This post covers some outline thinking. There’s more information on data protection on our dedicated Microsite: https://www.architecting.it/microsites/kb-data-protection/

The most apt (and slightly tongue-in-cheek) definition of public cloud is that it is simply somebody else’s computer. You’re renting time and services from a provider, services which may or may not offer some form of data protection. Typically, public cloud vendors, including the hyper-scalers, will offer an SLA that covers restoration of a failed service, back to the point of failure. Service restoration doesn’t include the cases where data loss occurs for other reasons, like application corruption or fat fingering. This means there’s a need to provide data protection in public cloud. But how does or should that differ from how things were done on-premises?

What Do the Service Providers Offer?

When we talk about data protection, the requirements for the cloud cover all services, usually IaaS and SaaS. If we look at vendor service level agreements, the definitions are extremely careful to imply best efforts and base the service definition on system unavailability (EC2 SLA – S3 SLA). Data loss doesn’t get mentioned. Even if there is an outage, restitution is through service credits only. This means if your data is valuable to you, don’t expect your losses to be covered by the service provider. Perhaps the best way is to assume nothing. Assume the service provider will be delivering the bare minimum and will not recover your data in any event other than hardware failure.

Abstraction

As we dig deeper, the first thing to highlight is that visibility of the hardware used in public cloud is abstracted away from the user. This is generally a good thing. Service providers (SPs) can make the right decisions on how to deliver their services, including the choice of hardware. Over time, hardware becomes cheaper for the same amount of compute. This allows the SP to replace technology and offer a cheaper service or make more margin, without customers being affected by hardware-specific deployments.

Unfortunately, data protection tends to be highly integrated with the hardware layer. Storage array-based replication is directly hardware related and vendor specific. Today’s modern backup systems take feeds of changed data from the hypervisor. Neither of these options is available in public cloud, where the customer has no access to the array or the hypervisor. This means different processes are required.

Issues to Resolve

So what’s different from the way we’ve done backups in the past? Here are a few thoughts.

You can only use the APIs and data services the service provider offers. By this we mean that you won’t have access to the underlying infrastructure. We’ve already mentioned this point, but it’s worth highlighting again. There’s no direct access to the storage platform. Service providers may not offer backup APIs.
Data exists independently of the application. Think about how we’re moving to using containers for application deployment. Where data used to exist within a VM instance, as we move to the cloud, data won’t be tightly bound with the application. Instead it may exist as an object store, a file system or across a number of containers running a scale-out application. As a result, we can’t use the constructs like VM or instance name for restore or even long-term archive, because that instance may have been accessing data stored elsewhere.
Applications are portable. If we really reach multi-cloud, applications may run across clouds at the same time or be moved dynamically to meet demand. This means the restore point could change from one day to the next. How will you know where to restore from? What happens if the restore is from a different cloud?
Cloud providers implement their technologies in different ways. A virtual instance from AWS won’t natively run in Azure. Does it make sense, therefore, to back up an instance when it’s the data that matters? Are you separating application data and configuration information so an application can be rebuilt automatically in a new location?
Network traffic is expensive. Moving data out of a cloud provider results in an egress charge. Data movement between clouds needs to be minimised to save on costs.
De-dupe in the cloud doesn’t exist. If you’re used to moving data to a de-duplication appliance for long-term retention, then be aware that there is no de-dupe of data sitting on cloud storage. Service providers will charge you full cost.
Where will your backup go? Do you want to put your backup data on the same infrastructure as the source? This is not a good idea and is generally avoided in traditional backup environments.
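
To put rough numbers on the egress and de-dupe points above, here is a minimal Python sketch of the arithmetic. It is an illustration only: the `Rates` class, the `monthly_backup_cost` function and the prices passed to them are hypothetical and do not reflect any provider's actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-GB prices; substitute figures from your own provider."""
    egress_per_gb: float          # charge for moving data out of the source cloud
    storage_per_gb_month: float   # charge for holding backup data for one month


def monthly_backup_cost(full_backup_gb: float,
                        copies_retained: int,
                        egress_gb: float,
                        rates: Rates) -> dict:
    """Estimate one month's bill for backups copied out of a cloud.

    Assumes no de-duplication on the target, so every retained copy
    is billed at its full size.
    """
    stored_gb = full_backup_gb * copies_retained
    egress = egress_gb * rates.egress_per_gb
    storage = stored_gb * rates.storage_per_gb_month
    return {"egress": egress, "storage": storage, "total": egress + storage}


# Example with made-up prices: 1 TB of data, four retained full copies,
# one full copy moved out of the source cloud during the month.
example = monthly_backup_cost(
    full_backup_gb=1000,
    copies_retained=4,
    egress_gb=1000,
    rates=Rates(egress_per_gb=0.10, storage_per_gb_month=0.02),
)
```

Even with placeholder prices, the shape of the result is the useful part: storage cost scales with the number of retained copies, and egress cost scales with how much data leaves the source cloud.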

It becomes clear quite quickly that using traditional backup methods won’t work. Data protection needs to work more with the data, not the application container.
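
One way to picture "working with the data" is a backup catalogue keyed by a stable dataset name rather than by VM or instance name. The Python sketch below is an assumption-laden illustration, not a description of any product: `RestorePoint`, `Catalogue`, the dataset name and the location strings are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RestorePoint:
    dataset_id: str        # stable name for the data, independent of any instance
    taken_at: datetime     # when the copy was taken
    source_cloud: str      # where the application was running at the time
    backup_location: str   # where the backup copy is held


class Catalogue:
    """Indexes restore points by dataset, not by VM or container name."""

    def __init__(self) -> None:
        self._points: dict[str, list[RestorePoint]] = {}

    def record(self, point: RestorePoint) -> None:
        self._points.setdefault(point.dataset_id, []).append(point)

    def latest(self, dataset_id: str,
               before: Optional[datetime] = None) -> Optional[RestorePoint]:
        """Return the newest restore point, optionally no later than `before`."""
        candidates = [
            p for p in self._points.get(dataset_id, [])
            if before is None or p.taken_at <= before
        ]
        return max(candidates, key=lambda p: p.taken_at, default=None)


# The same dataset is protected on two days while the application
# runs in two different (hypothetical) clouds.
catalogue = Catalogue()
monday = datetime(2024, 1, 1, tzinfo=timezone.utc)
tuesday = datetime(2024, 1, 2, tzinfo=timezone.utc)
catalogue.record(RestorePoint("orders-db", monday, "cloud-a", "backup-store-1"))
catalogue.record(RestorePoint("orders-db", tuesday, "cloud-b", "backup-store-1"))

newest = catalogue.latest("orders-db")
as_of_monday = catalogue.latest("orders-db", before=monday)
```

Because the lookup is by dataset, the question "where do I restore from?" has an answer even when the application has moved between clouds from one day to the next.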

The Architect’s View™

This post aims to start a discussion on some of the challenges of data protection for the public cloud, rather than post a list of answers. We’ll get into the answers in future posts.

Moving to a more distributed world introduces additional challenges, because platforms are naturally different from each other. Because of this, we can see that focusing on the data is important and backups of the applications start to have less and less value.