Gaps in Cloud Native Data Protection

One of the up-front expectations of public cloud infrastructure is that the technology won’t be as resilient as on-premises systems. Users need to design in a level of resiliency themselves, catering for both software and hardware failures. If the cloud fails, we expect that public cloud service providers (CSPs) will do no more than ensure data is backed up to recover to the point of a systems failure.

CSPs have started introducing backup-as-a-service offerings, bringing “cloud-native” protection to common services like databases and virtual instances. Although it may seem like a smart move to use integrated backup, these solutions may not offer the same level of protection as platform-independent products and may prove to be a poor long-term strategy.

Defining Backup

I don’t want to get into a protracted discussion on what backup should entail, as this is a topic we’ve discussed many times before. Check out some of the links in this post for more background and of course our dedicated micro-site on data protection. However, suffice it to say that at a minimum, data protection should be separate from existing infrastructure to provide an “air gap”. This ensures that if a system fails, it doesn’t take the backup data with it. We also expect backup to cover for non-system failures such as user errors, software errors (data corruption) and malicious damage.

Cloud-Native Offerings

Looking at Google, AWS and Microsoft, each offers some kind of data protection integrated into public cloud services.

Amazon Web Services

AWS Backup provides native support for backup of EBS volumes (effectively EC2 instances), EFS, RDS, DynamoDB and the Storage Gateway. RDS, EBS and the Storage Gateway are protected by simple snapshots that are offloaded to S3 object storage. EFS and DynamoDB are natively protected application-based backups.

Backup provides centralisation and automation of backup features based on policy. The offering can also be driven by CLI and API, in order to be built into existing workflows. There’s also the added benefit of integration with AWS encryption services, SNS (for notification of backup success/failure) and CloudTrail to consolidate backup activity into an audit trail.
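As a rough illustration of what policy-driven backup looks like when driven by API, here is a minimal sketch that builds a backup plan document in the shape the AWS Backup CreateBackupPlan call accepts. The plan name, vault name, schedule and retention period are hypothetical values chosen for illustration, and nothing is sent to AWS.

```python
import json


def build_backup_plan(name, vault, schedule, retain_days):
    """Build a backup plan document (one rule) without calling AWS."""
    return {
        "BackupPlanName": name,
        "Rules": [
            {
                "RuleName": f"{name}-rule",
                "TargetBackupVaultName": vault,
                # AWS cron syntax: minute hour day-of-month month day-of-week year
                "ScheduleExpression": schedule,
                "Lifecycle": {"DeleteAfterDays": retain_days},
            }
        ],
    }


# Hypothetical policy: back up daily at 05:00 UTC and keep copies for 35 days.
plan = build_backup_plan("ebs-daily", "Default", "cron(0 5 ? * * *)", 35)
print(json.dumps(plan, indent=2))

# With credentials configured, the same document could be submitted via boto3:
#   import boto3
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

Keeping the policy as plain data like this is one way to hold it in version control and apply it from an existing workflow.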

However, AWS Backup isn’t yet mature. It isn’t available in every region, and backups only work within the scope of a single region. There is no inherent de-duplication, so backup of (for example) 100 virtual instances will see no saving from de-duplication across the instances (although the snapshots themselves are space-efficient). Most importantly, because snapshots are native AWS images, data can’t easily be exported and re-used elsewhere, which ties the backup data to the AWS platform.
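To make the de-duplication point concrete, here is a toy sketch comparing the bytes stored for 100 near-identical images with and without content-hash de-duplication across them. The image contents, the 4-byte chunk size and the helper names are assumptions made for the example; this is not how AWS or any particular backup product stores data.

```python
import hashlib

CHUNK = 4  # deliberately tiny chunk size so the demo stays readable


def chunks(data, size=CHUNK):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def stored_bytes(images, dedupe):
    """Bytes written to the backup store for a list of volume images."""
    if not dedupe:
        return sum(len(img) for img in images)
    seen = {}  # chunk hash -> chunk length, each unique chunk stored once
    for img in images:
        for c in chunks(img):
            seen.setdefault(hashlib.sha256(c).hexdigest(), len(c))
    return sum(seen.values())


# 100 hypothetical instances: identical "OS" bytes plus 4 bytes of unique data.
base = b"OSIMAGE-" * 4
images = [base + i.to_bytes(4, "big") for i in range(100)]

print(stored_bytes(images, dedupe=False))  # 3600
print(stored_bytes(images, dedupe=True))   # 408
```

The shared operating system content is stored once in the de-duplicated case, which is where the saving across instances would come from.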

Microsoft Azure

Azure Backup offers data protection for Azure virtual instances and SQL Server running in Azure VMs. Protection of Azure Files is currently in preview. Backup can be enabled through the Azure Portal, via PowerShell or CLI. Data protected under Azure Backup can be replicated to a secondary region using Geo-Redundant Storage (GRS).

Unfortunately, Azure Backup has some significant limitations. Only one scheduled backup is permitted per day (although four ad-hoc backups can be performed and three can be scheduled using an additional MARS agent). No automatic adjustments are made for daylight saving, which implies the backup is being triggered from within the VM itself, rather than a central scheduling service.
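To show what the lack of daylight saving adjustment means in practice, here is a small sketch of a backup pinned to a fixed UTC hour and the local time it starts at in winter and summer. The UK offsets (UTC+0 in winter, UTC+1 in summer) are used as the example, and the 02:00 UTC schedule is a hypothetical value, not an Azure default.

```python
from datetime import datetime, timedelta, timezone

# Assumption for the example: UK local time is GMT (UTC+0) in winter
# and BST (UTC+1) in summer.
GMT = timezone(timedelta(hours=0), "GMT")
BST = timezone(timedelta(hours=1), "BST")


def local_start(utc_hour, day, local_tz):
    """Local wall-clock start time of a job scheduled at a fixed UTC hour."""
    run = datetime(day.year, day.month, day.day, utc_hour, tzinfo=timezone.utc)
    return run.astimezone(local_tz).strftime("%H:%M %Z")


print(local_start(2, datetime(2024, 1, 15), GMT))  # 02:00 GMT
print(local_start(2, datetime(2024, 7, 15), BST))  # 03:00 BST
```

A job intended to run in a quiet overnight window moves by an hour in local terms twice a year unless someone adjusts the schedule by hand.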

File-level VM backups are only supported on Windows and require the deployment of an additional agent. The movement of snapshot data into a backup vault can take hours to complete. During this time, a failure on the source platform could make it impossible to restore data. Note that this issue is common to AWS too. In on-premises backup software, offload of snapshots is a critical checkpoint in ensuring recovery is possible.

Google Cloud Platform

Google Cloud only offers protection for databases (MySQL and PostgreSQL). There’s no native protection for virtual instances, other than self-scheduling simple snapshots. Database backups are application-consistent (via a quiesce) but are always taken as full images and not space-efficient. Backup images are not retained by default once a database is deleted, so customers have to remember to save them before deleting a database.
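The cost of always taking full images can be estimated with some simple arithmetic. The sketch below compares retained full copies against one full copy plus changed data only. The database size, retention count and daily change rate are hypothetical figures, and the incremental model is a simplification rather than a description of any vendor's format.

```python
def full_backup_storage(db_size_gb, retained_copies):
    """Storage used when every retained backup is a full image."""
    return db_size_gb * retained_copies


def incremental_storage(db_size_gb, retained_copies, daily_change_pct):
    """Storage used by one full baseline plus changed data for each later copy."""
    changed_per_day = db_size_gb * daily_change_pct / 100
    return db_size_gb + changed_per_day * (retained_copies - 1)


# Hypothetical 500 GB database, 7 retained daily backups, 5% daily change.
print(full_backup_storage(500, 7))     # 3500
print(incremental_storage(500, 7, 5))  # 650.0
```

Under these assumptions, full images consume more than five times the capacity of a space-efficient scheme for the same retention.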

Enterprise Class

The services offered by the cloud providers are nowhere near what would be expected in enterprise data centres. There’s a lack of automated scheduling and policy definitions, although AWS looks to be leading in this area. Backup offerings are all implemented differently, with no consistency of service across the cloud service providers.

Probably the biggest concern with using native services is that backups are not portable outside of that platform. This makes it impossible to recover data from one service to another without rehydrating into its original form. The use of snapshots is a particular issue here because the backup format is essentially hardware-dependent and based on the construct of a LUN or volume.

Marketplace

Of course, as a supplement to native services, CSPs already offer application marketplaces to deploy and install backup software. In one respect, this is a great workaround to the lack of native data protection features. However, using these services comes with a potential compromise. Most legacy on-premises data protection software was designed around the principle of a fast, local network and user-based agents. Although this model can work in the public cloud, it’s far from ideal and introduces all of the challenges previously seen with managing individual client backup configurations.

It would be far more practical to take data from virtual instance snapshots. This would, at least, provide efficient backups that could be file-consistent. Application-consistent backups would need additional integration work to achieve.

Ultimately, the best solution is to allow third-party backup software to integrate natively into cloud platforms. We’ve seen this kind of solution already and it could be a model for data protection too.

Cloud File Services

We’ve reported extensively on the integration of NetApp ONTAP into the public cloud, specifically with Azure (Azure NetApp Files) and Google. In addition, Elastifile integrated with Google Cloud and was eventually acquired by Google.

Azure NetApp Files goes a step further than simply running storage within a virtual instance. The service is integrated into Azure APIs and wholly managed by Microsoft and NetApp while offering the look and feel of a native service.

This is where the evolution of native cloud data protection needs to move. The question is exactly how the transformation should occur.

API or Integration

There are two choices. CSPs could partner and integrate natively, in the way existing file services have been added to cloud platforms. Alternatively, the CSPs could expose APIs that allow (approved) third-party vendors access to the underlying storage software underpinning virtual instances and applications. Nutanix recently introduced Mine to do exactly this.

Providing access via API has risks. The cloud providers might end up exposing some of the shortcomings of their storage implementations. APIs might have to be rate-limited, to prevent excessive snapshot and backup processes affecting primary storage performance. So, this route would need some consideration.
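As one illustration of how such rate limiting might work, here is a minimal token bucket sketch. This is a generic technique, not something any CSP has documented for a backup API, and the class name, rate and burst capacity are assumptions for the example. Time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Allow short bursts of requests while capping the sustained rate."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec   # tokens added per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity     # start with a full bucket
        self.last = 0.0            # time of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical limit: one snapshot request per second, bursts of two.
bucket = TokenBucket(rate_per_sec=1, capacity=2)
results = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)]
print(results)  # [True, True, False, True]
```

A third-party backup product calling an API limited in this way would need to queue or retry rejected snapshot requests, which is part of the consideration this route would need.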

Long-Term Planning

What does this say for long-term use of backup data? If multi-cloud is a future consideration, then going with native services is inherently limiting. The same applies if application choice needs to be flexible (e.g. running an application under both containers and virtual instances).

If backup data will be used for search, analytics or compliance, then native services won’t be suitable. They certainly don’t expose search capabilities that could span multiple clouds, on-premises and SaaS services. Getting data protection strategy right is important because these “value-add” services reduce overall IT costs and can themselves be used to sell services back to the business.

The Architect’s View

For many potential cloud users, native data protection services may be good enough for their needs. I’m sure that the existing services will develop and mature over time.

However, as cloud usage grows, matures and inevitably becomes more complex, long-term data retention and use become more critical to the business. As a result, putting the right framework for data protection in place from day one could save many CTOs from buyer’s remorse further down the line.

My view is that primary data and data protection should be implemented as separate services from the public cloud itself. This provides the independence and mobility needed to enjoy the scalability and flexibility the public cloud offers, while retaining service choice and (as much as is practically possible) avoiding platform lock-in.