Managed Databases Require Managed Backup

Managed database platforms are becoming increasingly significant to cloud service providers as a way to deliver value-add and sticky services. The cloud service provider runs the database on behalf of the customer, which also generates a requirement for data protection in some form. How should databases be backed up and restored in the public cloud?

Database as a Service

Almost two years ago, we talked about how the database-as-a-service market was starting to heat up in the public cloud. Using the underlying infrastructure (virtual instances, storage) as the foundation for new services is entirely logical. Given the complexity of deploying solutions such as SAP HANA, having a validated and certified configuration that is supported by both SAP and the cloud provider is essential.

Wisdom

A DBaaS offering provides more than just the ability to select certified configurations. As with most of IT, delivering resilient and highly available solutions depends on platform knowledge that spans both hardware and software.

Virtustream, for example, makes a virtue (no pun intended) of the accumulated skills and knowledge the company can provide to build out SAP solutions in the cloud. There’s a lot of logic behind the idea of trusting complex technology solutions to experts who deploy and manage these platforms every day.

In many IT organisations, staff accumulate technical skills and then move on elsewhere. Unless job descriptions require specific platform knowledge to be documented, IT organisations have to resort to retraining or buying in the right skills. Even if internal staff are well trained, they’re unlikely to have seen the breadth and depth of technical solutions encountered by a service provider like Virtustream.

Service Obfuscation

Naturally, anything running “as a service” will be designed and implemented by the service provider against a set of published metrics. Take the mobile phone analogy: the end user doesn’t care how many cell towers there are, or how the back-haul to the network is implemented. Instead, service availability is critical, as are 3G/4G/5G throughput and bandwidth.

The DBaaS market is just the same. Customers care about the volume of transactions that can be processed, along with the latency and response time of those transactions. These are the most apparent parameters of service delivery; however, there are more nuanced ones. Customers will care about software version releases and upgrade schedules, backup and restore SLAs, and what happens in the event of a hardware failure.

Instapaper

In 2017, I posted an article that discussed a failure in the AWS MySQL RDS instance running Instapaper, a popular bookmarking service. Due to several overlapping issues, a full file system caused the database running the application to stop working. After a multi-day outage and significant behind-the-scenes assistance from AWS, the problem was eventually resolved.

What this outage demonstrated was the importance of vendor transparency in terms of alerts, monitoring and other issues that might affect a service the customer cannot see into. Equally, if we’re using a complex (and generally business-critical) platform like SAP, we want to know the configured environment is fully supported – including backup.

Data Protection as a Service

At this point, we reach an interesting conundrum. If the point of outsourcing the database platform and application was to gain the benefit of vendor experience and knowledge, shouldn’t we be doing the same for data protection? After all, the vendor has direct access to the infrastructure and the ability to take efficient backups. Shouldn’t we demand and expect integrated data protection with our DBaaS?

The challenge here is precisely what is backed up and how easy those backups are to use. For example, AWS protects RDS instances with either automated backups or manual snapshots. Snapshots are user-initiated and capture the entire database. This design is probably because each snapshot is simply a crash-consistent copy of the underlying storage. Restoring from a snapshot means recovering the whole database, which can be time-consuming and expensive for multi-terabyte databases.

The alternative is to use automated backups, which combine daily snapshots with transaction logs to allow point-in-time recovery. Unfortunately, these are retained for a maximum of 35 days and, by default, are deleted when the database instance is deleted. Users therefore need another method of protection if a record of ongoing application activity must be retained beyond the life of any individual database instance.
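To make the trade-off concrete, here is a minimal sketch, in Python, of the decision a customer faces when asked to recover an RDS database to a given point in time. It is a simplified model, not AWS code: `RdsProtection` and `recovery_option` are hypothetical names, it makes no AWS API calls, and it deliberately ignores details such as the short lag before the latest restorable time, final snapshots taken at deletion, and the option to keep automated backups after an instance is deleted. It encodes only the behaviour described above: automated backups cover at most 35 days and, by default, disappear with the instance, while manual snapshots restore the whole database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

MAX_AUTOMATED_RETENTION_DAYS = 35  # upper limit for RDS automated backup retention


@dataclass
class RdsProtection:
    """Hypothetical, simplified model of one RDS instance's recovery sources."""
    retention_days: int                 # automated backup retention setting (0 = disabled)
    instance_deleted: bool = False      # assumes default behaviour: automated backups go with the instance
    manual_snapshots: list = field(default_factory=list)  # datetimes of user-initiated snapshots

    def __post_init__(self):
        if not 0 <= self.retention_days <= MAX_AUTOMATED_RETENTION_DAYS:
            raise ValueError("retention must be between 0 and 35 days")


def recovery_option(p: RdsProtection, target: datetime, now: datetime) -> str:
    """Describe how (or whether) the database can be recovered to `target`."""
    window_start = now - timedelta(days=p.retention_days)
    # Point-in-time restore replays transaction logs on top of an automated snapshot,
    # but only inside the retention window and only while automated backups exist.
    if not p.instance_deleted and p.retention_days > 0 and window_start <= target <= now:
        return "point-in-time restore"
    # Otherwise fall back to the newest manual snapshot taken at or before the target,
    # which restores the entire database rather than just the data needed.
    candidates = [s for s in p.manual_snapshots if s <= target]
    if candidates:
        return f"full restore from snapshot {max(candidates).isoformat()}"
    return "unrecoverable: needs independent backup"
```

The final branch is the case for independent protection: once the retention window has passed or the instance is gone, only copies held outside the automated mechanism are of any use.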

Independence

I’ve been a constant advocate for separating primary data and secondary data from the infrastructure of the public cloud. The public cloud is a great location for running applications and doing data analytics. For organisations of any real scale or complexity, a multi-cloud world is the way we are headed, with multiple public and private clouds in the mix, including SaaS and IaaS.

IT organisations that are currently “all in” on platforms like AWS might well be focusing on the benefits of a single IaaS provider. However, as cloud adoption matures, no single vendor will offer a complete, lifetime solution for any one business. Just as the mainframe gave way to client-server, then virtualisation and now containers, the public cloud will become one part of an infrastructure mix that builds a virtual cloud from many infrastructure and SaaS offerings.

Who You Gonna Call?

When it comes to data protection, I believe that we need to take an independent stance and treat backups the same way as primary data. Backups need to be portable and infrastructure-independent. This approach applies equally to platform applications like SAP and the myriad offerings based around commercial and open-source databases.

The big question is, which vendors offer the best solutions, and which are integrated closely enough with the cloud platform to make backup and restore work as efficiently as possible? Here’s a quick checklist to help you think in the right direction:

Portability – will my backups be tied to my cloud platform?
Granularity – can I back up and restore at the right level of detail?
Longevity – can I keep enough backup images to satisfy regulatory or operational requirements?
Integration – can I see backups integrated with my existing dashboards and tools? Does the cloud vendor allow integration at the right level of the infrastructure?
Support – does the platform (cloud) and application (database) vendor fully support and certify the solution?

That last point is probably the most important. When it comes to platforms like SAP HANA or databases like Oracle, vendor buy-in is essential. Without it, you will have no backup or support when things go wrong, and that’s not the way to run mission-critical services.
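One way to apply the checklist consistently across candidate products is to record each answer explicitly. The Python sketch below is purely illustrative: `BackupAssessment` and `evaluate` are hypothetical names, the answers are yes/no judgements you would supply yourself, and the only rule it encodes is the one argued above, that a product failing the support criterion is ruled out regardless of its other strengths.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class BackupAssessment:
    """Hypothetical yes/no answers to the checklist for one backup product."""
    portability: bool  # backups are not tied to one cloud platform
    granularity: bool  # backup/restore works at the required level of detail
    longevity: bool    # enough backup images can be kept for regulatory/operational needs
    integration: bool  # fits existing dashboards, tools and the cloud infrastructure
    support: bool      # certified by both the cloud and the database vendor


def evaluate(assessment: BackupAssessment) -> tuple[bool, list[str]]:
    """Return (acceptable, unmet_criteria); failing 'support' rules a product out."""
    unmet = [f.name for f in fields(assessment) if not getattr(assessment, f.name)]
    acceptable = "support" not in unmet
    return acceptable, unmet


# Example: a certified product that cannot keep backups for long enough.
ok, gaps = evaluate(BackupAssessment(True, True, False, True, True))
print(ok, gaps)  # prints: True ['longevity']
```

The remaining gaps are returned rather than scored, because how much weight portability or granularity deserves is a judgement each organisation has to make for itself.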

The Architect’s View®

Secondary data should be as independent as primary data. Modern data protection must provide vendor-backed support for protecting business-critical infrastructure components. You can learn more about what constitutes modern data protection in our recent paper, which is available to purchase.