The implementation of IT is becoming more diverse as companies look to take advantage of on-premises and public cloud application deployments. As a result, we’re less focused on the underlying implementation of hardware infrastructure and much more aware of data and applications. Data protection needs to reflect this approach and move towards more application-aware implementations. Let’s explain why with a little bit of history.
A Brief History of Data Protection
Looking back 20-30 years, we can see that data protection has moved through a number of iterations. In the days of the mainframe, we backed up files. Although data sat on volumes, access was predominantly at the file level, reading and writing what were then known as datasets.
As client/server took hold, we moved to a model with agents on each server. The agent worked on behalf of the backup software to funnel data into the backup platform, either as a series of files or with application-awareness via APIs offered by (for example) database vendors.
An interesting bifurcation occurs here in the sense that agent-based backups didn’t protect the entire O/S or server directly. Instead, we had bare-metal restores or would simply rebuild the server from scratch. Bear this model in mind for later discussion.
As virtualisation took hold, the physical server transmuted into a combination of software and data files. Backups (mostly) moved to snapshots provided by the hypervisor. Now the unit of backup/restore was the virtual machine, with the capability to dig into the contents and retrieve individual files if required.
As we move to hybrid cloud, data protection again becomes focused on data and not the encapsulating container. The easy approach here is to simply take snapshots of data. However, snapshots (as we will discuss) don’t provide a complete solution, especially with the abstraction of cloud.
A Word About Snapshots
Before moving on, it’s worth covering the concept of snapshots and their applicability to data protection. We’ve all heard the adage that “snapshots are not backups”, and this is only partially true; the reality is more nuanced. Snapshots typically reside within the same physical storage platform as the source data and share unchanged blocks of data. A failure in the storage system therefore loses both the original data and the snapshots. As a result, either the snapshots or entire additional copies of the data need to be moved to separate physical infrastructure and/or a different physical location. Snapshots can still act as a backup for those instances that need instant restores of accidentally deleted or corrupted data.
The problem with snapshots though is that they are inherently hardware or platform focused. Most snapshot implementations work at the volume or LUN level and subdivide content into fixed blocks. Snapshots represent a mix of original and changed blocks over time, but don’t align to either a file system or application.
Snapshots take point-in-time images of physical storage and don’t align to transactions. A snapshot represents a time-slice of the data on a LUN or volume and not the end of a completed transaction. This is the case even with application-consistent snapshots that simply flush application buffers or suspend I/O until the snapshot is taken.
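To make this concrete, here is a minimal sketch (simulated, with hypothetical names) of why a block-level snapshot can capture a half-applied transaction: the snapshot copies state at an arbitrary instant, with no knowledge of transaction boundaries.

```python
def transfer(accounts, src, dst, amount, snapshot_mid_transaction=False):
    """Move `amount` from src to dst, optionally 'snapshotting' between writes."""
    snapshot = None
    accounts[src] -= amount          # step 1: debit written to storage
    if snapshot_mid_transaction:
        snapshot = dict(accounts)    # point-in-time copy taken between the writes
    accounts[dst] += amount          # step 2: credit written to storage
    return snapshot

accounts = {"a": 100, "b": 0}
snap = transfer(accounts, "a", "b", 40, snapshot_mid_transaction=True)

# The live data is transactionally consistent; the snapshot is not.
assert accounts == {"a": 60, "b": 40}
assert snap == {"a": 60, "b": 0}     # the debit is visible, the credit is not
```

A crash-consistent snapshot restored from this state would require the application to roll the incomplete transaction forward or back, which is exactly the work a transaction-aware backup avoids.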
In public cloud, each cloud service provider implements snapshots based on its own storage technology. As a result, backups taken from snapshots are hardware-dependent, with no consistency in how data is retained. This means it won’t be easy (or even possible) to move snapshots from one public cloud provider to another, for example.
Of course, we shouldn’t malign snapshots too much. At the end of the day, snapshots are a tool for taking backups, rather than an entire solution.
A New World
The evolution of technology, especially driven by public cloud, is introducing new ways to run applications. The “traditional” model of servers with operating systems (either physical or virtual) is being used alongside containers and server-less applications. Although this might be seen as the move towards micro-services, the trend has been there for some time. “Run it all” mainframe operations gradually made way for multiple applications on mid-range platforms, and then for a single application on individual virtual machines.
All of this change means we’re less dependent on the application packaging and more on the data and application itself. As an example, we can now run Microsoft SQL Server on Windows or Linux. Databases like MySQL and MariaDB can be run on multiple operating systems or as containers. Serverless allows application code to respond to data workflows more easily.
The result is that it makes less and less sense to back up data as virtual machines or entire servers. A VM backup taken from a hypervisor such as VMware vSphere or Microsoft Hyper-V requires transformation and manipulation before it can be used in public cloud. Similarly, it’s not easy to take a virtual instance from public cloud and use that backup elsewhere.
As IT organisations adopt hybrid strategies, there will be an increasing separation between data and applications. An application instance can be created on demand (just think of infrastructure as code) as long as there is a definition for it. At that point, as long as all of the configuration details of a VM are known, there’s no point backing the VM up.
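A hypothetical sketch (all names invented for illustration) shows how this collapses the backup scope: if an instance is fully described by a declarative definition, only the data volume holds state worth protecting.

```python
# Hypothetical 'infrastructure as code' definition of an instance.
INSTANCE_DEF = {
    "image": "ubuntu-22.04",
    "cpus": 2,
    "memory_gb": 8,
    "packages": ["mariadb-server"],
    "data_volume": "/var/lib/mysql",   # the only stateful part of the instance
}

def rebuild(definition):
    """Recreate an equivalent instance from its definition (simulated)."""
    return {"spec": definition, "state": "running"}

def paths_to_protect(definition):
    """The backup scope collapses to the data volume(s) alone."""
    return [definition["data_volume"]]

instance = rebuild(INSTANCE_DEF)
assert instance["state"] == "running"
assert paths_to_protect(INSTANCE_DEF) == ["/var/lib/mysql"]
```

Everything outside `data_volume` can be recreated on demand from the definition, so backing up the VM image itself adds nothing.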
So, why not start to separate applications and data and only back up what we need to secure? This approach will be more practical in the long run, as we choose the form of the application from a range of available options (and locations to run it).
Of course, this means our backup solution needs to be application focused. We’re going to need to use application-based data protection interfaces or store the application data in a structured format that allows it to be backed up and manipulated with an understanding of the content. Logically, this means putting data onto a file system.
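As a sketch of what “structured format” means in practice (simulated here with a hypothetical table and JSON as the neutral format), an application-level logical backup exports content with its structure intact, rather than copying opaque storage blocks:

```python
import json

# A hypothetical application table.
rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]

# Back up: serialise the content together with its structure.
backup = json.dumps({"table": "users", "rows": rows})

# Restore: any platform that understands the format can rebuild the table,
# independent of the storage hardware the original data sat on.
restored = json.loads(backup)
assert restored["table"] == "users"
assert restored["rows"] == rows
```

Because the backup carries meaning (tables, rows, fields) rather than blocks, it can be searched, manipulated and restored to a different platform.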
For backups that are application-focused, what are the requirements?
- Transactional granularity. The backup system should be able to back up and restore at the transaction level (note this is more complex than it seems and could be a post in its entirety).
- Integrated credentials. The backup system needs to understand the security credentials of the processes taking and restoring backups, rather than relying on a “god-mode” backup user.
- App-aligned policies. Data protection based on policies applied to the application wherever it resides, implemented transparently by the underlying data protection tool.
- Abstraction. Less dependency on the hardware platform, except as a mechanism for obtaining data (e.g. hardware-based snapshots or data APIs).
- Encryption-aligned. Backup solutions need to manage encrypted data at the application level, as in the long term this will be a better place to secure data than the hardware.
- Platform independence. The capability to back up and restore an application whether it is deployed on a range of operating systems, in the public cloud or as a container.
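The app-aligned policy idea can be sketched as follows (a simulation with hypothetical names, not any vendor’s API): the policy names the application, not the platform, and the tool resolves where the application currently runs.

```python
# Hypothetical policy: expressed against the application, not the hardware.
POLICY = {
    "application": "orders-db",
    "rpo_minutes": 15,        # recovery point objective
    "retention_days": 30,
    "encrypt": True,
}

# The same application may move between platforms over time.
DEPLOYMENTS = {
    "orders-db": {"platform": "kubernetes", "location": "eu-west-1"},
}

def protect(policy, deployments):
    """Apply the policy wherever the application is deployed right now."""
    target = deployments[policy["application"]]
    return {
        "app": policy["application"],
        "platform": target["platform"],
        "every_minutes": policy["rpo_minutes"],
        "encrypted": policy["encrypt"],
    }

job = protect(POLICY, DEPLOYMENTS)
assert job["platform"] == "kubernetes"
assert job["encrypted"] is True
```

If the application later moves (say, from a VM to a container), the policy is unchanged; only the deployment record the tool consults is updated.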
Some of these features exist today, some are definitely on the wish-list. Extending further, we could add the following to future requirements:
- Cross-application backup/restore – interchangeability between similar solutions (e.g. backing up and restoring between different SQL databases)
- Backup data mining – true searching of structured content, in addition to the process of unstructured mining that exists today.
Again, some of these requirements are partially implemented in the industry today.
The Architect’s View
Ultimately, the aim of any BC/DR strategy is to keep the company in business. More than ever, this means IT systems need to be available 24/7 or at least to the service level the business expects.
Server virtualisation and public cloud have introduced the idea of abstraction from the underlying hardware, even if it isn’t fully implemented. If we want to start on a journey that transforms the deployment of applications into just code and data, then we need to free ourselves of the encumbrances of physical constructs. Data protection at the application level is a major step towards that goal and one we should strive for with each iteration of our data protection strategy.
No reproduction without permission. Post #78DA.