During a recent outage in part of Amazon Web Services’ US-East-1 region, some customers experienced what appeared to be the total loss of their data, according to a report in The Register. This isn’t the first time AWS has suffered an outage, and these kinds of problems aren’t isolated purely to Amazon. We should remember that protecting our data is our responsibility, whether it lives in the public cloud or not.
Backup on Request
Looking back over my career, one particular incident (some 30 years ago) stands out as a good reason to always take backups. At the time, physical mainframe storage volumes were unprotected (no RAID) and around 3GB in capacity. A small number of volumes were assigned to a development team who expressly didn’t want backup implemented. With tape capacities at less than 1GB, putting a backup regime in place for a single volume represented a significant expense. So, the team decided they would be OK with no protection. Three months into the project, I was asked if we had backups for the data because the team had inadvertently deleted a substantial amount of their work. Of course, with no backups, there was no way to recover the data, despite trying some low-level media recovery tricks. Sometimes you learn the hard way.
In modern IT departments, data protection is a key aspect of data management. No one would consider not backing data up. In fact, enterprises actively look to demonstrate backup (and recovery) success as part of compliance and regulatory requirements. This includes having tools and processes in place to identify new applications and place them automatically under the backup regime. [As a side note, I think data protection software should offer API-based collation of backup success/failure that can be rolled up into higher-level management dashboards. Today, too many products still require manual checking or screen-scraping of results.]
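To make the side note concrete, here is a minimal sketch of the kind of rollup that such an API could feed into a dashboard. The job records, status names and thresholds are illustrative assumptions, not any vendor’s actual schema.

```python
# Hypothetical rollup of per-job backup results into a single dashboard
# status. Job record shape ("app"/"status") and the 90% threshold are
# assumptions for illustration only.
from collections import Counter

def rollup_backup_status(jobs):
    """Summarise raw backup job results into one dashboard-level status."""
    counts = Counter(job["status"] for job in jobs)
    failed = counts.get("failed", 0)
    total = sum(counts.values())
    if total == 0:
        return {"status": "unknown", "success_rate": None}
    rate = (total - failed) / total
    if failed == 0:
        overall = "healthy"
    elif rate >= 0.9:
        overall = "degraded"
    else:
        overall = "critical"
    return {"status": overall, "success_rate": rate,
            "failed": failed, "total": total}

jobs = [
    {"app": "crm", "status": "succeeded"},
    {"app": "erp", "status": "succeeded"},
    {"app": "web", "status": "failed"},
]
print(rollup_backup_status(jobs))
```

A real product would expose the per-job records over an authenticated API; the point is that the rollup logic is trivial once the raw results are machine-readable, so there is little excuse for screen-scraping.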
Why do we protect data as a matter of course? Putting aside the implications of not following compliance, it’s a fact of life that “hardware will eventually fail, and software will eventually work”. No data is guaranteed to be 100% safe, even if hardware and software were 100% reliable, because users make mistakes, hackers want to steal or encrypt your data, and data centre issues occur (like the AWS outage above) that take systems out of action.
Unfortunately, technology is inherently unreliable, and people make mistakes. So data needs to be protected, wherever it lives.
Why would we think anything different about data stored in the public cloud compared to our on-premises applications? There are countless stories of data corruption and loss in the public cloud and cloud-based services. Part of the problem is perhaps one of expectations.
- Learning from the Instapaper Outage
- The Risk of Shared Service Level Agreements
- OK, Google – Where are my Docs?
- Should We Worry About the S3 Outage?
Running IaaS applications in the public cloud is a mirror of what happens on-premises. It’s pretty easy to quantify the risks if a virtual instance or cloud-based database is lost. We can easily see exactly how an application is impacted too.
The situation with SaaS-based services is a bit more nuanced. We all use dozens of online software applications, such as Office 365, G Suite, Salesforce, Slack and Trello, just to name a few. Because the implementation of these services is hidden from us, it’s easy to assume that the service provider has all the tools in place to protect our data. That’s not always (or perhaps even rarely) the case. Most of the time, the service provider will have SLAs in place with the objective of returning service to the point of failure. However, even this can go wrong, despite best intentions. [Another side note: in fairness to the service providers, why should they keep endless copies of your data, with no visibility of whether you will ever need to recover from them? With potentially millions of customers, the additional overhead would make their services uncompetitive.]
What lessons can we learn from the way public cloud works, in light of failures we’ve seen?
- Cloud Services are not infallible; they usually run the same or similar software as enterprise data centres and so are subject to the same risks of failure and hacking. As we’ve already said, software has inherent bugs and issues that can impact your data.
- Cloud Services are changeable. Service providers roll out changes and enhancements on a daily basis. Far more change occurs than in a traditional enterprise, and with no customer-facing change process.
- Cloud Service providers won’t recover your data except to meet the SLA of the service you use. This means data will be recovered to the point of failure but won’t be available to recover from user or application issues.
- Public cloud providers won’t compensate you for the real business impact of losing your data. If they did, at the first outage they would be out of business. Instead, you will be offered service credits, which barely compensate for any loss.
There are some exceptions to the above points. For example, Salesforce will attempt to recover data for a fixed $10,000 fee. However, this isn’t a restoration of your data, but delivery of a copy from which you have to (hopefully) rebuild it yourself. This isn’t guaranteed to work and may cost a whole lot more than $10,000 in total.
Most of the public cloud providers offer some form of backup. At present these services are not mature, and in any case, the backup is directed onto the same infrastructure as the primary copy of your data.
What should you do to avoid being impacted by a public cloud failure?
- Take your own backups. Don’t rely on the cloud platform for protection, even with SaaS.
- Use platform-independent software. Native backup services are OK, but you’re still tied to that platform for recovery. Better to use a separate solution that supports the platform natively but isn’t locked to it for recovery.
- Write your data outside of the platform. Consider writing to another region, or even better, another service provider (although this will be more costly). Make the backup format as independent as possible.
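As a small sketch of the “independent format” point above: a plain tar archive plus a SHA-256 manifest can be restored with standard tools on any platform, with no dependency on the original provider. The function names and layout here are illustrative assumptions, not a product recommendation.

```python
# Minimal sketch of a platform-independent backup: a plain tar archive
# (restorable anywhere) plus a SHA-256 manifest for integrity checks.
# Function names and manifest shape are illustrative only.
import hashlib
import tarfile
from pathlib import Path

def backup_with_manifest(src_dir, archive_path):
    """Archive src_dir into a tar file; return {path: sha256} manifest."""
    archive = Path(archive_path)
    manifest = {}
    with tarfile.open(archive, "w") as tar:
        for path in sorted(Path(src_dir).rglob("*")):
            # Skip directories, and the archive itself if it sits in src_dir
            if path.is_file() and path.resolve() != archive.resolve():
                manifest[str(path)] = hashlib.sha256(
                    path.read_bytes()).hexdigest()
                tar.add(path)
    return manifest

def verify_manifest(manifest):
    """Re-hash each source file; return the paths that no longer match."""
    return [p for p, digest in manifest.items()
            if hashlib.sha256(Path(p).read_bytes()).hexdigest() != digest]
```

The manifest itself (and the archive) would then be copied to a second region or provider; the checksums let you confirm the copy is intact without trusting either platform.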
The best backup solutions will be those that are software-based, create independent backups and can be easily scaled up and down.
The Architect’s View
Data loss can easily be avoided. Don’t get caught in the fallacy that cloud services don’t need backup. Make sure you protect your business and build backups into your public cloud strategy.
Copyright (c) 2019 Brookend Ltd. No reproduction in part or whole without permission. Post #F5DC.