Exploiting secondary data with NDAS from NetApp

Occasionally a technology leaps out at you, not for its current capabilities, but for its potential future value. That’s the feeling I had when I heard about NDAS, or NetApp Data Availability Services. At first glance, NDAS looks like nothing more than a simple backup-to-cloud tool, but dig into how the data is managed and you will see a whole lot more.

Backup for Generalists

NDAS is data protection for the generalist. With so many IT organisations operated by non-specialists, NetApp has seen an opportunity to create a solution that takes secondary data and replicates it to the public cloud. Why target the generalist market? Personally, I see a number of reasons. First, smaller organisations are unlikely to have dedicated data protection teams. Second, these organisations are less likely to have complex backup requirements and more likely to be comfortable with cloud backup. Third, as we will discuss, smaller organisations still want the search capabilities needed for compliance, analytics, or simply determining other value from long-term data retention.

NDAS Overview

In the first instantiation, NDAS will provide the capability to move secondary snapshots (like those in a SnapVault) into an AWS environment that runs a mix of EC2 compute and S3 storage resources. EC2 virtual machines run the NDAS web interface, act as the data mover and provide the search capabilities. S3 provides the data store to hold all of the content.

To use NDAS, customers need to be running ONTAP 9.5 or higher on the secondary target of backup snapshots. Release 9.5 includes the NDAS Proxy and Copy-to-Cloud APIs that drive the software. Systems replicating into the secondary target don’t need to be at release 9.5. Note also that the secondary target could be a Cloud ONTAP instance (or ONTAP Select) if customers don’t have any secondary hardware.

NDAS is installed as an AWS AMI, which creates all of the services that need to run behind the scenes. The intention is that the backup user isn’t required to know the details of how the services are configured (although we will discuss that in a moment). The configuration process requires the customer to provide access to an S3 bucket. Both the software and the S3 bucket run in the customer’s own AWS account. This is not a SaaS service, but rather software installed in the public cloud for each customer. You can see more about the installation and configuration process in the following video from Storage Field Day 18 in February 2019.

Search Capabilities

So great, we can do backups into the public cloud. This is a boon for customers that don’t want to run backup infrastructure. However, as the volume of backed-up data grows, the content can also be searched in useful ways. NDAS uses EC2 instances combined with local EBS storage to run Elasticsearch on content metadata. At this point in the product release, I’m not clear on whether this is the native Amazon Elasticsearch Service or a self-deployed version running in EC2 instances (note to self to check). With the ability to run structured search, NetApp intends to offer a catalogue of additional services that could, for example, run compliance checking or other analytics.

As an example of this, the following additional video from Storage Field Day 18 shows how data can be searched and indexed using Kibana to provide data visualisation over time.
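
To make the metadata search idea more concrete, here is a minimal sketch of the kind of Elasticsearch query body that could sit behind a Kibana "files over time" chart. The field names (file_name, modified, size_bytes) are hypothetical, as NetApp has not published the NDAS index schema as far as I know, and the function name is my own. The query DSL itself is standard Elasticsearch (7.2 or later; older releases use interval rather than calendar_interval).

```python
import json


def build_metadata_query(name_pattern, since, interval="month"):
    """Build a query body over HYPOTHETICAL file-metadata fields.

    Filters on file name and modification date, then buckets the
    matching files by time, summing their size in each bucket.
    """
    return {
        "size": 0,  # only aggregation buckets are wanted, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"wildcard": {"file_name": name_pattern}},
                    {"range": {"modified": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "files_over_time": {
                "date_histogram": {
                    "field": "modified",
                    "calendar_interval": interval,
                },
                # Sub-aggregation: total bytes per time bucket
                "aggs": {"total_bytes": {"sum": {"field": "size_bytes"}}},
            }
        },
    }


body = build_metadata_query("*.pdf", "2018-01-01")
print(json.dumps(body, indent=2))
```

The resulting JSON could be posted to the _search endpoint of whichever index holds the metadata. The point is simply that once metadata is in a search engine, questions such as "how much PDF data was modified each month" become a single query rather than a restore job.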

The first part of the above video (from 8:43) shows the ability to run analytics against the data in an S3 bucket without the metadata search. In this instance, the content is accessed directly using a software library called libC2C. Data stored in NDAS is self-describing and so can be read from the S3 bucket outside of the NDAS infrastructure.

For me, this self-describing capability is heavily undersold as a feature of NDAS. With any solution that is pushing data into the public cloud, there’s always a risk of proprietary format lock-in or having to rehydrate data in order to access it. Imagine the issues of having a 1PB object store of backup data and no way to view it without rehydrating the entire contents…

More importantly, though, content secured through the NDAS platform can be indexed, searched and processed over time using native cloud features as long as the data is exposed through libC2C. This offers a significant capability that the public cloud is designed to exploit.

Why Public Cloud?

So why public cloud? The choice should be reasonably obvious. Public cloud infrastructure provides elastic capability to cope with data growth and the compute needs of search. Using the public cloud to aggregate data from many geographically distributed systems allows data to be consolidated into a single location for better search. This is the value of having metadata in a single place. Costs are also easier to manage, because resources can be scaled up and down. It has to be remembered, though, that deploying to cloud isn’t necessarily cheaper in the long run; what it offers is greater flexibility in cost management.

As I’ve already discussed, aggregating content into a public cloud bucket provides the capability to make use of native cloud search and analytics capabilities. There does, however, have to be some consideration given to how these costs are managed. Cloud provides an OpEx model, but costs can easily grow out of control. NDAS would benefit from cost constraint features, as well as some mechanism to monitor usage and charge it back. This could be used, for example, by the IT department to provide analytics capabilities to the business on a “per-query” basis – analytics as a service.
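
As a rough illustration of what per-query chargeback might look like, the sketch below splits a shared storage bill across departments in proportion to their query counts and adds a flat compute charge per query. Everything here is an assumption on my part: the rates are made-up illustrative numbers (not AWS or NetApp pricing), and the Rates class and monthly_chargeback function are hypothetical names, not part of NDAS.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # HYPOTHETICAL illustrative rates, not real AWS or NetApp pricing
    storage_per_gb_month: float
    compute_per_query: float


def monthly_chargeback(stored_gb, queries_by_dept, rates):
    """Return a per-department bill for one month.

    Shared storage cost is apportioned by each department's share of
    queries; compute is charged as a flat fee per query.
    """
    total_queries = sum(queries_by_dept.values())
    storage_cost = stored_gb * rates.storage_per_gb_month
    bills = {}
    for dept, count in queries_by_dept.items():
        # Avoid dividing by zero when nobody ran a query this month
        share = count / total_queries if total_queries else 0.0
        bills[dept] = round(
            storage_cost * share + count * rates.compute_per_query, 2
        )
    return bills


rates = Rates(storage_per_gb_month=0.02, compute_per_query=0.50)
bills = monthly_chargeback(10_000, {"finance": 30, "legal": 10}, rates)
for dept, amount in bills.items():
    print(f"{dept}: {amount:.2f}")
```

Apportioning storage by query share is only one possible policy. A real implementation would more likely draw on the cloud provider’s billing and tagging data, but even a simple model like this shows how usage could be surfaced back to the business.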

Future Benefits & NKS

Where does NDAS go next? One clear benefit is a tie-in with the NetApp Data Fabric. We’ve recently talked about the capability to launch services from NKS (NetApp Kubernetes Service). We can imagine that in the future, NetApp will offer services through their Cloud Portal that will automate data analytics and other services. These could be triggered and run from NKS. It’s also possible to see a scenario where some customers don’t bother with the data protection aspects of NDAS and instead simply use the tool as a way of archiving into a data lake in S3. From there the data can be exploited through libC2C for advanced search and analytics. This could at some stage integrate with NetApp’s AI strategy (pure speculation by me). You can listen to some background on this with the following Storage Unpacked podcast.

Caveats

Of course, there are always a few caveats to consider when looking at new technology and solutions. In the first instance, the obvious issue is the lock-in to AWS and S3. Getting data out of S3 can be expensive, so the commitment needs to be well understood. With other platforms and solutions available, it will be interesting to see if and how NetApp expands to meet requirements on the likes of Azure and GCP, especially as both of these platforms run NetApp Cloud Volumes natively.

On that note, remember this is a standalone service, so data is not taken directly from NetApp Cloud Volumes. I’d expect to see some kind of capability to do this in the future if NetApp and the cloud service providers decide to provide APIs for accessing cloud-based snapshots.

Also, don’t forget issues around data encryption and storage efficiency (another note to self to rewatch the videos from SFD and see the answers to my own questions).

The Architect’s View

Although NDAS is being sold as a solution for the IT generalist, I actually think the long-term value for the product is in creating a secondary data source for search, analytics and other related tasks. We’ve seen a lot of money being invested in data management companies, where the perceived value is the ability to exploit that data. Backing up virtual machines is one thing, and there’s some value to indexing that data. However, structured data in databases on virtual machines is not a growth area. So, I see limited value to this information, compared to archiving and indexing data in unstructured file servers and object stores.

If NDAS search capabilities are extended to crack open VMs on ONTAP appliances, plus data from Cloud Volumes and object stores, then the collection could be a powerful search capability across the enterprise. If NetApp also chooses to expose that data via APIs and drive the deployment of apps from the Cloud Portal, the combination could be a serious competitor to the likes of Rubrik and Cohesity, two start-up companies looking to control this part of the market.

You can find all of the Storage Field Day 18 videos for NetApp here, including a swansong appearance from Dave Hitz on his last official (paid) day at NetApp.

Disclaimer: NetApp is a client of Architecting IT and Brookend Ltd. I was personally invited to attend Storage Field Day 18, with the event teams covering my travel and accommodation costs. However, I was not compensated for my time. I am not required to blog on any content; blog posts are not edited or reviewed by the presenters or the respective companies prior to publication.