Unstructured data is increasingly vital to businesses as a source of content for a wide range of use cases, many of them based around ML/AI. This information typically ends up in object stores or on file servers. Accessing data through NFS and SMB avoids the need to code to object storage APIs, which is obviously attractive to businesses. Unfortunately, scaling file server performance isn’t easy. InfiniteIO, a start-up based in Austin, Texas, believes it has the solution: metadata acceleration.
Building File Systems
File systems are notoriously hard to build. As a result, we’re still mostly using solutions developed over 20 years ago. NFS (Network File System) was initially developed at Sun Microsystems in 1984. NFSv3, arguably the most commonly used version, was published in 1995 (RFC 1813). However, SNIA claims that NFSv4 is gaining in adoption (more on that later).
Anyone who has done file system development will know that network file system protocols are “chatty”: there’s a lot of communication back and forth between client and server. POSIX standards define a well-known interface for file system operations (open/close, stat, and so on). If you’d like to learn more about file system semantics, FUSE (Filesystem in Userspace) is a good starting point.
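To illustrate why this chattiness skews traffic towards metadata, here is a deliberately simplified model of the NFSv3 calls an uncached client might issue to read one small file: one LOOKUP per path component, then GETATTR and ACCESS, then the READ itself. The function names and the exact call sequence are assumptions for illustration; real clients cache attributes and directory entries, so actual counts vary.

```python
# Simplified, hypothetical model of NFSv3 traffic for reading one small file.
# Procedure names follow RFC 1813; the call sequence itself is an assumption.
METADATA_PROCS = {
    "GETATTR", "LOOKUP", "ACCESS", "READLINK",
    "READDIR", "READDIRPLUS", "FSSTAT", "FSINFO", "PATHCONF",
}


def calls_to_read_file(path: str, read_chunks: int = 1) -> list:
    """Return a hypothetical call sequence for opening and reading `path` with no client cache."""
    components = [c for c in path.split("/") if c]
    calls = []
    # Walk down from the export root: one LOOKUP per path component.
    calls.extend("LOOKUP" for _ in components)
    # Fetch attributes and check permissions before touching the data.
    calls.append("GETATTR")
    calls.append("ACCESS")
    # Finally, the file content itself.
    calls.extend("READ" for _ in range(read_chunks))
    return calls


def metadata_fraction(calls: list) -> float:
    """Share of calls that only query file system state."""
    return sum(c in METADATA_PROCS for c in calls) / len(calls)


if __name__ == "__main__":
    calls = calls_to_read_file("/projects/ml/train/batch-0001.csv")
    print(calls)
    print(f"metadata share: {metadata_fraction(calls):.0%}")
```

For a four-component path and a single READ, this model produces seven calls, six of which are metadata (about 86%).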
NFS clients spend a significant amount of time querying the state of a file system and accessing the metadata that describes files and directory structures. InfiniteIO claims that 90% of all file system I/O requests are metadata queries. Because metadata requests are an integral part of accessing file content, speeding them up improves overall performance. This is the essence of the InfiniteIO offering.
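A quick worked example shows why the metadata share matters. Only the 90% figure comes from InfiniteIO; the latencies below (1 ms for metadata served by the NAS, 0.1 ms when served from DRAM, 2 ms for data requests) are illustrative assumptions, and the model treats every request as either pure metadata or pure data.

```python
def mean_latency_ms(meta_fraction: float, meta_ms: float, data_ms: float) -> float:
    """Request-weighted mean latency when each request is either metadata or data."""
    return meta_fraction * meta_ms + (1 - meta_fraction) * data_ms


# 90% metadata is InfiniteIO's claim; both latency values are illustrative assumptions.
before = mean_latency_ms(0.9, meta_ms=1.0, data_ms=2.0)  # metadata served by the NAS
after = mean_latency_ms(0.9, meta_ms=0.1, data_ms=2.0)   # metadata served from a DRAM mirror
print(f"before={before:.2f} ms, after={after:.2f} ms, saving={1 - after / before:.0%}")
```

In this toy model the mean per-request latency falls from about 1.10 ms to 0.29 ms, roughly a 74% cut, even though data requests are no faster. End-to-end run-time gains will be smaller, since applications also spend time on work other than waiting for I/O.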
Of course, file caching as a concept is nothing new. In the 2000s we had File Area Networks, and there are other solutions like Avere Systems (now part of Microsoft) and Infinio that attempt to accelerate performance by caching the files themselves.
Inevitably, the challenge with any file caching solution is keeping data in sync between the cache and the backing store. Write caches also need redundancy protection in case of hardware or software failure. When metadata requests form the majority of file server requests and are a tiny proportion of the data itself, why bother caching the file at all?
The InfiniteIO Application Accelerator is an appliance-based solution that focuses on caching and accelerating metadata requests. The platform is deployed as a three-node cluster and sits in-line on the network between NFS clients and the storage. All of the metadata on the NAS backing store is cached in DRAM, providing sub-100µs latency and up to 3.2 million requests per second.
[Side Note: a traditional cache will store only a portion of data, typically the most active. Application Accelerator keeps a copy of all of the metadata, a 100% mirror of on-disk metadata.]
The interesting part of the Application Accelerator technology is how caching occurs. The InfiniteIO appliance sits in front of the storage platform and performs deep-packet inspection on traffic to and from the client, responding only to metadata requests. In effect, the appliance answers on behalf of the storage, offloading metadata requests from the storage system and allowing it to concentrate on delivering data to applications.
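The per-request decision such an in-line device has to make can be sketched as follows. This is a minimal, hypothetical model, not InfiniteIO's implementation: it assumes the ONC RPC header has already been parsed into an NFSv3 procedure number and an opaque request key (for example, a file handle plus arguments). The procedure numbers come from RFC 1813; the `route` function, the `forward` callback and the set of procedures treated as answerable are illustrative assumptions.

```python
from enum import IntEnum
from typing import Callable, Hashable


class Nfs3Proc(IntEnum):
    """NFSv3 procedure numbers from RFC 1813."""
    NULL = 0
    GETATTR = 1
    SETATTR = 2
    LOOKUP = 3
    ACCESS = 4
    READLINK = 5
    READ = 6
    WRITE = 7
    CREATE = 8
    MKDIR = 9
    SYMLINK = 10
    MKNOD = 11
    REMOVE = 12
    RMDIR = 13
    RENAME = 14
    LINK = 15
    READDIR = 16
    READDIRPLUS = 17
    FSSTAT = 18
    FSINFO = 19
    PATHCONF = 20
    COMMIT = 21


# Read-only queries an in-line cache could answer on the storage's behalf (an assumption;
# ACCESS results also depend on the caller's credentials, so the key would need to include them).
ANSWERABLE = frozenset({
    Nfs3Proc.GETATTR, Nfs3Proc.LOOKUP, Nfs3Proc.ACCESS, Nfs3Proc.READLINK,
    Nfs3Proc.READDIR, Nfs3Proc.READDIRPLUS, Nfs3Proc.FSSTAT, Nfs3Proc.FSINFO,
    Nfs3Proc.PATHCONF,
})


def route(proc: int, key: Hashable, cache: dict,
          forward: Callable[[Nfs3Proc, Hashable], object]) -> tuple:
    """Answer cacheable metadata queries locally; forward everything else to the storage."""
    p = Nfs3Proc(proc)  # raises ValueError for an unknown procedure number
    if p in ANSWERABLE and (p, key) in cache:
        return "cache", cache[(p, key)]
    # Data (READ/WRITE/COMMIT), updates and cache misses go to the backing NAS.
    return "storage", forward(p, key)
```

With a fully populated mirror, which is how Application Accelerator is described, the forwarding path would in practice carry only data and update requests.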
When the backing storage processes a request, the response is also intercepted, enabling the Application Accelerator to update the metadata in DRAM to reflect the change in file system state. This process eliminates the need to continually re-scan file systems to keep metadata accurate.
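The reply-interception step can be sketched as a mirror that changes only when the storage confirms an operation. This is a hypothetical model: the class and method names are invented for illustration, entries are keyed by path for readability (NFSv3 itself identifies files by opaque file handles), and it assumes the post-operation attributes that NFSv3 replies may carry are visible to the appliance.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attrs:
    size: int
    mtime: float


@dataclass
class MetadataMirror:
    """Hypothetical in-DRAM copy of file metadata, keyed by path."""
    entries: dict = field(default_factory=dict)

    def on_reply(self, op: str, ok: bool, path: str,
                 attrs: Optional[Attrs] = None, new_path: Optional[str] = None) -> None:
        """Apply an intercepted storage reply; failed operations leave the mirror untouched."""
        if not ok:
            return
        if op in ("CREATE", "SETATTR", "WRITE"):
            # The reply's post-operation attributes replace whatever was cached.
            self.entries[path] = attrs
        elif op == "REMOVE":
            self.entries.pop(path, None)
        elif op == "RENAME":
            moved = self.entries.pop(path, None)
            if moved is not None:
                self.entries[new_path] = moved
```

Because the mirror is updated from replies rather than requests, an operation the storage rejects never reaches the cache, and no periodic re-scan is needed as long as every operation passes through the appliance.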
[Side Note: all I/O has to pass through the Application Accelerators or the cache could become stale, which restricts how the solution can be deployed.]
Obviously, the InfiniteIO accelerator isn’t aimed at traditional file-serving workloads. Human users will usually tolerate I/O response times of a few milliseconds. But in high-performance analytics, where files are accessed repeatedly, reductions in latency have a significant effect. InfiniteIO quotes run-time improvements of 40-60% with no changes other than placing the appliance in-line with existing storage.
We’ve discussed the value of metadata in many previous blog posts. Metadata holds the value in a file system because it captures knowledge of data activity and data placement. With control over the metadata, Application Accelerator can also tier files between multiple platforms.
The most obvious use case here is to move data to and from the public cloud and to archive inactive data from primary storage. However, owning the metadata stream gives InfiniteIO not just information such as last access time but also more detailed metrics, such as access frequency and access profile. It would be possible, for example, to distinguish files that are accessed randomly from those that are written or read sequentially.
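As an illustration of the access-profile idea, the sketch below classifies a file as sequentially or randomly read from the offsets of its READ requests, which an in-line device could observe because all I/O passes through it. The function name, the 0.8 threshold and the "unknown" label for files with fewer than two reads are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AccessProfile:
    accesses: int
    sequential_ratio: float  # share of reads that start where the previous one ended
    pattern: str             # "sequential", "random" or "unknown"


def profile_reads(reads: list, threshold: float = 0.8) -> AccessProfile:
    """Classify a file from its READ requests, given as (offset, length) in arrival order."""
    if len(reads) < 2:
        return AccessProfile(len(reads), 0.0, "unknown")
    contiguous = sum(
        1
        for (prev_off, prev_len), (off, _) in zip(reads, reads[1:])
        if off == prev_off + prev_len
    )
    ratio = contiguous / (len(reads) - 1)
    return AccessProfile(len(reads), ratio, "sequential" if ratio >= threshold else "random")
```

For example, `profile_reads([(0, 4096), (4096, 4096), (8192, 4096)])` returns `AccessProfile(accesses=3, sequential_ratio=1.0, pattern='sequential')`, while reads that jump around the file fall below the threshold and are labelled random.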
This deeper level of metrics provides the opportunity to optimise data placement much more effectively. Placement algorithms could deliver much better data mobility, for example by prefetching an application’s data as the application moves between data centres or clouds.
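A placement engine could combine these metrics into simple rules. The sketch below is hypothetical throughout: the tier names, the 90-day archive threshold, the 10-accesses-per-day "hot" threshold and the 7-day prefetch window are illustrative values, not anything InfiniteIO has described.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class FileStats:
    path: str
    last_access: float       # seconds since the epoch
    accesses_per_day: float
    pattern: str             # "sequential", "random" or "unknown"


def choose_tier(s: FileStats, now: float,
                archive_after_days: float = 90.0, hot_per_day: float = 10.0) -> str:
    """Pick a hypothetical tier for one file."""
    idle_days = (now - s.last_access) / SECONDS_PER_DAY
    if idle_days >= archive_after_days:
        return "archive"
    if s.accesses_per_day >= hot_per_day or s.pattern == "random":
        # Hot or randomly read files are most latency-sensitive: keep them on primary storage.
        return "primary"
    # Warm, sequentially read files can tolerate a cheaper, higher-latency tier.
    return "cloud"


def prefetch_candidates(stats: list, now: float, window_days: float = 7.0) -> list:
    """Recently used files, busiest first: what to copy ahead of an application that moves sites."""
    recent = [s for s in stats if (now - s.last_access) / SECONDS_PER_DAY <= window_days]
    return [s.path for s in sorted(recent, key=lambda s: s.accesses_per_day, reverse=True)]
```

The design choice here is that idleness trumps everything else, while access frequency and access pattern decide between the faster and cheaper tiers for data still in use.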
The Architect’s View
I’ve been interested in the challenges of enabling data mobility for several years (and you can find many of my posts that discuss the topic). Mobility can take many forms, whether it is within a single cluster of storage, within a data centre or across geographic locations. So, the promise of Application Accelerator looks good. However, I wonder if there’s a long-term future for a stand-alone product.
- Datrium delivers data mobility with Automatrix
- Technology Choices for Data Mobility in Hybrid Cloud
- Hybrid Cloud and Data Mobility
Here’s the reasoning behind that statement. First, Application Accelerator only delivers value for NFSv3 deployments; there’s no support for NFSv4 or SMB. Some of the shortcomings of NFSv3 are addressed in NFSv4, especially with pNFS (parallel NFS, introduced in NFSv4.1). The question here is whether those protocols are advanced enough to provide the same value as a dedicated appliance.
Second, Application Accelerator seems like a great fit for a vendor looking to improve system performance. Integrating the technology into an existing file server platform would give that vendor a significant lead over the competition.
In the short term, I’m sure there are hundreds, if not thousands, of customers that could benefit from putting Application Accelerator in front of existing deployments and avoiding considerable upgrade costs. In the long term, its future is perhaps as a data mobility enabler within an existing vendor or public cloud platform, helping to make data placement as transparent as application mobility is today.
Wherever the technology goes, solutions like Application Accelerator are critical in enabling the efficient hybrid and multi-clouds of the future.
Post #4766. Copyright (c) 2007-2020 Brookend Ltd. No reproduction in whole or part without permission.