This week’s Data Unpacked podcast episode is a conversation with Floyd Christofferson from Hammerspace. Among other things, we discuss where the abstraction point should be placed when implementing a data mobility solution. We see three choices, each with relative strengths and weaknesses.
Data mobility will be a defining issue in the hybrid cloud. Moving data around distributed infrastructure to ensure it is in “the right place at the right time” is difficult to achieve without continually replicating the content. Replication introduces forks in the road, allowing two copies of the same data set to diverge. As copies of copies are taken, consistency is lost and the physical volume of data stored increases.
We see three options for delivering data mobility, whether the goal is to optimise usage or to place data close to wherever compute will run.
- Build an abstraction layer. In this instance, a distributed file system or object store provides access to data across many locations and with either eventual or strong consistency. Strong consistency is generally preferred but comes at the cost of additional complexity and performance challenges.
- Implement efficient migrations. Move entire data sets between physical storage resources using tools that can execute the copy process in the background and minimise downtime. Migration solutions can also be used to copy data, as long as strict controls on distribution are maintained.
- Move data tactically. Perform individual file migrations, leaving a pointer behind (stubs or links) to track the new physical location of the data.
Hammerspace has taken the first option, with an abstraction layer that assimilates existing storage resources. Crucially, the Hammerspace solution doesn’t restack the data into a proprietary format but leaves existing storage intact. This means the Hammerspace abstraction layer can be removed at any point in the future.
Side note: Weka also implements a scale-out file system with the capability to archive to S3 object storage. This can be used as a migration tool.
Efficient migration technology (such as that developed by Datadobi) can be a powerful tool for hardware refresh, data migration into and out of the public cloud (subject to checking egress charges) and to archive data once it is no longer actively needed. The key feature in the data mover engine that powers DobiMigrate and StorageMAP, for example, is the ability to move data in the background and only take a small outage to manage the cutover process.
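The background-copy-then-cutover pattern can be sketched in a few lines. This is a toy illustration, not DobiMigrate’s actual engine: repeated passes copy only files that are missing or newer while the source stays live, so each pass moves a shrinking delta; the final pass runs during a brief outage once writes are quiesced.

```python
# Minimal sketch of background migration with a short cutover window.
# Illustrative only -- this is NOT how any commercial data mover works.
import os
import shutil

def sync_pass(src: str, dst: str) -> int:
    """Copy files that are missing or newer on the source.
    Returns the number of files copied in this pass."""
    copied = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)  # copy2 preserves timestamps for the delta check
                copied += 1
    return copied

def migrate(src: str, dst: str, quiesce) -> None:
    """Background passes while the source is live, then one final
    pass during a short outage (the cutover)."""
    while sync_pass(src, dst) > 0:  # keep passing until the delta is empty
        pass
    quiesce()            # brief outage: stop writes to the source
    sync_pass(src, dst)  # final delta is small, so the outage is short
```

The key point the sketch illustrates is that downtime is proportional to the final delta, not to the total data volume.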
A third option is to migrate individual files or directories, leaving behind a pointer to the new location of the data. When the moved content is accessed, a process is required to either recover the content or forward the request to the new location. Symbolic links are one way to achieve this. The Linux operating system (for example) knows how to follow the link and access the data. Komprise is one company that implements the pointer-based solution using a data mover engine to access migrated content.
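The symbolic-link variant of this pattern is easy to demonstrate. The sketch below is illustrative only (it is not how Komprise implements its solution): the file is moved to its new location and a symlink is left at the original path, which the operating system follows transparently on the next access.

```python
# Toy illustration of tactical file movement with a pointer left behind.
# Requires a filesystem and OS that support symbolic links (e.g. Linux).
import os
import shutil

def move_with_stub(path: str, new_path: str) -> None:
    """Move a file to new_path and leave a symlink at the old path."""
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    shutil.move(path, new_path)
    os.symlink(new_path, path)  # the OS resolves this link on access
```

After the move, an application opening the original path still reads the data, because the kernel follows the link; no data mover is invoked on read in this simple case.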
Pros & Cons
Each of these three options has benefits and disadvantages. Building in an entirely new abstraction layer requires design and deployment of software (and possibly hardware) across each location in which the data will be accessed. However, the work is a one-time process that has many benefits further down the line.
Data migration between physical locations doesn’t address the copy proliferation issues but is simpler to implement and a good technique for managing hardware refresh projects. Data mobility in this technique could be combined with a solution such as Microsoft DFS to create a virtual file system layer.
The use of pointers must be viewed with caution: a file system scan or subsequent data migration touches every stub and can trigger unnecessary recalls of migrated content or performance issues. It’s therefore important to ensure these types of solutions have mitigations or workarounds to manage file system scans.
The Architect’s View®
All the above solutions could be used within the enterprise to deliver data mobility. It’s important to look at the implications of each one and decide how they apply to your own requirements. Given a choice, I would implement a distributed file system, as the additional up-front work results in many long-term benefits (as we describe in the podcast). As part of a long-term hybrid strategy, a data plane that spans all computing locations will be a must-have feature.
We discussed the three companies mentioned in this post in a previous data migration discussion which can be found here.
Copyright (c) 2007-2020 Brookend Limited. No reproduction without permission in part or whole. Post #334e.