Data replication is a core enterprise feature for implementing data protection. Synchronous replication provides the ability to maintain an identical copy of data at another location. While not a panacea for every problem, sync replication simplifies the protection process for servers and applications that are contained on one or more LUNs/volumes. INFINIDAT recently announced InfiniSync, a solution for delivering synchronous replication over infinite distance on the InfiniBox platform.
If you need more detail about what that means, check out the discussion on synchronous replication on our recent mythbusters Storage Unpacked podcast. So assuming you need data to be identical across multiple sites, how do you cope with the latency issue?
Sync vs Async
Synchronous replication introduces latency because the host I/O can’t be acknowledged until updates have been completed on both local and remote arrays. The actual latency introduced by distance is dependent mainly on the equipment the traffic passes through. Speed-of-light is only a small component of the time. However, we can say that for today’s all-flash arrays looking to deliver sub-millisecond responses, sync replication can easily destroy the benefit of using flash (at least for write I/O).
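To put rough numbers on this, the following Python sketch estimates the write latency a host would see with sync replication. It assumes light travels through fibre at roughly 200 km per millisecond. The function name, the 0.2 ms local write time and the 0.5 ms equipment overhead are illustrative assumptions, not measurements from any array or network.

```python
# Light in optical fibre covers roughly 200 km per millisecond (about two-thirds of c).
FIBRE_KM_PER_MS = 200.0


def sync_write_latency_ms(distance_km, local_write_ms, equipment_ms=0.0, round_trips=1):
    """Estimate host-visible write latency under synchronous replication.

    The host is only acknowledged once the remote array has confirmed the
    write, so every write pays at least one round trip between sites.
    equipment_ms is an assumed allowance for switches, routers and the
    remote array itself.
    """
    propagation_ms = round_trips * 2 * distance_km / FIBRE_KM_PER_MS
    return local_write_ms + propagation_ms + equipment_ms


# Assumed figures: 0.2 ms local flash write, 0.5 ms of equipment overhead.
for km in (10, 100, 1000):
    total = sync_write_latency_ms(km, local_write_ms=0.2, equipment_ms=0.5)
    print(f"{km:>5} km: {total:.2f} ms per write")
```

Under these assumptions the equipment overhead dominates at metro distances, while propagation takes over as the sites move further apart. Either way, the sub-millisecond write response of the local array is lost.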
The alternative is to use asynchronous replication, where the currency of the remote copy lags some time behind the primary copy. The actual amount of lag depends on the replication process, the network capacity between sites and the rate of data updates. Replication can be based on snapshots taken periodically or can be implemented as a continuous stream.
With async, the remote copy can be seconds or minutes out of date. During periods of heavier write I/O activity, the remote copy falls further behind, but theoretically the remote site can catch up during quiet periods. Incidentally, one reason for using snapshot-based async is that the update rate may mean a continuously streamed remote copy can never catch up.
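The catch-up behaviour can be illustrated with a toy model. This Python sketch tracks the backlog of un-replicated data second by second, given a write rate and a fixed link capacity. The function name and all the figures are made up for illustration and do not describe any particular product.

```python
def replication_backlog(write_mb_per_s, link_mb_per_s):
    """Return the un-replicated backlog (MB) at the end of each one-second interval.

    write_mb_per_s is a sequence of per-second write volumes; link_mb_per_s is
    the (assumed constant) capacity of the inter-site link.
    """
    backlog = 0.0
    history = []
    for written in write_mb_per_s:
        # New writes join the backlog; the link drains up to its capacity.
        backlog = max(0.0, backlog + written - link_mb_per_s)
        history.append(backlog)
    return history


# A burst of heavy writes followed by a quiet period, over a 100 MB/s link.
writes = [50, 50, 200, 200, 200, 20, 20, 20, 20, 20]
for second, backlog in enumerate(replication_backlog(writes, link_mb_per_s=100), start=1):
    # Dividing the backlog by link capacity gives a rough lag in seconds.
    print(f"t={second:>2}s backlog={backlog:6.1f} MB lag~{backlog / 100:.1f}s")
```

In this model the backlog builds during the burst and drains once the write rate drops below the link capacity. If the write rate stays above the link capacity, the backlog simply grows without limit, which is the "never catch up" case.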
So, what happens if you want to have synchronous replication over distances that are only practical with async replication? This is where InfiniSync can help. First of all, let’s get one thing clear – InfiniSync can’t solve the speed of light problem. Latency is a fact of life. However, InfiniSync can spoof the problem and that’s achieved using technology from a company called Axxana.
Axxana was founded in 2005. I was first introduced to the company some 11 years ago in January 2007. The company’s Phoenix appliance is a fireproof black box (literally) that stores write I/O in the local data centre. Phoenix is effectively a cache of the data not yet replicated to the remote site. The cache is constantly refreshed as data is written by the application. Should the worst happen and a failover to the remote site be required, the Phoenix appliance provides the missing data to turn async into sync.
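Conceptually, the idea can be modelled as a journal of writes that the remote site has not yet acknowledged. The Python sketch below is only a toy model of that concept: the class, its methods and the sequence-number scheme are hypothetical and are not taken from Axxana's or INFINIDAT's implementation.

```python
class PendingWriteJournal:
    """Toy model of a protected local cache of writes not yet confirmed remotely."""

    def __init__(self):
        self._pending = {}  # sequence number -> (block address, data)

    def record(self, seq, block, data):
        """Store a write locally as the application issues it."""
        self._pending[seq] = (block, data)

    def acknowledge(self, up_to_seq):
        """Discard writes the remote site has confirmed, up to and including up_to_seq."""
        for seq in [s for s in self._pending if s <= up_to_seq]:
            del self._pending[seq]

    def replay_onto(self, remote_volume):
        """After a failover, apply the outstanding writes to the remote copy in order."""
        for seq in sorted(self._pending):
            block, data = self._pending[seq]
            remote_volume[block] = data
        self._pending.clear()


journal = PendingWriteJournal()
remote = {}

# The application issues three writes; each is journalled locally.
for seq, (block, data) in enumerate([(0, b"a"), (1, b"b"), (0, b"c")], start=1):
    journal.record(seq, block, data)

# Async replication has only delivered the first write before the disaster.
remote[0] = b"a"
journal.acknowledge(1)

# Failover: the journal supplies the two missing writes.
journal.replay_onto(remote)
print(remote)
```

The point of the model is that the journal only ever needs to hold the gap between the primary and the remote copy, so its size is bounded by the async lag rather than by the size of the volume.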
All manner of disasters could befall a primary data centre, including power outages, fire, flood or component failure. Phoenix was designed to withstand all of these scenarios and to provide access to data via in-built 4G mobile capability, even if network connectivity is down.
Phoenix is a remarkably simple concept that gets around the problem of local write latency, delivering an RPO=0 experience and adding little or no impact to recovery time objectives. Of course, there is the practical consideration of implementation. A Phoenix appliance extracts data using a splitter/collector installed in the primary site. In the original implementation by Axxana, data had to be extracted from the appliance using a laptop or transferred over a wireless connection. Applying that data at the remote site was then a manual process.
Disaster recovery needs to be simple, so any extra work needed to update a secondary copy can introduce risk into the process. With InfiniSync, the recovery process has been automated. This automation point is quite important. In a disaster, the appropriate personnel might not be available (or might be caught up in the disaster). Documentation can be inaccurate, and that makes a successful application failover significantly less certain.
Who actually needs sync replication these days? While we’re seeing some adoption of vVOLs, any virtual server implementation will find traditional LUNs cumbersome from a replication perspective. Synchronously replicating all data can be expensive, both in licence charges and networking, so it makes sense to limit usage to only those applications that need it. In most enterprises this is an ever-decreasing subset of the applications being deployed.
However, there are some scenarios where sync replication is absolutely essential. Banking and finance in general are the obvious use cases, but many enterprises have some single point of truth on which other applications depend. For mission critical data, I can see implementations that use sync locally in a metro configuration for performance/reliability, then perform a 3rd copy async with Axxana/InfiniSync as an RPO=0 option. This gives an extra layer of reliability compared to simply using async alone as the third copy.
The Architect’s View™
The idea of RPO=0, regardless of distance, is a nice one, but I wonder if it keeps us within traditional or even legacy design models. Where possible, it makes sense to use eventual consistency and localisation of data, while having only a small set of “single version of truth” applications. INFINIDAT is really aiming at the enterprise market where synchronous replication is part of the core application design. It may be seen to be cheaper to put in InfiniSync for greater distance protection than to go to the expense of significant application rewrites.
With all that said, there is a market here and I’m sure Moshe & Co wouldn’t have built a feature without the customer demand for it. With full automation, InfiniSync provides an elegant solution to the impacts of latency, especially with low latency I/O requirements.
Copyright (c) 2007-2021 – Post #D223 – Brookend Ltd first published on https://www.architecting.it/blog, do not reproduce without permission.