IBM FlashSystem Review - Part 2 - Software

This is the second of a series of three blog posts looking at the IBM FlashSystem shared storage platform. The three posts cover hardware, software and the operations and integration of the solution. IBM has sponsored this work and provided evaluation hardware for the project.

Update: 26 November 2021 to refresh upgraded software features for ransomware protection.

IBM FlashSystem Review – Part 1 – Hardware

Storage Operating System

Like many vendors, IBM uses a “storage operating system” to manage the operations of FlashSystem. In the latest models, that software is based on IBM’s SAN Volume Controller, or SVC. In 2015, IBM rebranded SVC as Spectrum Virtualise to align with other solutions in the portfolio, although we will use the names interchangeably here (as IBM does in its documentation).

The initial design of SVC was to act as a storage virtualisation gateway between Fibre Channel storage and hosts. The virtualisation or abstraction functions allow dynamic reconfiguration of host-presented storage without impacting availability to hosts. For large enterprises, this capability enables physical storage to be added, removed or otherwise reconfigured without affecting host operations. In many instances SVC was used as a tool to restructure or reconfigure existing physical resources as part of a rationalisation or upgrade program.

The first implementation of SVC acted purely as a gateway. In 2010, however, SVC was updated with the capability to use local disks and released as the Storwize V7000 platform. In 2018, IBM released the FlashSystem 9100 as the first model to use SVC software with FlashCore modules (other FlashSystem solutions at the time used XIV code). In February 2020, IBM consolidated the Storwize and XIV-based FlashSystem products into a single family based on SVC. This now comprises the 5000, 7000 and 9000 models available today.

Standardisation

The storage industry has long debated the issue of standardisation on a single platform storage operating system. Many enterprise vendors have built up portfolios of products from multiple acquisitions or development streams. Each product has a specific niche or unique design aspect that fits part of the enterprise market.

While this approach has many benefits (including faster access to customers or a potential market), the use of multiple architectures can result in both confusion and additional management overhead for end users. Standardisation on a single storage operating system provides the following benefits:

System administrators are offered a consistent look and feel that spans the GUI, API and CLI modes of operation. Processes, procedures and scripts developed for one product will work seamlessly on others in the family. Administrators don’t need to learn how to work with multiple solutions.
System design and layout is standardised. The design of storage platforms (where RAID and data protection remain important) is achieved in a consistent manner, whether installing entry-level systems or enterprise-class products. This means customers don’t have to redesign for branch or edge cases where the deployed solution may be smaller.
Standardisation offers predictability. As environments scale, customers want predictable performance and to know that migration to a larger piece of infrastructure will not introduce a whole new learning curve on platform management.

IBM has made the move to a single storage operating system that supports everything from entry-level SMB through midrange to enterprise customers.

As we discussed in part 1 of this series, the hardware options allow for a range of deployment scenarios. Some features of SVC are only available where the hardware is capable of implementing them (for example encryption, or data reduction pools). In all cases, SVC offers a consistent look and feel across the product range.

Storage Virtualisation

Before we dig into the details of Spectrum Virtualise, it’s worth remembering where the software came from. In the early 2000s, shared storage gained massive popularity, with Fibre Channel SANs leading the way in the enterprise. iSCSI was more successful in smaller organisations where the cost of FC couldn’t be justified. One big challenge for many customers was the ability to fully exploit assets in their data centre. Without rigorous management processes, it was easy for IT organisations to end up with a fragmented expanse of storage hardware that ultimately resulted in poor utilisation and uneven performance. SVC provided a virtualisation layer to abstract the details of the underlying hardware, either to reorganise resources or to use SVC as a method of avoiding forklift upgrades.

Many of the terms and definitions within SVC still reflect the ability to use external storage, even if that feature isn’t utilised (external virtualisation is available on the 5100 upwards). Customers still have the option to use external virtualisation as a consolidation tool. This feature can be particularly useful when moving from a mix of current hardware that is due for decommissioning.

SVC software runs on each node within a FlashSystem array enclosure. Changes to the configuration of a system are synchronised in memory between nodes as commands are executed through the GUI or CLI. Multiple enclosures may be joined together to form a cluster, using either IBM HyperSwap or standard FC clustering. We’ll return to this in a moment.

Feature Set

Storage features are implemented within Spectrum Virtualise, with each release introducing new or amended functionality. In blog post #1 we showed the features of FlashSystem in a summary table. Here we look at these features in more detail.

Safeguarded Copy (New in 2021)

The threat of ransomware is ever present in enterprise computing and persistent storage is the last line of defence to prevent the encryption and ransom of data. In July 2021, IBM introduced the Safeguarded Copy feature to FlashSystem as another tool to ensure business continuity.

Safeguarded Copy creates immutable snapshots that are not directly accessible from host systems. The immutability creates a logical air gap between primary data and backups, one that can’t be overridden by hackers or, for that matter, disgruntled employees. In the event of an attack, compromised systems and data can be rolled back to a previously known good snapshot. As this process is effectively an update to metadata, the recovery occurs near instantaneously.

The configuration of Safeguarded Copy images is managed differently to snapshots, including a minimum time between copies. This feature prevents hackers invalidating backups by repeatedly taking new snapshots and rolling off known good images.
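The effect of a minimum interval can be illustrated with a short Python sketch. This is a toy model rather than IBM's implementation: the class `SafeguardedPolicy`, its parameters and the retention count of 24 are all hypothetical, chosen only to show why a floor on the time between copies stops an attacker flushing good images out of a fixed-size retention window.

```python
from collections import deque


class SafeguardedPolicy:
    """Toy model of an immutable snapshot schedule (hypothetical, not IBM's API)."""

    def __init__(self, min_interval_s: int, max_copies: int):
        self.min_interval_s = min_interval_s
        # Timestamps of retained copies; the oldest rolls off when the deque is full.
        self.copies = deque(maxlen=max_copies)

    def take_copy(self, now_s: int) -> bool:
        """Record a copy at time now_s, refusing if the previous one is too recent."""
        if self.copies and now_s - self.copies[-1] < self.min_interval_s:
            return False
        self.copies.append(now_s)
        return True


policy = SafeguardedPolicy(min_interval_s=3600, max_copies=24)
policy.take_copy(0)  # the known good copy

# An attacker requests 1,000 copies, one per second, hoping to roll off the good copy.
accepted = sum(policy.take_copy(t) for t in range(1, 1001))
```

In this model every one of the attacker's requests is refused, so the copy taken at time 0 is still the oldest retained image. Without the interval check, 24 rapid requests would have been enough to displace it.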

Naturally, we know that snapshots are not a true backup, as hardware failure would cause data loss. However, Safeguarded Copy provides administrators with a fast recovery option and so is useful for meeting Recovery Time Objectives in the event of a hacking attack.

Storage Insights

Storage Insights is IBM’s solution for cloud-based monitoring and management. The SaaS platform collects data from IBM FlashSystem deployments to provide a single rolled-up management view across all installations. Customers register for Storage Insights and permit IBM to upload telemetry and metadata into the Insights platform. Through the use of AI, IBM can offer customers proactive management of their estate, drawing on the collective experience of the entire customer base. You can read more about the benefits of centralised telemetry in this post.

VMware and Red Hat OpenShift Integration

FlashSystem provides integration into common application and virtual infrastructure frameworks, including, for example, VMware vSphere through VAAI and vVols technology. Some configuration work is needed to exploit these integration points, including the deployment of local agents or proxy software.

3-Site Replication

Data replication is typically achieved between two locations, either synchronously within short distances or asynchronously over a wider area. For enhanced protection, 3-site replication copies data to two remote sites, each holding sufficient metadata to determine the differences between them if the primary array is lost. This makes it easy to re-establish a consistent replication relationship between the surviving FlashSystem arrays with minimal data transfer.
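To make the idea of "sufficient metadata" concrete, here is a minimal Python sketch of a delta resynchronisation. It assumes each surviving site records, per region of a volume, the sequence number of the last write it applied. That bookkeeping scheme and the function name `regions_to_resync` are assumptions made for illustration; the source does not describe how FlashSystem tracks the differences internally.

```python
from typing import Dict, Set


def regions_to_resync(applied_at_b: Dict[int, int], applied_at_c: Dict[int, int]) -> Set[int]:
    """Return the region ids whose last applied write differs between two sites.

    Each dict maps region id -> sequence number of the last write applied there.
    A region missing from a dict is treated as never written (sequence 0).
    """
    regions = applied_at_b.keys() | applied_at_c.keys()
    return {r for r in regions if applied_at_b.get(r, 0) != applied_at_c.get(r, 0)}


# Site B was fed synchronously and is up to date; site C was fed
# asynchronously and lags slightly behind when the primary is lost.
site_b = {1: 10, 2: 12, 3: 15}
site_c = {1: 10, 2: 11}

delta = regions_to_resync(site_b, site_c)
```

Only regions 2 and 3 need to be copied from B to C here. Region 1 is already identical at both sites, so a full resynchronisation is avoided.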

Local and Remote Replication

FlashSystem supports local replication through FlashCopy, which uses a copy-on-write (CoW) mechanism to protect replicated data. FlashCopy images can be either snapshots, where only metadata is duplicated until data is written, or clones, where data is physically copied as a background process. Clones provide the benefit of isolated performance and protection against data loss in RAID group failures (but not system loss).
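The copy-on-write behaviour of a snapshot can be sketched in a few lines of Python. This is a generic illustration of the technique, not FlashCopy's actual data structures; the `Volume` and `Snapshot` classes and their methods are hypothetical. It shows why a snapshot consumes almost no space when taken, and why each first overwrite of a block on the source costs an extra copy.

```python
class Snapshot:
    """Point-in-time view that stores only blocks overwritten since creation."""

    def __init__(self, source):
        self.source = source
        self.saved = {}  # block index -> original contents at snapshot time

    def preserve(self, index, old_data):
        # Keep only the first (oldest) version; later overwrites are ignored.
        self.saved.setdefault(index, old_data)

    def read(self, index):
        # Unchanged blocks are still read from the source volume.
        return self.saved.get(index, self.source.blocks[index])


class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Copy-on-write: save the old block for every snapshot before overwriting.
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()   # no data copied yet, only an empty map
vol.write(1, "B")       # the old "b" is preserved for the snapshot first
vol.write(1, "X")       # a second overwrite needs no further copy
```

After these writes the volume reads "a", "X", "c" while the snapshot still reads "a", "b", "c", having stored just one block. A clone would differ by copying every block in the background until it no longer depends on the source.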

Metro Mirror provides synchronous replication between FlashSystem arrays and is generally used for short-distance protection, such as two data centres up to 300km apart. Global Mirror provides asynchronous replication over an effectively unlimited distance, but doesn’t offer full RPO=0 (recovery point objective) resiliency. Both solutions use either Fibre Channel or IP for inter-system connectivity and can be configured in a range of different scenarios.

Easy Tier

Easy Tier delivers data placement across multiple tiers of physical storage. Hybrid systems initially used flash storage for faster workloads and hard drives for capacity. Easy Tier provides the capability to balance workloads across a mix of all storage types within FlashSystem, whether NVMe SSDs (including SCM), HDDs or SAS drives. Extent size is determined at storage pool creation time and can be between 16 MiB and 8192 MiB.
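Since the extent is the unit of data placement, the chosen size determines how many pieces a volume is divided into. The Python sketch below works through that arithmetic for the two ends of the range quoted above. The function name `extents_needed` and the 1 TiB example volume are illustrative assumptions, and the check enforces only the stated 16 to 8192 MiB range, not any other rule the product may apply.

```python
MIB_PER_TIB = 1024 * 1024


def extents_needed(capacity_mib: int, extent_mib: int) -> int:
    """Number of extents required to hold capacity_mib, rounding up."""
    if not 16 <= extent_mib <= 8192:
        raise ValueError("extent size must be between 16 and 8192 MiB")
    # Integer ceiling division avoids floating-point rounding on large values.
    return -(-capacity_mib // extent_mib)


one_tib = 1 * MIB_PER_TIB
smallest = extents_needed(one_tib, 16)    # finest placement granularity
largest = extents_needed(one_tib, 8192)   # coarsest placement granularity
```

A 1 TiB volume is made up of 65,536 extents at 16 MiB but only 128 extents at 8192 MiB. Smaller extents allow hot data to be placed more precisely, at the cost of more mapping entries to track.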

Transparent Data Migration

TDM is a feature of Spectrum Virtualise that exploits the storage virtualisation functionality to enable the import of data from external storage and the movement of data within internal storage capacity. Internal disks are grouped into MDisks, each a RAID set of disks with similar characteristics. MDisks and external storage can then be grouped into a storage pool, either for presentation to hosts or as part of a migration process.

Data Reduction Pools

A Data Reduction Pool or DRP is a storage pool against which specific data reduction technologies such as compression or deduplication may be applied. Data Reduction Pools can be built from external storage LUNs, which provides a way to implement data reduction techniques on storage systems that don’t offer those features natively.
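As a rough illustration of how deduplication and compression combine, the following Python sketch stores each unique block once, keyed by a content hash, and compresses it with zlib. It is a generic textbook approach under simplifying assumptions (fixed-size blocks, SHA-256 fingerprints, append-only writes) and says nothing about how DRPs are implemented internally. The `ReducedPool` class is hypothetical.

```python
import hashlib
import zlib


class ReducedPool:
    """Toy pool that deduplicates identical blocks and compresses the unique ones."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> compressed unique block
        self.volume = []  # logical block order, stored as fingerprints

    def write(self, block: bytes) -> None:
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in self.chunks:  # deduplication: store each block once
            self.chunks[fingerprint] = zlib.compress(block)  # compression
        self.volume.append(fingerprint)

    def read(self, index: int) -> bytes:
        return zlib.decompress(self.chunks[self.volume[index]])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


pool = ReducedPool()
for block in (b"A" * 4096, b"A" * 4096, b"B" * 4096, b"A" * 4096):
    pool.write(block)
```

Four logical blocks are written but only two unique blocks are kept, each much smaller than 4096 bytes after compression, and every logical block still reads back unchanged. Real data reduces far less than this artificial example.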

HyperSwap

HyperSwap is the capability to use multiple FlashSystem arrays as a single logical image using ALUA, either locally within the same data centre or across a campus location. Both systems in a HyperSwap high availability (HA) pair store an independent copy of data, updating each copy synchronously before confirming I/O success to a connected host.

The Architect’s View™

The capability of the storage operating system continues to be essential in defining the success of shared storage. Although we shouldn’t need to know, implementation details are still important because media and hardware have specific characteristics that are echoed in the capabilities of software. Consistency is a “must have” so it’s pleasing to see IBM finally standardising on a single storage O/S and single product range (except for mainframe). SVC is a mature technology that fits the requirements well.

In the next post we will look at some of the specific operational aspects of creating volumes and managing the hardware. This is based on hands-on experience with a FlashSystem 5030 appliance in the Architecting IT lab.