Google announces new storage features for GCP at Next 2023

At Next 2023, Google announced three new storage solutions for the Google Cloud Platform. We take a look at the details and put the news into context with the broader cloud and on-premises storage ecosystems.

Background

Google Cloud Platform (GCP) already has standard storage offerings that cover object (Cloud Storage), file (Filestore) and block (Persistent Disk and Local SSD) protocols. The Next announcements extend file-based support with Google Cloud NetApp Volumes, formalise the Cloud Storage FUSE offering with first-party support and introduce Parallelstore in private preview.

Google Cloud NetApp Volumes

We covered Google Cloud NetApp Volumes (GCNV) in a separate post. To summarise, this is a port of the NetApp ONTAP storage operating system that runs as a first-party service in Google Cloud. It rounds out NetApp’s deployment across all the major clouds. For Google, the service provides enhanced file storage capabilities but, crucially, also gives NetApp’s traditional enterprise customers an on-ramp for moving data from on-premises into Google Cloud. Read more on this announcement in this related post.

Cloud Storage FUSE

Cloud Storage FUSE is built on FUSE (Filesystem in Userspace), the open-source Linux interface that lets anyone develop a file system that runs in user space. It is similar in implementation to Mountpoint for S3, which AWS announced as an alpha release in March 2023.

The idea of FUSE is to make it easy to create a file system that can be customised in terms of the underlying data and storage structures, without the need to resort to kernel programming. We used FUSE to develop a proof of concept during the development of Mobilus. Other implementations build file systems in memory or read and write remotely accessed data across a network.

In this instance, Google has used FUSE to build a file system that reads and writes data in the Google Cloud Storage object store. The technology enables AI (and other) frameworks and applications, such as TensorFlow and PyTorch, to access object data as if it were stored in a file system. The benefit of this approach is that it removes any need to refactor data, but it does have restrictions. FUSE can’t layer strict POSIX support onto object storage, so some compromises must be accepted. However, as a gateway, the technology provides an elegant solution for multi-protocol support where strict POSIX compliance isn’t necessary.
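
To illustrate the "no refactoring" point, here is a minimal Python sketch of application code that reads a dataset through ordinary file-system calls. It assumes a bucket has already been mounted with Cloud Storage FUSE. The function names and any mount path passed in are hypothetical and are not taken from Google's documentation.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_objects(mount_point: str, suffix: str = "") -> Iterator[Tuple[str, bytes]]:
    """Yield (relative path, contents) for each file under mount_point.

    Only ordinary file-system calls are used. Nothing here is specific to
    Cloud Storage, so the same code works against a local directory or a
    FUSE-mounted bucket (the mount itself is assumed to exist already).
    """
    root = Path(mount_point)
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name.endswith(suffix):
            yield path.relative_to(root).as_posix(), path.read_bytes()


def total_bytes(mount_point: str, suffix: str = "") -> int:
    """Add up the size of every matching file by reading each one in full."""
    return sum(len(data) for _, data in iter_objects(mount_point, suffix))
```

An application would call, for example, total_bytes("/mnt/training-data", ".bin"), where the path is a placeholder for wherever the bucket is mounted. Whole-file sequential reads like these are the kind of access that maps naturally onto objects. Patterns that depend on strict POSIX behaviour are where the compromises mentioned above are likely to show up.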

FUSE isn’t known for high performance, so we’ll be interested to see what service level guarantees Google is prepared to offer for the solution.

Parallelstore

Parallelstore is a new parallel file system based on the open-source Intel DAOS project. DAOS (Distributed Asynchronous Object Storage) was developed to exploit new technologies such as NVMe SSDs and Optane persistent memory. Google is using DAOS to build a scale-out, high-performance file system for AI and HPC workloads.

As the technology is in private preview, we don’t have many details other than some high-level performance numbers. Google is claiming latency of around 300 microseconds, millions of IOPS and read throughput of 200MB/s per TB of capacity.
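
To put the per-TB read figure into context, the following Python sketch shows the arithmetic. It assumes read throughput scales linearly with provisioned capacity, which Google has not confirmed for the preview. The constant and function names are our own and purely illustrative.

```python
# Read throughput quoted for the private preview, in MB/s per TB of capacity.
READ_MB_S_PER_TB = 200


def estimated_read_throughput_mb_s(capacity_tb: float) -> float:
    """Estimate aggregate read throughput, ASSUMING linear scaling with capacity."""
    if capacity_tb < 0:
        raise ValueError("capacity_tb must be non-negative")
    return capacity_tb * READ_MB_S_PER_TB


def capacity_for_throughput_tb(target_mb_s: float) -> float:
    """Capacity needed to reach a target read throughput, under the same assumption."""
    if target_mb_s < 0:
        raise ValueError("target_mb_s must be non-negative")
    return target_mb_s / READ_MB_S_PER_TB
```

On these assumptions, a 100TB deployment would deliver roughly 20,000MB/s (20GB/s) of read throughput, and a workload needing 10GB/s of reads would require about 50TB of provisioned capacity.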

Parallelstore will be one to watch as we see how the solution is implemented and deployed.

The Architect’s View®

Data storage is starting to get much more attention in the public cloud, as the cloud service providers evolve their platforms for new application types, such as AI. Earlier this month, AWS announced new storage features (see this post), while all the public clouds have accepted a hybrid model as the future, with integrations from NetApp, Microsoft, Lustre and OpenZFS.

The on-premises storage vendors have started porting products to the cloud – Dell has APEX Block Storage, Pure Storage has Cloud Block Store, Infinidat has InfuzeOS, although IBM and HPE currently have nothing to offer in this space. Then there are the new software-defined solutions using public cloud instances to build virtual storage arrays. These include Cloud Block Store (mentioned already), Portworx, Volumez, Lightbits Labs, Silk and more.

It’s great to see data storage having a new focus in the public cloud, although we will start to see greater complexity, more fragmentation and additional work required to pick the best solutions. We have this covered and will be producing more content over the next few months that reviews and addresses this expanding segment of the market.