VAST Data Announces the VAST Data Platform

VAST Data recently announced the VAST Data Platform, a new solution based on the initial constructs of Universal Storage. What does this platform offer, and what market positioning does it represent for VAST Data?

Background

VAST Data launched in 2019 with a new scale-out architecture for unstructured data that exploits the features of cheap QLC NAND flash storage and persistent memory. The design, called DASE or disaggregated shared everything, is essentially a massive key-value store for physical chunks of information in which data and metadata exist in equal measure.

You can learn more about Universal Storage, now called the VAST DataStore, in these two podcasts we recorded early on in the evolution of the company.

The early marketing strategy focused on storage economics and, in particular, on killing off the hard drive.

Over time, that message has evolved as the original reasons for naming the company “VAST Data” (and not “VAST Storage”) continue to emerge. We documented some thoughts in a blog post at the end of November 2021 (find it here), where the initial thinking focused on new technology and how it might improve Universal Storage.

Data Processing

One area that the 2021 blog post did touch on was the idea of distributed processing and, in particular, a degree of autonomous processing triggered simply by ingesting new content. It’s a topic we’ve come back to repeatedly, for example, in the end comments on this post talking about the WEKA Data Platform, this one talking about S3 Object Lambda, and this podcast and blog post discussing Hammerspace.

The value of a self-driven system is easy to see. Imagine a database of medical records, for example, such as X-rays, where AI algorithms process incoming data. Those algorithms could be refined to improve current data ingestion but also to reprocess historical data to determine missed anomalies. The possibilities across a wide range of industries are endless.

Data Catalog

Back in February 2023, VAST Data announced the VAST Data Catalog. This feature is the foundation of the platform’s metadata capabilities, maintaining extensible metadata for each piece of ingested content. The metadata can be queried using SQL-style expressions, similar to the features of Amazon Athena.

An efficient, searchable metadata store is a powerful feature for extending the value of data content. We all know how easy it can be to create and retain huge quantities of data, only to struggle a few years later to remember exactly what it represents. End users can’t tag data manually in any practical way, so tools and automated processes are required.
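To make the idea of SQL-style metadata queries concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It is not the VAST Data Catalog interface; the table name, the column names and the sample rows are all hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical catalog: one row of metadata per ingested object.
# sqlite3 is only a stand-in to show the style of query, not VAST's API.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog ("
    "path TEXT, size_bytes INTEGER, owner TEXT, modality TEXT, ingested TEXT)"
)
sample_rows = [
    ("/scans/a001.dcm", 52_428_800, "radiology", "xray", "2021-03-14"),
    ("/scans/a002.dcm", 73_400_320, "radiology", "mri", "2022-07-02"),
    ("/logs/app.log", 1_048_576, "platform", None, "2023-01-20"),
    ("/scans/a003.dcm", 48_234_496, "radiology", "xray", "2023-02-11"),
]
conn.executemany("INSERT INTO catalog VALUES (?, ?, ?, ?, ?)", sample_rows)


def find_objects(conn, modality, before):
    """Return paths of objects with the given tag that were ingested before a date."""
    cursor = conn.execute(
        "SELECT path FROM catalog "
        "WHERE modality = ? AND ingested < ? ORDER BY path",
        (modality, before),
    )
    return [row[0] for row in cursor.fetchall()]


# Which X-rays predate 2023 and might be worth reprocessing?
old_xrays = find_objects(conn, "xray", "2023-01-01")
print(old_xrays)
```

The point of the sketch is that the question "what do we hold, and how old is it?" becomes a query against metadata rather than a crawl through the data itself.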

DataBase

Imagine extending the Data Catalog to store more complex content as a fully featured database, using the fundamentals of the Data Catalog as the query engine. This is what VAST Data has delivered with the DataBase. Unlike traditional OLTP databases, data is stored in a columnar format that is better suited to high-speed queries across massive data sets. The implementation of the VAST DataBase stores data in a similar way to Apache Parquet but with much finer granularity.

The underlying architecture of the VAST DataStore enables data to be added to the VAST DataBase efficiently, and in a form that makes future processing equally efficient.
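The difference between row and columnar layouts can be shown with a small sketch in plain Python. It illustrates the general technique only and says nothing about how the VAST DataBase is implemented internally; the records and field names are invented for the example.

```python
from statistics import mean

# Row layout: each record keeps all of its fields together (OLTP style).
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 40.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Columnar layout: each field is stored contiguously.
columns = to_columns(rows)

# An aggregate reads only the column it needs, not every field of every row.
average_amount = mean(columns["amount"])

# A filtered aggregate touches two columns and ignores the rest.
eu_total = sum(
    amount
    for region, amount in zip(columns["region"], columns["amount"])
    if region == "EU"
)
print(average_amount, eu_total)
```

In a real columnar store the same idea is applied on disk or flash, so an analytical query that needs two columns out of hundreds reads only a small fraction of the data.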

DataEngine

Bringing all of the components together will be the VAST DataEngine (due in 2024), a set of event triggers and functions similar to AWS Lambda. VAST customers can then build workflows into the platform, providing the degree of automation we discussed earlier. Some simple built-in examples of functions include metadata scraping, PII detection and ransomware detection. The majority of functionality will come from customers, the owners of the data, who know how they want to process data as it flows into the platform. More on this in a moment.
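As a rough illustration of the trigger-and-function pattern, the sketch below registers functions against an event type and runs them whenever an object arrives. The DataEngine had not shipped at the time of writing, so nothing here reflects its actual API: the EventBus class, the "object_created" event name and both handler functions are hypothetical, and the PII check is a toy regular expression.

```python
import re


class EventBus:
    """Hypothetical in-process trigger registry, for illustration only."""

    def __init__(self):
        self._handlers = {}

    def on(self, event_type):
        """Decorator that registers a function to run on an event type."""
        def register(func):
            self._handlers.setdefault(event_type, []).append(func)
            return func
        return register

    def emit(self, event_type, payload):
        """Run every registered function and collect the results in order."""
        return [handler(payload) for handler in self._handlers.get(event_type, [])]


bus = EventBus()


@bus.on("object_created")
def scrape_metadata(obj):
    # Derive simple metadata from the object itself.
    name = obj["path"].rsplit("/", 1)[-1]
    extension = name.rsplit(".", 1)[-1] if "." in name else ""
    return {"function": "scrape_metadata", "extension": extension, "size": len(obj["body"])}


@bus.on("object_created")
def detect_pii(obj):
    # Toy check: flag anything that looks like an email address.
    found = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", obj["body"]) is not None
    return {"function": "detect_pii", "pii_found": found}


# Ingesting an object fires both functions without the writer asking for them.
results = bus.emit(
    "object_created",
    {"path": "/inbox/note.txt", "body": "contact: jane@example.com"},
)
print(results)
```

Customer-written functions would slot in the same way as the two examples, which is why most of the value is expected to come from the owners of the data rather than from built-ins.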

The Architect’s View®

There’s much more to digest about the new VAST Data Platform than we’ve covered in this brief blog post. However, what’s clear is that there is a direction of travel for VAST Data that takes the company away from the storage world and into more direct competition with the likes of Snowflake and Databricks. VAST Data is building an ecosystem where the underlying hardware solution is designed to deliver the best and most efficient use of technology to drive performance and scalability.

If we look at the basis of Snowflake’s business, the company has a charging model based on credits and storage capacity. Compute within the infrastructure is charged on a per-second basis, aligned to the cost structure of the public clouds on which the solution runs. The bigger and more expansive the query, the greater the cost to the customer.
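The arithmetic behind a consumption model of this kind can be sketched in a few lines. The figures used below (credits consumed per hour and price per credit) are purely illustrative assumptions, not Snowflake's published rates, and the function ignores storage charges and any minimum billing increments.

```python
def query_cost(runtime_seconds, credits_per_hour, price_per_credit):
    """Cost of a query under simple per-second, credit-based billing.

    All inputs are illustrative assumptions, not vendor pricing.
    """
    credits_used = credits_per_hour * runtime_seconds / 3600
    return credits_used * price_per_credit


# Same hypothetical compute size and credit price, different runtimes.
quick_lookup = query_cost(runtime_seconds=90, credits_per_hour=8, price_per_credit=3.0)
full_scan = query_cost(runtime_seconds=3600, credits_per_hour=8, price_per_credit=3.0)
print(round(quick_lookup, 2), round(full_scan, 2))
```

Cost scales linearly with both runtime and the size of the compute used, which is why broad exploratory queries weigh so heavily on the bill.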

VAST Data offers an alternative where customers can purchase a licence for the platform and then grow the infrastructure over time, including the compute component. The company will be hoping that the underlying DASE architecture will provide a cost and performance differentiation compared to using a cloud-based solution.

As businesses transform towards much greater use of AI and analytics technology, the cost of “what if” queries and the equivalent BAU processes will become an increasing part of the cost base of IT and the cost of doing business in general.

However, we think that cost optimisation and efficiency are only one part of the story. VAST now talks about the DataSpace, multiple clusters configured to share data globally. This aspect of the technology could be the technical differentiator for many businesses. It introduces the capability to store and process data in a federated manner, with VAST Cloud instances storing data in the public cloud and providing access to cloud-based GPUs.

This federated design provides the ability to optimise costs, scale quickly, process data at the edge, and retain sovereignty of data in private data centres. This makes the VAST Data Platform a genuine platform for data processing, with a unique architecture that will be hard to replicate in either the public cloud or on-premises alone.