
AI Training Gets a Boost with NetApp StorageGRID Update

Keeping AI models fed with data has become a challenge as the size of data and the size of models both get bigger. One company hoping to keep customers on the right side of this colossal curve is NetApp, which yesterday unveiled an update to its StorageGRID object store system that it says brings up to a 20x boost in throughput for AI training workloads.
StorageGRID is NetApp’s S3-compatible object storage system that’s used to store large amounts (think tens of petabytes to exabytes) of unstructured data for big data, advanced analytics, and AI workloads. The object store can be paired with NetApp’s ONTAP data management software to create a unified, software-defined storage infrastructure that works across clouds and on-prem, including NetApp’s traditional NAS devices.
Reaching across data silos to fetch data is one thing, but being able to deliver the right piece of data to the processor at the right time is something else. Object stores aren’t usually known for speed and performance, but considering the petabytes and exabytes that customers are storing these days, they’re the only type of system that meets the scale needs.
Vishnu Vardhan, senior director of product management for NetApp, explains how the company delivered a throughput boost in StorageGRID 12.0.
“Fast access to object storage is clearly a need in the new world of AI, and NetApp is committed to helping you achieve it,” Vardhan wrote in a September 9 blog post. “To this end, StorageGRID implementation has evolved to an inner ring and an outer ring architecture.”
StorageGRID’s inner ring is designed for high speed and low latency, while the outer ring favors high capacity, high throughput, and high availability. The inner ring can be connected to a specific GPU cluster and deliver “near-line-rate performance,” Vardhan writes, while the outer ring can be connected to multiple GPU clusters simultaneously.
Caching systems can be complex to deploy and can complicate data integration, but their benefits often outweigh those drawbacks. With StorageGRID 12.0, NetApp is introducing a new caching layer that’s designed to improve how data flows within the product.
According to Vardhan, the new caching layer delivers up to 10 times the performance of current NetApp StorageGRID appliances. “This performance can be further scaled up by running the caching layer on a bare-metal StorageGRID node, enabling you to customize the server to meet your specific needs,” he writes. This, ostensibly, is how NetApp got to the 20x figure it cited in the announcement.
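For readers unfamiliar with the idea, the general pattern of a cache sitting in front of a slower capacity tier can be sketched in a few lines of Python. This is a toy illustration only and not NetApp's implementation: the `ReadThroughCache` class, the dictionary standing in for the object store, the shard names, and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy LRU read-through cache in front of a slower backing store (hypothetical)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # dict standing in for the capacity tier
        self.capacity = capacity            # maximum number of cached objects
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing_store[key]     # slow path: fetch from the capacity tier
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value


# A training job that re-reads the same shards every epoch.
store = {f"shard-{i}": f"data-{i}".encode() for i in range(4)}
cache = ReadThroughCache(store, capacity=2)
for epoch in range(3):
    for key in ("shard-0", "shard-1"):
        cache.get(key)

print(cache.hits, cache.misses)  # 4 2
```

Only the first epoch touches the slow tier; later epochs are served from the cache, which is why repeated-read workloads such as AI training tend to benefit from this kind of layer.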
This release also brings capacity increases. Customers can now store up to 600 billion objects, which is double the previous limit. Solid-state clusters can now support 122TB QLC drives, which doubles the capacity and density of StorageGRID deployments and also boosts performance.
In addition to the performance boost, the exa-scale object store upgrade is slated to bring additional benefits for AI workloads, including support for branching buckets and fast cloning of data. NetApp says this will boost testing and development workflows, thereby enabling customers to more quickly iterate their AI projects.
The branching buckets feature will allow developers to make instant copies of large buckets containing billions of objects and petabytes of capacity, operate on these buckets independently of each other, and reconcile changes between buckets, Vardhan says. These S3 buckets can be created nearly instantly and take up no additional space, he says.
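One common way to get near-instant, space-free copies is copy-on-write, in which a branch shares existing data with its parent and records only its own changes. The Python sketch below illustrates that general technique under stated assumptions. It does not describe how StorageGRID implements branching buckets, and the `Bucket` class, its method names, the object keys, and the naive last-writer-wins `merge_from` step are all hypothetical.

```python
_TOMBSTONE = object()  # marks a key deleted on a branch


class Bucket:
    """Toy copy-on-write bucket: branches share frozen layers, writes stay private."""

    def __init__(self, frozen_layers=()):
        self._frozen = tuple(frozen_layers)  # shared layers, oldest first, never mutated
        self._live = {}                      # this bucket's private changes

    def put(self, key, value):
        self._live[key] = value

    def delete(self, key):
        self._live[key] = _TOMBSTONE

    def get(self, key):
        # Search the newest layer first, then older shared layers.
        for layer in (self._live, *reversed(self._frozen)):
            if key in layer:
                value = layer[key]
                if value is _TOMBSTONE:
                    raise KeyError(key)
                return value
        raise KeyError(key)

    def branch(self):
        # Freeze current writes into a shared layer; no object data is copied.
        shared = self._frozen + (self._live,)
        self._frozen = shared
        self._live = {}
        return Bucket(shared)

    def merge_from(self, other):
        # Naive reconcile: apply the other branch's private changes, last writer wins.
        self._live.update(other._live)


main = Bucket()
main.put("train/0001.jpg", b"cat")
main.put("train/0002.jpg", b"dog")

experiment = main.branch()                          # instant, shares existing objects
experiment.put("train/0002.jpg", b"dog-relabeled")  # visible only on the branch
main.put("train/0003.jpg", b"bird")                 # visible only on main

print(experiment.get("train/0002.jpg"))  # b'dog-relabeled'
print(main.get("train/0002.jpg"))        # b'dog'

main.merge_from(experiment)
print(main.get("train/0002.jpg"))        # b'dog-relabeled'
```

In this model, creating a branch costs the same regardless of how many objects the bucket holds, and extra space is consumed only as the branches diverge.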
“One of the long-standing axioms in AI/ML is that ‘changing anything changes everything,’” Vardhan writes. “That’s why data can be even more critical than code in the realm of AI. And while there are well-established mechanisms to version code, it’s much harder to version data. Either existing tools don’t scale, they change the data format, or they change the way that applications are expected to interact with storage.”
Admins will appreciate the improvement to StorageGRID’s logging capabilities, as well as the capability to automate drive firmware updates across all nodes, which should simplify maintenance tasks. StorageGRID 12.0 also brings security updates, including support for AES GCM encryption, integrity checking, and default blocking for SSH ports.