
Alluxio Expands AI Platform with Faster Checkpointing and Multi-Tenant Support
SAN MATEO, Calif., May 21, 2025 — Alluxio has announced the release of Alluxio Enterprise AI 3.6, delivering advanced capabilities for model distribution, model training checkpoint writing optimization, and enhanced multi-tenancy support. This latest version enables organizations to dramatically accelerate AI model deployment cycles, reduce training time, and ensure seamless data access across cloud environments.
AI-driven organizations face increasing challenges as model sizes grow and inference infrastructures span multiple regions. Distributing large models from training to production environments introduces significant latency issues and escalating cloud costs, while lengthy checkpoint writing processes substantially slow down the model training cycle.
“We are excited to announce that we have extended our AI acceleration platform beyond model training to also accelerate and simplify the process of distributing AI models to production inference serving environments,” said Haoyuan (HY) Li, Founder and CEO of Alluxio. “By collaborating with customers at the forefront of AI, we continue to push the boundaries of what anyone thought possible just a year ago.”
Alluxio Enterprise AI version 3.6 includes the following key features:
- High-Performance Model Distribution: Alluxio Enterprise AI 3.6 leverages Alluxio Distributed Cache to accelerate model distribution workloads. By placing the cache in each region, model files need only be copied from the Model Repository to the Alluxio Distributed Cache once per region rather than once per server. Inference servers then retrieve models directly from the cache, with further optimizations including local caching on inference servers and memory pool utilization. In benchmarks, the Alluxio AI Acceleration Platform delivered 32 GiB/s of throughput, roughly 20 GiB/s more than the 11.6 GiB/s of available network bandwidth, because repeated reads are served from local cache rather than over the network.
- Fast Model Training Checkpoint Writing: Building on the CACHE_ONLY Write Mode introduced earlier, version 3.6 debuts the new ASYNC write mode, delivering up to 9 GB/s write throughput in 100 Gbps network environments. Checkpoints are written to the Alluxio cache instead of directly to the underlying file system, avoiding network and storage bottlenecks and significantly reducing checkpoint time; the checkpoint files are then flushed to the underlying file system asynchronously, keeping the slow tier off the training critical path.
- New Management Console: Alluxio 3.6 introduces a comprehensive web-based Management Console designed to enhance observability and simplify administration. The console displays key cluster information, including cache usage, coordinator and worker status, and critical metrics such as read/write throughput and cache hit rates. Administrators can also manage mount tables, configure quotas, set priority and TTL policies, submit cache jobs, and collect diagnostic information directly through the interface without command-line tools.
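The ASYNC write mode described above can be illustrated with a small, self-contained sketch. This is plain Python with hypothetical directory names standing in for the Alluxio cache and the underlying file system (UFS), not the actual Alluxio API: the checkpoint is written synchronously to the fast cache tier, and a background thread flushes it to backing storage so training is not blocked on the slow tier.

```python
import shutil
import tempfile
import threading
from pathlib import Path

def async_checkpoint(state: bytes, name: str,
                     cache_dir: Path, backing_dir: Path) -> threading.Thread:
    """Write a checkpoint to the fast cache, then flush it to backing
    storage in the background (illustrative sketch, not the Alluxio API)."""
    cache_path = cache_dir / name
    cache_path.write_bytes(state)  # fast, synchronous write to the cache tier
    def flush():
        # Slow tier: copied asynchronously, off the training critical path.
        shutil.copy2(cache_path, backing_dir / name)
    t = threading.Thread(target=flush)
    t.start()
    return t

# Demo with temporary directories standing in for cache and UFS.
cache = Path(tempfile.mkdtemp(prefix="cache-"))
backing = Path(tempfile.mkdtemp(prefix="ufs-"))
flusher = async_checkpoint(b"model-weights-step-1000", "ckpt-1000.bin",
                           cache, backing)
# Training would continue here; join only to verify the flush completed.
flusher.join()
print((backing / "ckpt-1000.bin").read_bytes() == b"model-weights-step-1000")
```

The training loop only pays the cost of the cache write; durability to the underlying store arrives shortly after, which is the trade-off the ASYNC mode makes.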
This release also introduces several enhancements for Alluxio administrators:
- Multi-Tenancy Support: This release brings robust multi-tenancy capabilities through seamless integration with Open Policy Agent (OPA). Administrators can now define fine-grained role-based access controls for multiple teams using a single, secure Alluxio cache.
- Multi-Availability Zone Failover Support: Alluxio Enterprise AI 3.6 adds support for data access failover in multi-Availability Zone architectures, ensuring high availability and stronger data access resilience.
- Virtual Path Support in FUSE: The new virtual path support allows users to define custom access paths to data resources, creating an abstraction layer that masks physical data locations in underlying storage systems.
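The virtual path abstraction can be sketched as a prefix-based mapping from user-facing paths to physical storage locations. The mapping table, path names, and resolution logic below are hypothetical illustrations of the idea; the real mapping is configured inside Alluxio FUSE.

```python
# Hypothetical virtual-path table: user-facing prefixes on the left,
# physical locations in underlying storage on the right.
VIRTUAL_PATHS = {
    "/models/llm-prod": "s3://bucket-a/checkpoints/llm/v7",
    "/datasets/train":  "hdfs://nn1/warehouse/training_data",
}

def resolve(virtual: str) -> str:
    """Map a virtual path to its physical location by longest-prefix
    match, preserving the remainder of the path."""
    for prefix in sorted(VIRTUAL_PATHS, key=len, reverse=True):
        if virtual == prefix or virtual.startswith(prefix + "/"):
            return VIRTUAL_PATHS[prefix] + virtual[len(prefix):]
    raise FileNotFoundError(virtual)

print(resolve("/models/llm-prod/weights.safetensors"))
# → s3://bucket-a/checkpoints/llm/v7/weights.safetensors
```

Because users only ever see the virtual prefix, administrators can relocate data in underlying storage by updating the mapping, without changing any client-side paths.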
Availability
Alluxio Enterprise AI version 3.6 is available for download here: https://www.alluxio.io/demo.
About Alluxio
Alluxio is a leading provider of accelerated data access platforms for AI workloads. Alluxio’s distributed caching layer accelerates AI and data-intensive workloads by enabling high-speed data access across diverse storage systems. By creating a global namespace, Alluxio unifies data from multiple sources—on-premises and in the cloud—into a single, logical view, eliminating the need for data duplication or complex data movement. Designed for scalability and performance, Alluxio brings data closer to compute frameworks like TensorFlow, PyTorch, and Spark, significantly reducing I/O bottlenecks and latency. Its intelligent caching, data locality optimization, and seamless integration with modern data platforms make it a powerful solution for teams building and scaling AI pipelines across hybrid and multi-cloud environments. Backed by leading investors, Alluxio powers technology, internet, financial services, and telecom companies, including 9 out of the top 10 internet companies globally.
Source: Alluxio