

Dremio today announced that the metadata catalog at the heart of its Apache Iceberg-based data lakehouse now supports other popular metadata catalog services, including Snowflake’s Apache Polaris-based catalog and Databricks Unity Catalog. The lakehouse provider says the update to its Project Nessie-based metadata catalog will bolster architectural flexibility in the cloud, on-prem, and everywhere in between.
Before metadata catalogs suddenly jumped into the big data consciousness earlier this year, Dremio had been quietly backing its own metadata catalog, dubbed Project Nessie, to provide the necessary housekeeping that a lakehouse based on Apache Iceberg tables requires.
So when Snowflake announced the open source Polaris metadata catalog during its user conference in early June, Dremio executives applauded the announcement and the openness that it could foster in the big data community. Seeing close alignment between Polaris and Nessie, which began development in 2020, Dremio executives pledged to work with the Polaris community to merge the two projects.
The Nessie-Polaris merger has yet to happen, but it is still in the plans. “Our goal is to merge the capabilities of Project Nessie into Apache Polaris (Incubating) to create a single, unified catalog,” says James Rowland-Jones, vice president of product at Dremio. “We believe this will become the default catalog for the open-source community. Dremio will continue to focus on seamless enterprise services built around it.”
In the meantime, Dremio is moving forward with the development of its own catalog service for technical metadata, dubbed the Dremio Enterprise Data Catalog. Specifically, Dremio today announced several new capabilities in the metadata catalog, which is based on Nessie.
The new bits include integration with Snowflake’s Apache Polaris-based metadata catalog service, as well as with Unity Catalog, the metadata catalog that Databricks built for managing data stored in Delta Lake tables. (Unity Catalog does quite a bit more, including lineage tracking, semantic modeling, security, and governance, and it also functions as a regular, user-focused data catalog, but that’s another story.)
Dremio’s move is noteworthy for a couple of reasons. For starters, with its acquisition of Iceberg maker Tabular for between $1 billion and $2 billion and its commitment to essentially merge the Delta Lake and Iceberg specs, Databricks helped ease the concerns of CFOs who worried they would pick the “wrong” format.
However, while Databricks committed earlier this year to supporting Iceberg tables in a future release of Unity Catalog, that support is not available yet. Dremio’s support for Unity Catalog ensures that Databricks customers who use that metadata catalog can achieve Iceberg interoperability today.
“Flexibility is essential for modern organizations looking to maximize the value of their data,” said Tomer Shiran, Founder of Dremio. “With expanded Iceberg catalog support across all environments, Dremio empowers businesses to deploy their lakehouse architecture wherever it’s most effective. We’re 100% committed to giving customers the freedom to choose the best tools and infrastructure while reducing fears of vendor lock-in.”
Dremio’s product, which is officially called the Dremio Enterprise Data Catalog for Apache Iceberg, supports any Iceberg-compatible engine through the Iceberg REST API. In addition to Dremio’s own SQL query engine, that includes Apache Spark, Apache Flink, and others.
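To illustrate what that REST-based interoperability looks like in practice, here is a minimal sketch of pointing Apache Spark at an Iceberg REST catalog. The endpoint URL, credential, warehouse name, and the `lakehouse.sales.orders` table are hypothetical placeholders, and the exact configuration values will vary by catalog vendor.

```python
# A minimal sketch: connecting Apache Spark to an Iceberg REST catalog.
# The URI, credential, and table names below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-rest-demo")
    # Pull in the Iceberg runtime (version must match your Spark build).
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register a catalog named "lakehouse" that speaks the Iceberg REST protocol.
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    .config("spark.sql.catalog.lakehouse.uri",
            "https://catalog.example.com/api/iceberg")        # hypothetical endpoint
    .config("spark.sql.catalog.lakehouse.credential",
            "<client-id>:<client-secret>")                    # hypothetical credential
    .config("spark.sql.catalog.lakehouse.warehouse",
            "analytics")                                      # hypothetical warehouse
    .getOrCreate()
)

# From here, any REST-compliant catalog (Polaris, Unity Catalog,
# Dremio's own) looks the same to the engine.
spark.sql("SELECT * FROM lakehouse.sales.orders LIMIT 10").show()
```

Because the REST protocol sits between the engine and the catalog, swapping catalog services is largely a matter of changing the `uri` and credentials rather than rewriting queries.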
Dremio’s catalog automates many of the housekeeping tasks that are required to keep an Iceberg-based data lakehouse running at peak efficiency. That includes table optimization routines, such as compaction and garbage collection. It also provides “Git”-like branching and version control, enabling users to access data as it existed at particular moments in time (so-called “time travel”), as sketched below. The catalog also provides centralized data governance and role-based access control (RBAC), ensuring fine-grained access to data and preventing unauthorized access to sensitive data.
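The time-travel capability maps directly onto Iceberg’s snapshot model, so any Iceberg-compatible engine can exercise it. A minimal sketch, reusing the hypothetical `spark` session and `lakehouse.sales.orders` table from the previous example:

```python
# A minimal sketch of Iceberg time travel, reusing the hypothetical `spark`
# session and `lakehouse.sales.orders` table from the previous example.

# Query the table as it existed at a given wall-clock time.
df_then = spark.sql(
    "SELECT * FROM lakehouse.sales.orders TIMESTAMP AS OF '2024-06-01 00:00:00'"
)

# Every commit to an Iceberg table produces a snapshot; the `snapshots`
# metadata table lists them all, with commit timestamps.
earliest = spark.sql(
    "SELECT snapshot_id FROM lakehouse.sales.orders.snapshots "
    "ORDER BY committed_at LIMIT 1"
).first()["snapshot_id"]

# Pin a query to that exact snapshot, regardless of any later writes.
df_pinned = spark.sql(
    f"SELECT * FROM lakehouse.sales.orders VERSION AS OF {earliest}"
)
df_pinned.show()
```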
Kevin Petrie, vice president of research at BARC, says Dremio’s move helps enterprises deal with the “extraordinary pressure to access, prepare, and govern distributed datasets for consumption by analytics and AI applications.”
“To meet this demand, they need to catalog diverse data and metadata across data centers, regions, and clouds,” Petrie said in Dremio’s press release. “Dremio is taking a logical step to enable this with an open catalog that is based on Apache Iceberg, the emerging standard for flexible table formats, and by integrating with an ecosystem of popular platforms.”
Related Items:
Polaris Catalog, To Be Merged With Nessie, Now Available on GitHub
What the Big Fuss Over Table Formats and Metadata Catalogs Is All About