

If you’re a supporter of open data, it’s hard not to feel good about last week’s news around Apache Iceberg. Customers demanded an open storage format, and the two leading cloud data platform providers, Snowflake and Databricks, are delivering it in a big way.
To recap: Databricks surprised the big data community last Tuesday by throwing its weight behind Apache Iceberg with the announcement of its intent to acquire Tabular, which was founded by former Netflix engineers who created Iceberg.
That announcement came a day after Snowflake unveiled Polaris, a new metadata catalog designed to work with Iceberg, thereby enabling customers to use open query engines with their data. The move furthered Snowflake’s transition from a proudly proprietary cloud data warehouse into an open data platform for analytics and AI.
Members of the open data ecosystem responded with applause. Among the biggest supporters is Dremio, which develops an open-source query engine of the same name, is the main backer of the open metadata catalog Project Nessie, and manages an Iceberg-based lakehouse for customers.
“I think it’s a statement that, in table formats, Iceberg won. I think it’s the realization of that,” said James Rowland-Jones (JRJ), Dremio’s vice president of product management. “It’s also the realization that table format bifurcation, when you are not winning, is not helpful to your business.”
Databricks’ table format, Delta Lake, was the most widely used format when Dremio surveyed customers on their lakehouse technologies in late 2023. But while Delta was number one in total deployments, Iceberg led in planned deployments over the next three years, said Read Maloney, Dremio’s chief marketing officer.
“Who’s driving these changes? It’s customers. Customers are sick of being locked-in, and the only way to do that is to ensure that you’re not only in an open table format, but then you have an open catalog,” Maloney told Datanami in an interview at Snowflake’s Data Cloud Summit in San Francisco last week.
“So now customers own their own storage, they own their own data, they own their own metadata, and then all the vendors in the ecosystem build around that. And the customer now has the ability to say ‘I want that vendor for this, I want that vendor for this,’ and they all work within the common ecosystem,” he said. “The more there’s commonality in the specification around the catalogs, it makes it way easier for everyone to get involved in the ecosystem.”
“We’re listening to customers,” Ron Ortluff, the head of data lake and Iceberg at Snowflake, told Datanami in an interview last week. “That’s kind of the guiding principle.”
The pending launch of Polaris, which Snowflake plans to donate to the open source community within 90 days, means that Snowflake customers will soon be able to query their Iceberg data using any query engine that supports the Iceberg REST catalog API. That list includes Apache Spark, Apache Flink, Presto, Trino, and (soon) Dremio. And of course, they will also be able to query Iceberg data using Snowflake’s fast proprietary SQL engine.
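To make that “bring your own engine” idea concrete, here is a minimal sketch of pointing an Apache Spark session at an Iceberg REST catalog. The catalog name, endpoint URL, warehouse, and table below are illustrative placeholders rather than actual Polaris or Snowflake values, and a real deployment would also need authentication credentials.

```python
# A minimal sketch (not a Polaris-specific recipe): registering an Iceberg
# REST catalog in Spark. The catalog name ("lakehouse"), endpoint URL,
# warehouse, and table names are placeholders for illustration only.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-rest-catalog-demo")
    # Pull in the Iceberg Spark runtime; match the version to your Spark build.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    # Register a Spark catalog backed by the Iceberg REST catalog protocol.
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    .config("spark.sql.catalog.lakehouse.uri", "https://catalog.example.com/api/catalog")
    .config("spark.sql.catalog.lakehouse.warehouse", "analytics")
    .getOrCreate()
)

# Because the table metadata lives in the shared catalog, Flink, Trino, or
# Dremio could be pointed at the same REST endpoint and query the same table.
spark.sql("SELECT * FROM lakehouse.sales.orders LIMIT 10").show()
```

The point of the pattern is that the catalog endpoint, not any single vendor’s engine, becomes the place where engines and tables meet.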
The momentum behind open data is a sign of the continued decoupling of compute stacks, said Siva Padisetty, the CTO of New Relic, which develops an observability platform.
“After storage and compute became decoupled, all of the layers from storage through analytics began to be similarly unbundled, a process currently taking place with tables,” Padisetty said via email. “Overall, the focus here remains on data stack optimization and how organizations assemble the appropriate storage, table format, and compute engines to process their data use cases in the fastest possible manner.”
The key, Padisetty said, “is maintaining vendor unlock, speed, and agility across compute and storage while solving business use cases in the most cost-effective manner with the gravity of data without multiple copies.”
The value of having a centralized data platform that can handle huge data volumes while maintaining performance and security across multiple use cases, such as IT telemetry, data lakes, and SQL analytics, is paramount, he said.
“Enterprises get the value add of open-source technology while maintaining centralized data,” Padisetty continued. “The centralization of the use cases is going to happen, and companies should be positioning themselves to address that.”
The folks at Starburst, the commercial outfit behind the open source Trino query engine, are also watching the Iceberg developments closely. Iceberg was originally developed in part to enable Netflix to use Presto, the project from which Trino was forked, so the growth of Iceberg is definitely a positive development for the company.
“The benefit to the market and customers is that this competition actually creates openness,” said Justin Borgman, the CEO and chairman of Starburst, which also offers an Iceberg-based lakehouse service. “Starburst is one such beneficiary and can now be considered a strong third option in the Databricks vs. Snowflake debate.”
Borgman is closely watching what comes next, particularly around the metadata catalog. Just as the battle over open table formats ended up creating a new kind of data silo (ironic, since the formats were created to foster open data), metadata catalogs are a potential source of lock-in, since they broker the connections between processing engines and the data.
“With Tabular, Databricks’ Unity Catalog has the potential to capture a lot more market share, including organizations using either Delta Lake or Iceberg,” Borgman told Datanami via email. “Snowflake’s open-sourcing of Polaris is a way to compete against Databricks by highlighting that while the market is rapidly moving to open storage formats like Iceberg, catalogs like Unity are a new source of lock-in. One could speculate that this will pressure Databricks to eventually open source Unity, but it is too early to know for sure.”
Taken as a whole, however, the news of the past week is very good for customers and supporters of open data. Momentum for open data platforms is building, and it couldn’t come at a better time.
“The Iceberg ecosystem has been growing quickly. I think it’s going to grow even faster on the back of both of these announcements,” Maloney said. “If you’re in the Iceberg community, this is go time in terms of entering the next era.”
Related Items:
What the Big Fuss Over Table Formats and Metadata Catalogs Is All About
Databricks Nabs Iceberg-Maker Tabular to Spawn Table Uniformity
Snowflake Embraces Open Data with Polaris Catalog