
It’s Snowflake Vs. Databricks in Dueling Big Data Conferences

(Shutterstock/Peeradontax and Viktoriia Debopre)
Snowflake and Databricks have emerged as the two data warehousing behemoths of the cloud era, slugging away at each other to attract customers and their big data workloads. Perhaps it’s not a coincidence, then, that the two vendors will hold their respective annual conferences in the same San Francisco location, separated by a week, in early June.
Snowflake kicks things off with its Snowflake Summit 25, which is scheduled to take place at the Moscone Center June 2-5. Called the Data Cloud Summit in previous years, the Snowflake Summit this year will try to exceed the attendance of last year’s show, which brought between 15,000 and 16,000 attendees to the downtown San Francisco location.
The focus for Summit 2025 will be building and running big data, advanced analytics, and AI systems on Snowflake’s data platform. Attendees who pay the general admission fee of $2,295 will have their pick of more than 500 sessions over 15 tracks, with more than 190 vendors in the expo. The opening keynote, which will be livestreamed starting at 5 p.m. PT Monday, June 2, will feature Snowflake CEO Sridhar Ramaswamy along with Lynn Martin, President of the New York Stock Exchange Group, and Sam Altman, the CEO and co-founder of OpenAI.
Databricks follows up with its own conference, the Data + AI Summit, which takes place at Moscone Center June 9-12. The company touted 16,000 attendees at its 2024 event, plus another 40,000 attending virtually, and this year, the company is estimating that 20,000 people will attend in person.
Data + AI Summit attendees who buy a full ticket for $1,895 will have their pick of more than 700 sessions, which will focus on using the Databricks platform to build, you guessed it, big data, advanced analytics, and AI applications. JPMorgan Chase CEO and Chairman Jamie Dimon and Dario Amodei, the co-founder and CEO of Anthropic, will join Databricks Co-Founder and CEO Ali Ghodsi on the stage during the Wednesday morning keynote, which will be livestreamed starting at 8 a.m. PT.
There was a ton of news coming out of last year’s Snowflake and Databricks conferences, much of it centering on the emergence of Apache Iceberg as the de facto industry standard for big data table formats.
Snowflake, which had previously supported Iceberg, rolled out its open source Polaris metadata catalog, which served as the REST-based glue that enabled Snowflake customers to connect Iceberg tables to a variety of compute engines, including Snowflake’s proprietary engine but also Trino, Dremio, Apache Spark, Apache Flink, and Apache Doris, among others. Polaris is now incubating at the Apache Software Foundation, and appears on course to become another standard for the class of technology known as metadata catalogs (not to be confused with standard data catalogs).
Not to be outdone, Databricks announced the acquisition of Tabular, the commercial Iceberg outfit founded by the creators of the Apache Iceberg project, during Snowflake’s 2024 conference. In welcoming the Iceberg team, Databricks unveiled a roadmap that called for Apache Iceberg and its own Delta Lake table format to eventually merge, thereby eliminating the risk that customers would choose the “wrong” table format.
The two events shook the big data community in ways that hadn’t been seen since the collapse of Hadoop half a decade earlier. Large organizations that had held off building next-generation big data lakes based on one of the open table formats suddenly had a green light from the biggest vendors that Iceberg was the winner. It was a giant win for open platforms and for organizations that feared data platforms would hold their data hostage.
What will the dueling Snowflake and Databricks events bring this year? It’s unlikely we’ll experience a shift as monumental as the one we saw last year, with the emergence of Iceberg and Polaris as industry standards. The community is still sorting out the importance of table formats and metadata catalogs, so you can expect to see quite a bit of content around those topics.
The big news this year is likely to revolve around what everybody seems to be talking about: agentic AI. Technology vendors are rapidly rolling out ways to build and coordinate large numbers of AI agents to handle all sorts of tasks, ranging from customer service to writing and deploying code. It’s a good bet that agentic AI will be the number one topic at both the Snowflake and Databricks events this year, although there are always surprises.
What are your thoughts on the two conferences? On the Databricks versus Snowflake battle? As always, you can let us know what you’re thinking by dropping us a line at [email protected].
Related Items:
It’s Go Time for Open Data Lakehouses
Snowflake Embraces Open Data with Polaris Catalog
Databricks Nabs Iceberg-Maker Tabular to Spawn Table Uniformity