
People to Watch 2016

Jacques Nadeau
Co-founder and CTO
Dremio


Jacques Nadeau is the co-founder and CTO of Dremio. He also serves as Vice President of Apache Arrow, a new in-memory data format that promises to standardize how columnar data is stored and analyzed in Hadoop. Prior to Dremio, Jacques led the Apache Drill development efforts at MapR. He is also known for building the Avenue A | Razorfish analytics data warehousing system and associated services practice, which was acquired by Microsoft.

Datanami: Hi Jacques. Congratulations on being selected as a Datanami 2016 Person to Watch. What are Dremio’s leading goals for 2016 with regard to big data?  

Jacques Nadeau: The data landscape has evolved in recent years. Data is increasingly being stored in non-relational datastores, such as NoSQL databases, cloud storage and Hadoop. In many ways, it has become easier for developers to build applications, but harder for business analysts and data scientists to discover, explore and analyze the data. We’re developing innovative technology to solve this problem.

Datanami: You recently spearheaded the creation of Apache Arrow. How do you see that impacting the usefulness of big data?

Apache Arrow enables columnar in-memory execution, which increases the speed of data processing by 10-100x and enables systems to exchange data in memory with zero overhead. In many ways, Arrow is the next step towards heterogeneous data environments. Just like Kubernetes, Mesos and YARN enabled multiple systems to share cluster compute resources, Arrow enables these systems to share data and memory.

My recent blog post about Arrow explains the significance of the project. With key developers from over a dozen major open source projects on board, including Drill, Kudu, Parquet, Python (Pandas) and Spark, Apache Arrow is well positioned to redefine the Big Data landscape.

Datanami: Generally speaking, on the subject of big data, what do you see as the most important trends for 2016 that will have an impact now and into the future?

I think that there are a couple of trends that will impact how organizations deal with data:

  • Data heterogeneity. Organizations will continue to adopt a wide variety of modern datastores (e.g., Elasticsearch, MongoDB, Cassandra, S3) and online services. As a result, traditional ETL will no longer be an option, and organizations will need to embrace new ways of processing and analyzing data across disparate data sources.
  • IoT. As more and more devices become connected to the Internet, new use cases are emerging, leveraging the data exhaust from these systems. The unique characteristics of IoT data, and the use cases enabled by this data, will lead to new innovations in data storage and processing.

Datanami: Outside of the professional sphere, what can you tell us about yourself – personal life, family, background, hobbies, etc.?

I live with my beautiful and supportive wife in Santa Clara. We have two crazy pug mix rescues that keep us on our toes. I love to build things at home as well. This includes tinkering in my garage with woodworking, welding, and electronics, as well as gardening in the yard. When the weather turns nice, we enjoy water-skiing on Lake Anderson and running trails with Brazen around the bay. And of course, we try to get to the beach as often as possible.

Datanami: Final question: Who is your idol in the big data industry and why?

Google’s Jeff Dean has had a tremendous impact on the Big Data industry. He was among the inventors of the Google File System, MapReduce, LevelDB, Google Brain and TensorFlow. Some of these innovations (e.g., LevelDB, TensorFlow) have been released as open source projects, and others (e.g., GFS, MapReduce) have inspired the creation of open source projects.

I’m also inspired by Julien Le Dem and what he’s accomplished with the Apache Parquet project. Julien was the tech lead for Twitter’s data analytics pipeline, and created Parquet while riding the “Twitter shuttle” home. He’s done a phenomenal job cultivating the community, and Parquet has become the de facto standard approach for storing Big Data. Julien recently joined Dremio as an architect, and has been tremendously helpful in building out the Arrow community. I’m very excited about working with him at Dremio.

 
