

When the CTO of marketing analytics firm Qualia set out to build a new consumer behavior tracking system, he knew the database selection would be critical. Little did he know that Qualia would end up pushing the limits of the graph database chosen to power it.
Qualia is a New York City ad-tech firm that works with advertising agencies and major brands to help identify “intent signals” captured across people’s smartphones, tablets, and PCs. The company collects data on more than 90% of American households, and uses that data to essentially give clients a heads-up when a consumer is likely to be receptive to a pitch.
Timeliness and accuracy are key to what Qualia does. For example, if a consumer loaded a webpage about cars, that would be considered a “low quality” signal, and that engagement would go into a bucket with 300 million other “auto intenders,” explains Qualia CTO Niels Meersschaert. “It’s not really that meaningful,” he says. “Otherwise we’d sell 300 million cars.”
However, if that search occurs on a smartphone while the consumer is standing in a Ford dealership, that search suddenly has greater meaning. Qualia’s goal is to sift through all that real-time signal information, sort the low quality signals from the higher quality ones, and determine which consumers are showing real purchase intent.
As Meersschaert explains, it’s critical to differentiate who’s holding which device, where, and when. “By knowing the specific device that that signal occurred on, and knowing that multiple devices are associated with the same consumer or household in a sense, we are able to build a smarter model about what the actual intent is of that consumer at any moment in time,” he says.
The company relies on a steady stream of data collected via cookies and other tracking mechanisms used on the Web. The data is largely proprietary, and anonymized to prevent it from being abused. Managing the incoming data stream is one issue that Qualia has to deal with, but doing something useful with the data is much more important.
Qualia’s data naturally forms a tree-like structure, with parent-child relationships. Each household contains one or more people, and each of those people uses one or more computing devices, each of which is tracked with its own identification number.
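That hierarchy can be sketched with plain data structures. The following is an illustrative model only; the entity and field names are assumptions, not Qualia’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    device_id: str  # the device's own tracking identification number

@dataclass
class Person:
    label: str
    devices: list = field(default_factory=list)  # one or more devices per person

@dataclass
class Household:
    household_id: str
    people: list = field(default_factory=list)   # one or more people per household

# Build a tiny household tree: household -> people -> devices
home = Household("hh-001", people=[
    Person("adult-1", devices=[Device("dev-a"), Device("dev-b")]),
    Person("adult-2", devices=[Device("dev-c")]),
])

# Walk the parent-child links to collect every tracked device in the household
all_devices = [d.device_id for p in home.people for d in p.devices]
print(all_devices)  # ['dev-a', 'dev-b', 'dev-c']
```

In a relational store, each level of this tree becomes its own table joined on foreign keys; in a graph, the parent-child links are stored directly as edges.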
Describing that sort of data structure in a relational database would be possible, Meersschaert says, but it would be far from ideal.
“You’d end up making all these duplicate data entries, and your data would become far larger than it needs to be,” he tells Datanami. “To get access to any piece of data, you’d either have to have a very wide table, which will be very expensive to update and change, or you’re going to have to have a massive number of joins, which is very slow from an operational perspective.”
Meersschaert realized very early on that a graph database would be the right approach. He considered various open source graph databases, including one called Titan. While Titan had its pluses, the CTO found it lacking for Qualia’s specific use case.

Niels Meersschaert is CTO at Neo4j user Qualia
“The problem we found with doing something with Titan with an HBase backend is the way HBase works,” he says. “There’s a single key that’s on each table, and so every single element that you might query against would be a new table. So you basically had to know all the indexes before you created the tables.”
That would have limited Titan’s usefulness, he says. “It didn’t give you the ability to change [the index] after the fact. It ended up having a tremendous amount of duplication of data when you do a columnar store, because you don’t have the concept of oh here’s a node, and all the things that describe that node are held together.”
Qualia eventually settled on Neo4j, a property graph database developed by Neo Technology. Meersschaert says the way data is stored in nodes and edges in Neo4j is a very good match to the tree-like structure of Qualia’s customer intent data.
Today Qualia’s graph database contains 3 billion nodes and 9 billion edges, comprising 1.2TB of data spread across a three-server cluster of x86 machines. The data is sharded manually, since Neo Technology doesn’t yet support automatic sharding.
“We’re actually one of the largest installations of a graph database globally, which is kind of eye-opening to us because we were just making this stuff work,” Meersschaert says.
It’s true that many databases are far larger than 1.2TB. But when you consider how storing the equivalent data in a relational structure would balloon its size by a factor of 10 and slow queries to a crawl, you see why a graph database is such a good fit for this particular use case, he says.
“A graph database allows you to be far more efficient in terms of traversing from one point to another,” Meersschaert says. A relational database could do the work at the end of the day, but “it doesn’t allow you to traverse from one point to another point along a path. This is where a graph-based database makes sense.”
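The traversal advantage Meersschaert describes can be illustrated with a toy adjacency list: following an edge is a direct hop from node to neighbor, rather than a join per step. The graph below, with its node names and edge types, is invented for illustration and is not Qualia’s actual data model:

```python
from collections import deque

# Toy property graph as an adjacency list: node -> list of (edge_type, neighbor)
graph = {
    "hh-001":    [("HAS_MEMBER", "person-1")],
    "person-1":  [("USES", "dev-a"), ("USES", "dev-b")],
    "dev-a":     [("EMITTED", "signal-42")],
    "dev-b":     [],
    "signal-42": [],
}

def path_between(start, goal):
    """Breadth-first traversal from start to goal, returning the path of nodes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _, neighbor in graph.get(path[-1], []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Walk from a household down to a specific intent signal along the edges
print(path_between("hh-001", "signal-42"))
# ['hh-001', 'person-1', 'dev-a', 'signal-42']
```

Each hop here is a constant-time dictionary lookup; a relational schema would need one join per level of the hierarchy to answer the same question.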
It’s all about using the right tool for the job, Meersschaert says, adding that MongoDB, Hadoop, BigQuery, and Spark all play a role at Qualia. “We use the technology that’s appropriate to the task at hand,” he says.
These other systems work with Neo4j to surface the right data in the right format, according to a case study posted to the Neo website. Qualia relies on a Ruby app called Spirograph to insert data into and pull data from the graph; that data is then processed in Hadoop, Spark, and BigQuery. The company also relies on a tool called Cerebro to convert user coordinates into commercial locations, which are stored in MongoDB.
Related Items:
JanusGraph Picks Up Where TitanDB Left Off
Neo4j Touts 10x Performance Boost of Graphs on IBM Power FPGAs
Graph Databases Everywhere by 2020, Says Neo4j Chief