(SuPatMaN/Shutterstock)
Shrinking decision windows and faster data generation set the table for the rise of real-time analytics as a product category. And now with large language models and vector databases paving the path toward enterprise AI, we’ve suddenly entered the era of real-time AI systems, according to Rockset CEO and Co-founder Venkat Venkataramani.
Rockset’s claim to fame up to this point has been developing a relational database that enables users to run SQL queries continuously on large amounts of fresh incoming data. This has been a Holy Grail of sorts in advanced analytics, and something that many big data developers, from traditional data warehouse vendors to real-time stream processors, have struggled to do, for one reason or another.
Rockset addresses the real-time analytics need with a slew of capabilities built on the open source RocksDB key-value store, which Rockset CTO and Co-founder Dhruba Borthakur helped create at Facebook. This includes Rockset’s powerful converged indexing capabilities, but also its schemaless data ingestion, time-series optimization, query planning, and its cloud-based architecture.
The goal up to this point has been to give real-time applications access to the freshest, most up-to-date data arriving over a Kafka pipe. Like other database companies chasing the real-time analytics dream (Imply, ClickHouse, and StarTree), there’s no single brilliant feature that enables you to suddenly run tens of thousands of SQL queries per second on massive amounts of incoming data. Instead, it’s a capability that’s enabled through tireless engineering.
But the goal lines moved in April when Rockset rolled out its initial support for vector search functionality in the database. The new capability allows Rockset to not only store and index vector embeddings in its database, but to combine those vector embeddings with metadata filtering, keyword searches, and vector similarity scores.
These new vector-related features will unlock real-time AI use cases for customers, with a particular focus on product recommendations, personalization, and fraud detection, Venkataramani says.
“The old word for this is predictive analytics. I want to predict what is about to happen,” he says. “Nobody says those words anymore. It’s all real-time AI. But essentially the corpus of use cases is very similar to what people would have done.”
Since ChatGPT emerged late last year, companies have started rethinking how and where they can apply AI. New technologies and techniques based on neural networks and vector embeddings are upending machine learning approaches that were considered cutting-edge just five years ago, Venkataramani says.
For example, take product recommendation, a time-tested application for data scientists. Instead of a painstaking process that involves identifying the most predictive features and attributes, building a pipeline to automatically extract them, and then carefully constructing a machine learning model to infer consumer preferences at runtime, with the advent of LLMs companies can now essentially throw all this data into a text document and let the neural nets sort it out, Venkataramani says.
“Previously, the machine learning models will try to extract attributes about your product, color of the product, manufacturer, what category it is in, etc.,” he says. “But now, you can just give these AI models and these neural nets just a BLOB of text. You could just give a catalog of images for every product, and you don’t need to tell it ‘Go and tag these images saying this is blue in color, this falls in this category.’
“Now you can feed all the products that the user is looking at, and an AI model can understand the likings and the disliking of the user without having to codify it in terms of particular attributes and particular rules,” Venkataramani continues. “So you can feed and build a vector for the user, and that vector represents all the potential products that they have a higher chance of liking or buying.”
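One simple way to picture the user vector Venkataramani describes is as an aggregate of the embeddings of products the user has engaged with. The sketch below is purely illustrative: the three-dimensional vectors and the averaging approach are assumptions for the example (real embeddings from a neural model have hundreds or thousands of dimensions, and production systems often use learned user models rather than a plain mean).

```python
# Toy sketch: derive a "user vector" by averaging the embedding vectors
# of products the user has viewed. The 3-dimensional vectors below are
# invented for illustration; real embeddings are far higher-dimensional.

def average_vectors(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]

# Hypothetical embeddings for three products the user looked at.
viewed_products = [
    [0.9, 0.1, 0.0],   # e.g. a blue jacket
    [0.8, 0.2, 0.1],   # e.g. a blue sweater
    [0.7, 0.0, 0.2],   # e.g. a navy shirt
]

user_vector = average_vectors(viewed_products)
print(user_vector)  # a single vector summarizing the user's tastes
```

The resulting vector can then be compared against product embeddings to surface items the user has a higher chance of liking.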
This is dramatically lowering the bar for using AI in production, and enabling companies to do much more with it, says Venkataramani, a 2022 Datanami Person to Watch. This could theoretically enable a company to perform predictive analytics on 100,000 items in their catalog, instead of limiting it to their top 1,000 items, he says.
“With AI, it’s almost like some bot is observing all the behavior of the user, and have understood every product at a much deeper level and then building the recommendation in real time when the user is there on the website, not an hour later, not a day later or a week later,” he says. “The level to which you can personalize has gone through the roof because you can now automate all of this.”
Rockset doesn’t create vector embeddings, which are condensed representations of large amounts of unstructured text or image data. But it does allow users to treat vector embeddings as basically another data type in the database, and to perform actions upon them, such as similarity search.
“What models you use to take unstructured data and turn that into a vector, we don’t care,” Venkataramani says. “Think of it as another data type, another column in your table. You now need to do similarity searches on them. You need to say, given a vector, find me all the other vectors that are closer to this thing that I’m searching for.”
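To make the "another column in your table" idea concrete, here is a minimal sketch of similarity search over rows that carry an embedding alongside ordinary metadata. The table, vectors, and cosine-similarity scoring are all assumptions for illustration, not Rockset's actual API or SQL syntax.

```python
import math

# Minimal sketch of vector similarity search: each row stores an
# embedding as just another column value, and a query vector is scored
# against every row. Cosine similarity is one common scoring function.

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical table: id, metadata, and an embedding column.
rows = [
    {"id": 1, "category": "flower", "embedding": [1.0, 0.0]},
    {"id": 2, "category": "flower", "embedding": [0.9, 0.4]},
    {"id": 3, "category": "tool",   "embedding": [0.0, 1.0]},
]

query = [1.0, 0.1]  # "find me all the vectors closest to this one"
scored = sorted(rows, key=lambda r: cosine_similarity(query, r["embedding"]),
                reverse=True)
print([r["id"] for r in scored])  # row ids, most similar first
```

Because the embedding sits next to regular columns, the same query can also filter on metadata (the `category` field here), which is the combination of vector similarity and metadata filtering the article describes.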
For example, say a customer wanted to identify all images that resemble a daisy in the incoming stream of data (replace “daisy” with “gun” or “knife” if your use case is public safety instead of garden tours).
“The vector that I’m looking for is a daisy, but here are all the other images represented as vectors,” Venkataramani explains. “Now you need an index on that. If you do a brute force search on the whole thing, it’ll take 10 days for this question to be answered. I want this to be done in 100 milliseconds. How do you do it? This is where indexing is the name of the game.”
Running machine learning algorithms, such as K-Nearest Neighbor (KNN) or Approximate Nearest Neighbor (ANN), against the index of vector embeddings dramatically speeds up the identification of daisies and daisy-adjacent images in the incoming data.
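The brute-force scan Venkataramani warns about looks roughly like the sketch below: score every stored vector against the query and keep the top k. This is the O(n) baseline that ANN indexes are designed to avoid; the vectors here are toy values chosen for the example, and Euclidean distance is just one possible metric.

```python
import math

# Brute-force k-nearest-neighbor search: compare the query against every
# stored vector. This is the linear scan that becomes impractical at
# scale; ANN index structures give approximate answers in sub-linear time.

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, vectors, k):
    """Return indices of the k vectors nearest to the query."""
    ranked = sorted(range(len(vectors)), key=lambda i: euclidean(query, vectors[i]))
    return ranked[:k]

daisy = [0.9, 0.9]                  # hypothetical "daisy" embedding
images = [                          # hypothetical embeddings of incoming images
    [0.85, 0.95],
    [0.10, 0.20],
    [0.88, 0.80],
    [0.00, 0.90],
]
print(knn(daisy, images, k=2))      # the two most daisy-like images
```

An ANN index replaces the `sorted(...)` full scan with a data structure that inspects only a small fraction of the candidates, which is what makes millisecond-level answers possible over large collections.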
“No one is exactly looking for this vector in the database. They’re looking for all the ones that are closer, or the closest, and that’s where the indexes are a lot more mathematically complex than building indexes on numbers or strings or dates or time,” Venkataramani says. “That’s why vector search is a very different capability and that’s what we’ve added.”
Related Items:
Vector Databases Emerge to Fill Critical Role in AI
Home Depot Finds DIY Success with Vector Search