

Shrinking decision windows and faster data generation set the table for the rise of real-time analytics as a product category. And now with large language models and vector databases paving the path toward enterprise AI, we’ve suddenly entered the era of real-time AI systems, according to Rockset CEO and Co-founder Venkat Venkataramani.
Rockset’s claim to fame up to this point has been developing a relational database that lets users run SQL queries continuously on large amounts of fresh incoming data. That has been a Holy Grail of sorts in advanced analytics, and something that many big data developers, from traditional data warehouse vendors to real-time stream processors, have struggled to do for one reason or another.
Rockset addresses the real-time analytics need with a slew of capabilities built on the open RocksDB key-value store, which Rockset CTO and Co-founder Dhruba Borthakur helped create at Facebook. These include Rockset’s converged indexing, as well as schemaless data ingestion, time-series optimization, query planning, and a cloud-based architecture.
The goal up to this point has been to give real-time applications access to the freshest, most up-to-date data arriving over a Kafka pipe. As with other database companies chasing the real-time analytics dream (Imply, ClickHouse, and StarTree), there’s no single brilliant feature that suddenly lets you run tens of thousands of SQL queries per second on massive amounts of incoming data. Instead, that capability comes from tireless engineering.
But the goal lines moved in April when Rockset rolled out initial support for vector search in its database. The new capability lets Rockset not only store and index vector embeddings but also combine those embeddings with metadata filtering, keyword search, and vector similarity scores.
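To make that combination concrete, here is a minimal Python sketch of a hybrid query over a small in-memory list of records. The `Item` fields, the `hybrid_search` function, and the order of operations (metadata filter, then keyword match, then ranking by cosine similarity) are illustrative assumptions, not Rockset’s API or query engine.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical row: an ID, a metadata column, free text, and an embedding column
    item_id: str
    category: str
    description: str
    embedding: list


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(items, query_vec, category, keyword, k=3):
    # Metadata filter and keyword filter narrow the candidate set first
    candidates = [
        it for it in items
        if it.category == category and keyword.lower() in it.description.lower()
    ]
    # Survivors are ranked by vector similarity score, highest first
    scored = [(cosine_similarity(query_vec, it.embedding), it.item_id) for it in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


items = [
    Item("a", "shoes", "Red running shoe", [1.0, 0.0]),
    Item("b", "shoes", "Blue running shoe", [0.6, 0.8]),
    Item("c", "shirts", "Red running shirt", [1.0, 0.0]),
    Item("d", "shoes", "Leather boot", [1.0, 0.0]),
]
print(hybrid_search(items, [1.0, 0.0], category="shoes", keyword="running"))
```

Filtering before ranking keeps the similarity computation to a small candidate set. A real engine might instead rank first and filter afterward, or blend keyword and vector scores into one number; the article does not say which approach Rockset takes.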
These new vector-related features will unlock real-time AI use cases for customers, with a particular focus on product recommendations, personalization, and fraud detection, Venkataramani says.
“The old word for this is predictive analytics. I want to predict what is about to happen,” he says. “Nobody says those words anymore. It’s all real-time AI. But essentially the corpus of use cases is very similar to what people would have done.”
Since ChatGPT emerged late last year, companies have started rethinking how and where they can apply AI. New technologies and techniques based on neural networks and vector embeddings are upending machine learning approaches that were considered cutting-edge just five years ago, Venkataramani says.
For example, take product recommendation, a time-tested application for data scientists. Traditionally it has required a painstaking process: identify the most predictive features and attributes, build a pipeline to extract them automatically, and then carefully construct a machine learning model to infer consumer preferences at runtime. With the advent of LLMs, companies can now basically throw all this data into a text document and let the neural nets sort it out, Venkataramani says.
“Previously, the machine learning models will try to extract attributes about your product, color of the product, manufacturer, what category it is in, etc.,” he says. “But now, you can just give these AI models and these neural nets just a BLOB of text. You could just give a catalog of images for every product, and you don’t need to tell it ‘Go and tag these images saying this is blue in color, this falls in this category.’
“Now you can feed all the products that the user is looking at, and an AI model can understand the likings and the disliking of the user without having to codify it in terms of particular attributes and particular rules,” Venkataramani continues. “So you can feed and build a vector for the user, and that vector represents all the potential products that they have a higher chance of liking or buying.”
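One simple way to build the kind of user vector Venkataramani describes is to average the embeddings of the products a user has viewed, then rank the rest of the catalog by similarity to that average. The Python sketch below assumes the product embeddings already exist; the toy catalog, the averaging step, and the `recommend` function are illustrative assumptions rather than a method the article attributes to Rockset or its customers.

```python
import math


def mean_vector(vectors):
    # Component-wise average of the viewed products' embeddings
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(viewed_ids, catalog, k=2):
    # Summarize the browsing history as a single user vector
    user_vec = mean_vector([catalog[pid] for pid in viewed_ids])
    # Score every product the user has not already looked at
    scores = {
        pid: cosine_similarity(user_vec, vec)
        for pid, vec in catalog.items()
        if pid not in viewed_ids
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy 3-dimensional embeddings standing in for real model output
catalog = {
    "trail_shoe": [0.9, 0.1, 0.0],
    "road_shoe": [0.8, 0.2, 0.0],
    "running_socks": [0.7, 0.3, 0.1],
    "blender": [0.0, 0.1, 0.9],
}
print(recommend(["trail_shoe", "road_shoe"], catalog, k=1))
```

Averaging ignores the order and recency of views; weighting recent views more heavily is a common refinement.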
This is dramatically lowering the bar for using AI in production and enabling companies to do much more with it, says Venkataramani, a 2022 Datanami Person to Watch. In theory, a company could perform predictive analytics on 100,000 items in its catalog instead of limiting it to its top 1,000 items, he says.
“With AI, it’s almost like some bot is observing all the behavior of the user, and have understood every product at a much deeper level and then building the recommendation in real time when the user is there on the website, not an hour later, not a day later or a week later,” he says. “The level to which you can personalize has gone through the roof because you can now automate all of this.”
Rockset doesn’t create vector embeddings, which are condensed representations of large amounts of unstructured text or image data. But it does allow users to treat vector embeddings as basically another data type in the database, and to perform actions upon them, such as similarity search.
“What models you use to take unstructured data and turn that into a vector, we don’t care,” Venkataramani says. “Think of it as another data type, another column in your table. You now need to do similarity searches on them. You need to say, given a vector, find me all the other vectors that are closer to this thing that I’m searching for.”
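Treating the embedding as “another column in your table” can be pictured as rows that carry an embedding alongside ordinary fields. The minimal Python sketch below runs an exact, brute-force k-nearest-neighbor search over such rows; the row layout, the `knn` function, and the use of cosine similarity as the closeness measure are assumptions for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn(query, rows, k):
    # Exact search: score every row against the query and keep the k best
    return heapq.nlargest(k, rows, key=lambda row: cosine_similarity(query, row["embedding"]))


# Hypothetical table rows: ordinary columns plus an embedding column
rows = [
    {"id": 1, "label": "daisy", "embedding": [1.0, 0.0]},
    {"id": 2, "label": "truck", "embedding": [0.0, 1.0]},
    {"id": 3, "label": "sunflower", "embedding": [0.9, 0.1]},
]
for row in knn([1.0, 0.0], rows, k=2):
    print(row["id"], row["label"])
```

Every query touches every row, so the cost grows linearly with the size of the collection. That linear scan is exactly what a vector index is meant to avoid.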
For example, say a customer wanted to identify all images that resemble a daisy in the incoming stream of data (replace “daisy” with “gun” or “knife” if your use case is public safety instead of garden tours).
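Watching a stream for daisy-like images can be sketched as comparing each incoming embedding with a single reference embedding and flagging anything above a similarity cutoff. In the Python below, the 0.9 threshold, the `(image_id, embedding)` stream format, and the `flag_similar` function are hypothetical choices, and the reference vector stands in for whatever embedding model produced the daisy vector.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_similar(stream, reference, threshold=0.9):
    # Check each incoming (image_id, embedding) pair as it arrives
    for image_id, embedding in stream:
        score = cosine_similarity(reference, embedding)
        if score >= threshold:
            yield image_id, score


daisy_vec = [1.0, 0.0, 0.0]  # hypothetical reference embedding for "daisy"
incoming = [
    ("img1", [0.98, 0.1, 0.0]),
    ("img2", [0.0, 1.0, 0.0]),
    ("img3", [0.9, 0.0, 0.3]),
]
for image_id, score in flag_similar(incoming, daisy_vec):
    print(image_id, round(score, 3))
```

Swapping the reference vector for a “gun” or “knife” embedding changes the use case without changing the code.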
“The vector that I’m looking for is a daisy, but here are all the other images represented as vector,” Venkataramani explains. “Now you need an index on that. If you do a brute force search on the whole thing, it’ll take 10 days for this question to be answered. I want this to be done in 100 milliseconds. How do you do it? This is where indexing is the name of the game.”
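Nearest-neighbor search is how those similar vectors are found. An exact K-Nearest Neighbor (KNN) search compares the query against every vector, which is the brute-force approach Venkataramani warns about. Approximate Nearest Neighbor (ANN) algorithms instead run against an index of the vector embeddings, trading a small amount of accuracy for dramatically faster identification of daisies and daisy-adjacent images in the incoming data.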
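Vendors use a variety of ANN index structures, and the article does not say which one Rockset uses. As one illustrative technique, the Python sketch below builds a random-hyperplane locality-sensitive hashing (LSH) index: each vector is hashed by which side of several random hyperplanes it falls on, vectors pointing in similar directions tend to land in the same bucket, and a query scores only the vectors in its own bucket. The `HyperplaneLSH` class, its parameters, and the single-table, bucket-only lookup are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Illustrative approximate nearest-neighbor index using random-hyperplane LSH."""

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane is defined by a Gaussian normal vector
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = defaultdict(list)

    def signature(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in self.planes)

    def add(self, item_id, vec):
        self.buckets[self.signature(vec)].append((item_id, vec))

    def candidates(self, vec):
        # Only vectors sharing the query's signature are considered
        return self.buckets.get(self.signature(vec), [])

    def query(self, vec, k=1):
        # Rank just the candidate bucket instead of the whole collection
        ranked = sorted(self.candidates(vec), key=lambda c: cosine_similarity(vec, c[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]


rng = random.Random(42)
vectors = {f"img{i}": [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(1000)}
index = HyperplaneLSH(dim=16, num_planes=8, seed=1)
for item_id, vec in vectors.items():
    index.add(item_id, vec)
print(index.query(vectors["img7"], k=3))
print(len(index.candidates(vectors["img7"])), "of", len(vectors), "vectors scored")
```

Fewer hyperplanes mean bigger buckets (more accurate, slower); more hyperplanes mean smaller buckets (faster, but a true neighbor can land in a different bucket and be missed). Practical LSH schemes use several independent hash tables to reduce those misses, and other ANN families, such as graph-based HNSW or cluster-based IVF indexes, make similar speed-versus-recall trade-offs.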
“No one is exactly looking for this vector in the database. They’re looking for all the ones that are closer, or the closest, and that’s where the indexes are a lot more mathematically complex than building indexes on numbers or strings or dates or time,” Venkataramani says. “That’s why vector search is a very different capability and that’s what we’ve added.”