People to Watch 2024 – Luis Ceze
You changed the name of your company from OctoML to OctoAI in January. Can you elaborate on the change?
We changed our name from OctoML to OctoAI to better reflect the expansion and evolution of our product suite, which more broadly addresses the growing market needs in the generative AI space.
In the last year, we significantly expanded our platform for developers to build production applications with generative AI models. This means companies can run any model of their choice, whether off-the-shelf, custom or open-source, and deploy it on-prem within their own environments or in the cloud.
Our latest offering is OctoStack, a turnkey production platform that delivers highly optimized inference, model customization and asset management at scale for large enterprises. This gives companies total AI autonomy when building and running generative AI applications directly within their own environments.
We already have dozens of high-growth generative AI customers, like Apate.ai, Otherside AI, Latitude Games and Capitol AI, that are using the platform to bring this reliable, customizable, efficient infrastructure directly into their own environments. These companies are now firmly in control of how and where they work with models, and they benefit from our maintenance-free serving stack.
You’re a co-founder of the Apache TVM project, which allows machine learning models to be optimized and compiled to different hardware. But GPUs are all the rage. Should we be more open to running ML models on other hardware?
We’ve seen more AI innovation in the last 18 months than ever before. Almost overnight, AI has shifted from the lab to a viable business driver. It is clear that for AI to scale, we need to be able to run it on a broad range of hardware, from data centers to edge and mobile devices.
But we’re at a juncture reminiscent of the early cloud days. Back then, companies wanted the freedom to host data across more than one cloud, or across a combination of cloud and on-premise infrastructure.
Today, companies also want accessibility and choice when building with AI. They want the choice to run any model, be it custom, proprietary or open source. They want the freedom to run those models on any cloud or local endpoint, without handcuffs.
This was our mission with Apache TVM early on, and this has carried on through my work at OctoAI. OctoAI SaaS and OctoStack are designed with the principle of hardware independence and portability to different customer environments.
GenAI is going from a period of experimentation in 2023 to deployment in 2024. What are the keys to making LLMs more impactful for businesses?
We strongly believe that 2024 is the year that generative AI makes it out of development and into production. But to bring this to fruition, companies are going to have to focus on a few key things.
The first is controlling cost so the unit economics of LLMs work in your favor. Model training is a predictable expense, but inference (calling a model running in production) can get very expensive, especially if usage surges beyond what you’ve planned for.
Second is selecting the right model for your use case. That is getting more challenging because of the sheer number of LLMs to pick from (there are 80,000 and counting), and model fatigue is beginning to set in. You want to strike a balance: a model powerful enough to deliver the quality you need that also runs efficiently enough to be cost-effective.
Third, techniques like fine-tuning are incredibly important for customizing those LLMs for unique functionality. One trend we observe is that LLMs themselves are increasingly commoditized, and the real value comes from customizing them to meet a specific, high-value use case.
Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
Food for me is more than nutrition :). I love to learn about food; I love to cook it; I love to eat it.
I like to understand food “cross-stack,” from cultural aspects down to chemistry. And then eating and drinking it ;).
Another fun bit: some of my research was in DNA data storage, and my work recently traveled to the moon!