March 14, 2025

Patronus AI Launches Industry-First Multimodal LLM-as-a-Judge for Image Evaluation

SAN FRANCISCO, Calif., March 14, 2025 — Patronus AI today announced the launch of the industry’s first Multimodal LLM-as-a-Judge (MLLM-as-a-Judge), a groundbreaking evaluation capability that enables developers to score and optimize multimodal AI systems for image-to-text applications.

The new Judge-Image tool, powered by Google Gemini, allows AI engineers to iteratively measure and improve the quality of their multimodal AI applications by scanning for text presence, grid structure, spatial orientation, and object identification.

“Our mission has always been to advance scalable oversight of AI,” said Anand Kannappan, CEO and Co-founder of Patronus AI. “With the release of GPT-4o, Claude Opus, and Google’s Gemini over the last year, organizations have invested heavily in image generation to drive customer value. However, as these AI experiences scale, so does the unpredictability of LLM systems. Our MLLM-as-a-Judge addresses this critical challenge by providing transparent, reliable evaluation of multimodal systems.”

The Judge-Image tool offers several out-of-the-box evaluation criteria, including:

Caption hallucination detection (standard and strict)
Primary and non-primary object description verification
Object location accuracy

Beyond validating image caption correctness, Judge-Image can test OCR extraction accuracy for tabular data, AI-generated brand asset accuracy, and scene description validity.
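For readers unfamiliar with the pattern, the general LLM-as-a-judge flow for an image caption can be sketched roughly as follows. This is a minimal illustration and not the Patronus AI API: the names `JudgeResult`, `build_prompt`, `parse_verdict`, `evaluate_caption`, and `stub_judge` are hypothetical, the PASS/FAIL reply format is an assumption, and the judge is a local stub standing in for a real multimodal model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JudgeResult:
    criterion: str
    passed: bool
    explanation: str


def build_prompt(criterion: str, caption: str) -> str:
    """Compose the text instruction sent to the judge alongside the image."""
    return (
        f"Criterion: {criterion}\n"
        f"Caption: {caption}\n"
        "Answer PASS or FAIL on the first line, then explain briefly."
    )


def parse_verdict(criterion: str, raw: str) -> JudgeResult:
    """Read the verdict from the first line; anything other than PASS counts as a failure."""
    lines = raw.strip().splitlines()
    verdict = lines[0].strip().upper() if lines else ""
    explanation = " ".join(line.strip() for line in lines[1:])
    return JudgeResult(criterion, verdict == "PASS", explanation)


def evaluate_caption(
    image: bytes,
    caption: str,
    criterion: str,
    judge: Callable[[bytes, str], str],
) -> JudgeResult:
    """Send the image and prompt to the judge and parse its reply."""
    raw = judge(image, build_prompt(criterion, caption))
    return parse_verdict(criterion, raw)


def stub_judge(image: bytes, prompt: str) -> str:
    """Hypothetical stand-in for a multimodal model; it ignores the image
    and fails any prompt that mentions a dog."""
    if "dog" in prompt.lower():
        return "FAIL\nThe caption mentions a dog that is not in the image."
    return "PASS\nThe caption matches the image."


if __name__ == "__main__":
    result = evaluate_caption(
        b"", "A red mug and a dog", "caption hallucination", stub_judge
    )
    print(result.passed, result.explanation)
```

In a real deployment, the stub would be replaced by a call to a multimodal model, and the criterion text would correspond to one of the evaluation criteria listed above.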

Prior research suggests that Google Gemini can serve as a more reliable MLLM judge than alternatives such as OpenAI’s GPT-4V, exhibiting less egocentricity and a more equitable approach to judgment. Evaluations on Patronus AI’s internal datasets confirmed that the Gemini backbone outperformed other multimodal LLMs.

Patronus AI plans to expand its multimodal evaluation capabilities to include audio and vision features in future releases.

Customer Use Case

Etsy, the leading technology marketplace for independent sellers, has already implemented Patronus AI’s MLLM-as-a-Judge to detect and mitigate caption hallucinations for its product images. The Etsy AI team uses this capability and the broader Patronus platform to optimize its multimodal AI system.

For more information, visit the Patronus AI documentation at https://docs.patronus.ai/docs/multimodal_evals/base.

Source: Patronus AI

Tags: MLLM-as-a-Judge, multimodal LLM
