
MIT Report Flags 95% GenAI Failure Rate, But Critics Say It Oversimplifies

MIT’s State of AI in Business 2025 has gone viral, and it’s not hard to see why. The report opens with a bold claim: more than $30 billion has been spent on GenAI, yet 95% of enterprise pilots still fail to make it to production.
What’s holding companies back isn’t the technology itself or the regulations around it. It’s the way the tools are being used. Most systems don’t fit into real workflows. They can’t remember, they don’t adapt, and they rarely improve with use. The result is a wave of pilots that look promising in the lab but fall apart in practice. According to the report, that’s the biggest reason most deployments never make it past the testing phase.
Some critics have dismissed the report as overhyped or methodologically weak, but even they admit it captures something many enterprise teams are quietly feeling: the real returns just haven’t shown up, at least not as expected.
The team behind MIT’s State of AI in Business 2025 calls this split the GenAI Divide. On one side is the rare group of pilots, around 5%, that actually turn into big wins, pulling in millions of dollars. On the other side is almost everyone else: the 95% of projects that stall out and never move beyond the testing phase.
What makes this gap so interesting is that it isn’t about having the best model, the fastest chips, or dodging regulations. MIT’s researchers say it comes down to how the tools are applied. The success stories are the ones that build or buy systems designed to slot neatly into real workflows and improve with time. The failures are the ones that drop generic AI into clunky processes and expect transformation to follow.
The scale of adoption makes the divide even more striking. ChatGPT, Copilot, and other general-purpose tools are everywhere. More than 80% of companies have at least experimented with them, and nearly 40% say they’ve rolled them out in some way. Yet what these tools really deliver is a bump in personal productivity; they don’t move the P&L needle.
MIT found that enterprise tools struggle even more. About 60% of companies looked at custom platforms or vendor systems, but only 20% made it to a pilot. Most failed because the workflows were brittle, the tools did not learn, and they did not fit the way people actually work.
That explanation from MIT raises a question. Is the problem the tools themselves, or the way enterprises try to use them? The report insists it is about fit rather than technology, yet in the same breath it points to tools that fail to learn or adapt. That ambiguity is never fully resolved, and it is one reason some critics say the study overstates its case.
MIT frames the divide through four patterns. The first is limited disruption. Out of nine industries studied, only two, technology and media, show signs of real change, while the rest continue to run pilots without much evidence of new business models or shifts in customer behavior. The second is the enterprise paradox. Large companies launch the most pilots but are the slowest to scale, with mid-market firms often moving from test to rollout in about 90 days, while enterprises can take closer to nine months.
The third pattern is investment bias. MIT notes that around 70% of budgets go to sales and marketing because results are easier to measure, even though stronger returns often appear in back-office automation, where outsourcing and agency costs can be cut. The fourth is the implementation advantage. External partnerships reach deployment about 67% of the time compared with 33% for internal builds. MIT presents this as evidence that approach, rather than raw resources, separates the few winners from the rest.
One criticism of the MIT report is the way it leans on its headline number. The claim that 95% of enterprise AI projects fail does appear in the report, but it is offered without much explanation of how it was calculated or what data underpins it. For a figure that bold, the lack of transparency leaves room for doubt.
There are also concerns about how success and failure are defined. Pilots that did not deliver sustained profit gains are treated as failures, even if they created some benefit along the way. That framing can make modest returns look like zero progress.
Some also question the project’s neutrality, given its ties to commercial players developing new AI agent protocols. The report’s recommendations point directly in that direction. It says companies that succeed are the ones that buy instead of build, give AI tools to business teams rather than central labs, and choose systems that fit into daily workflows and improve over time.
According to the report, the next phase is going to be about agentic AI, where tools are able to learn, remember, and coordinate across vendors. The authors describe an emerging Agentic Web where these systems handle real business processes in ways that static pilots have not. They suggest this network of agents could finally bring the scale and consistency that most early GenAI deployments have struggled to achieve.
Related Items
Gartner Warns 30% of GenAI Initiatives Will Be Abandoned by 2025
These Are the Top Challenges to GenAI Adoption According to AWS
Early GenAI Adopters Seeing Big Returns for Analytics, Study Says