September 15, 2025

In Order to Scale AI with Confidence, Enterprise CTOs Must Unlock the Value of Unstructured Data

Felix Van de Maele

Raise your hand if you’ve heard of unstructured data. Now raise your hand if you truly understand its value and power. If I were a betting person, I’d say fewer hands went up for the second question than the first. What’s particularly interesting is that unstructured data is not new, yet it has become a hot topic for tech leaders and CTOs throughout 2025.

Let’s look at how we got here and how enterprise CTOs can scale AI with confidence once they establish a robust foundation for governing unstructured data across their organization.

A Look Back at the Value of Unstructured Data: 2019 vs 2023 vs 2025

In 2019, Deloitte released an in-depth report and survey that found only 18% of organizations reported being able to take advantage of unstructured data. Given that 80-90% of data is unstructured (e.g., text, video, audio and social media), there was, and to some extent still is, an untapped resource that enterprises are unsure how to use.

The Deloitte report also revealed some other interesting findings: 64% of organizations reported relying on structured data from internal resources and systems. On the other hand, according to the same report, executives who said unstructured data is one of the most valuable sources of insights were 24% more likely to have exceeded their business goals. Enterprises that can identify and activate their unstructured data will outpace those that can’t as AI becomes core to business strategy.

However, before you can have successful initiatives and exceed business goals, you have to address where the challenges are within your enterprise. According to a 2023 IDC report, more than half of enterprise leaders say unstructured data mostly stays in a silo, and less than half of information actually gets shared between employees or systems. What’s more, for two in five enterprise leaders, the majority of the data their company stores is used only once, then left unaccessed.

Over the past two years, we’ve witnessed rapid advancements in Large Language Models (LLMs). As these models become increasingly powerful–and more commoditized–the true competitive edge for enterprises will lie in how effectively they harness their internal data. Unstructured content forms the foundation of modern AI systems, making it essential for organizations to build strong unstructured data infrastructure to succeed in the AI-driven era.

This is what we mean by an unstructured data foundation: the ability for companies to rapidly identify what unstructured data exists across the organization, assess its quality, sensitivity, and safety, enrich and contextualize it to improve AI performance, and ultimately create a governed system for generating and maintaining high-quality data products at scale.

In 2025, unstructured data is as much about quality as it is about quantity. “Quality” in the context of unstructured data remains largely uncharted territory. Companies need clear frameworks to assess dimensions like relevance, freshness, and duplication. Over the past six years, the volume and variety of unstructured data–and the number of AI applications that generate or depend on it–have exploded. Many have called it the largest and most valuable source of data within an organization, and I’d agree–especially as AI becomes increasingly central to how enterprises operate. Here’s why.

High-Quality Unstructured Data for AI: What Enterprises Can’t Afford to Get Wrong in 2025 and Beyond

When poor-quality data makes its way into AI models, it leads to a new set of issues: duplicates, inaccuracies, outdated information, and hallucinations that undermine reliability, trust and overall confidence.
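
Two of these issues, duplicates and outdated information, are simple enough to illustrate in code. The Python sketch below is only an illustration under stated assumptions: the `Document` class, the `screen` function, the one-year freshness cutoff, and the sample documents are all hypothetical and are not drawn from any report or product mentioned here. It flags documents older than a cutoff and drops exact copies that differ only in case or spacing.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    # Hypothetical minimal record for one piece of unstructured content.
    doc_id: str
    text: str
    last_modified: datetime


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(text.lower().split())


def screen(docs, now, max_age_days=365):
    """Split document ids into (keep, duplicates, stale).

    The one-year cutoff is an assumed example value, not a recommendation.
    """
    seen = set()
    keep, duplicates, stale = [], [], []
    for doc in docs:
        # Freshness check first: anything older than the cutoff is set aside.
        if now - doc.last_modified > timedelta(days=max_age_days):
            stale.append(doc.doc_id)
            continue
        # Exact-duplicate check on a hash of the normalized text.
        digest = hashlib.sha256(normalize(doc.text).encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append(doc.doc_id)
        else:
            seen.add(digest)
            keep.append(doc.doc_id)
    return keep, duplicates, stale


now = datetime(2025, 9, 15)
docs = [
    Document("a", "Refund policy: 30 days.", datetime(2025, 6, 1)),
    Document("b", "refund  policy: 30 DAYS.", datetime(2025, 7, 1)),
    Document("c", "Old pricing sheet", datetime(2022, 1, 1)),
]
keep, duplicates, stale = screen(docs, now)
print(keep, duplicates, stale)
```

Hashing only catches exact copies after normalization. Near-duplicates, inaccuracies, and relevance need richer techniques and, often, human judgment.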

There are different approaches to solving this, one being to prevent these problems before they happen. To that end, here is where enterprises should focus their efforts in today’s digital-first world.

  1. Start with quality: If your content is inconsistent, out of date, or full of noise, your AI will be too. That means unreliable insights, poor decisions, and customer experiences that fall flat. Clean, high-quality content is non-negotiable.

  2. Give it context: Unstructured data is only valuable when it’s connected to your business. A contract means something different to Legal than to Procurement. Same goes for support tickets or customer reviews. AI can’t deliver without understanding the who, what, and why behind the content.
  3. Automate what matters – free up your experts: Contextualizing unstructured data often means adding business metadata. Yet today, many companies rely heavily on domain experts to manually label documents and define taxonomies, which is slow, costly, and fundamentally unscalable. To unlock the full value of unstructured content for AI and search, enterprises need to lean into GenAI-native automation, accelerating metadata enrichment while keeping expert input focused where it matters most.
  4. Govern it now – not later: If you’re not governing your unstructured content, you’re leaving the door open to AI hallucinations, compliance gaps, and security risks. The smartest companies are already extending their data governance programs to cover files, documents, recordings, and more.
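
To show how context and governance might fit together in practice, here is a minimal Python sketch. It assumes each file or recording carries a small metadata record, and it blocks content from AI use until that record has an owner, a business domain, and an approved sensitivity level. Every name in it (`ContentRecord`, `governance_issues`, `ai_ready`, the sensitivity labels) is a hypothetical illustration, not the API of any governance product.

```python
from dataclasses import dataclass, field


@dataclass
class ContentRecord:
    # Hypothetical business metadata attached to one piece of content.
    doc_id: str
    owner: str = ""        # accountable person or team
    domain: str = ""       # e.g. "Legal" or "Procurement"
    sensitivity: str = ""  # e.g. "public", "internal", "restricted"
    tags: list = field(default_factory=list)


# Assumed policy: only these sensitivity levels may feed AI applications.
ALLOWED_SENSITIVITY = {"public", "internal"}


def governance_issues(record):
    """Return the reasons this record is not yet ready for AI use."""
    issues = []
    if not record.owner:
        issues.append("missing owner")
    if not record.domain:
        issues.append("missing business domain")
    if not record.sensitivity:
        issues.append("unclassified sensitivity")
    elif record.sensitivity not in ALLOWED_SENSITIVITY:
        issues.append(f"sensitivity '{record.sensitivity}' not approved for AI use")
    return issues


def ai_ready(record):
    # A record is ready only when no governance issue remains.
    return not governance_issues(record)


contract = ContentRecord("c-1", owner="legal-ops", domain="Legal", sensitivity="internal")
recording = ContentRecord("r-9", domain="Support")
print(ai_ready(contract))
print(governance_issues(recording))
```

In a setup like this, automated enrichment could propose the `domain` and `tags` values, leaving experts to review only the records that the gate still flags.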

Bottom line: unstructured data holds massive potential, but only if you’re ready to govern it. In today’s AI era, ignoring it isn’t just a missed opportunity–it’s a competitive risk.

About the author: Felix Van de Maele is the co-founder and CEO of Collibra, a data intelligence company. Prior to co-founding Collibra in 2008, Van de Maele served as a researcher at the Semantics Technology and Applications Research Laboratory (STARLab) at the Vrije Universiteit Brussel, where he focused on ontology-focused crawlers for the semantic Web and semantic data integration. 

BigDATAwire