May 8, 2025

Open Source AI Gets a Boost as IBM Hands Over Key Projects to Linux Foundation

The open-source AI movement is gaining momentum with IBM’s latest contribution. The Linux Foundation has welcomed three AI projects contributed by Big Blue into its LF AI & Data Foundation: Docling, BeeAI, and the Data Prep Kit. These projects reflect a growing shift toward open and collaborative AI development.

Unlike proprietary solutions, which often limit customization and integration, these open-source tools provide developers with the flexibility to modify, improve, and adapt them to diverse needs. These tools help address key challenges in AI development, such as multi-agent data orchestration and scalable data preparation. 

IBM developed the tools and has now contributed them to the Linux Foundation, which will oversee their growth within an open-source community. The LF AI & Data Foundation is expected to serve as a neutral host for these projects. IBM’s contributions not only strengthen the open-source AI ecosystem but also highlight the importance of interoperability and accessibility in modern AI development.

“We are excited to welcome Docling, Data Prep Kit, and BeeAI into the LF AI & Data family,” said Todd Moore, SVP, Community Operations at the Linux Foundation and interim Executive Director, LF AI & Data. “These contributions from IBM reflect a strong commitment to open collaboration and responsible AI. I love BeeAI’s commitment to both JavaScript and Python for aggregated learning.”

The three projects add new capabilities to distinct layers of the AI stack. IBM’s BeeAI is an open-source platform designed to make AI agents work together more efficiently. According to IBM, it allows developers to build, discover, run, and connect agents, making multi-agent workflows smoother. Additionally, by supporting both Python and JavaScript, BeeAI offers broad compatibility across different environments. 

Powered by the open Agent Communication Protocol (ACP), the BeeAI project is ideally suited for multi-agent AI workflows, such as handling complex customer queries or analyzing large datasets. 

The Docling project functions as a document intelligence platform built for the extraction, labeling, and generation of structured content from unstructured documents. Designed as an open-source ecosystem of Python tools, Docling helps developers automate document conversion and manipulation while ensuring structured data remains accurate. 

With over 27,000 stars on GitHub, Docling has gained traction as a go-to toolkit for AI-powered document understanding and extraction. The typical use cases for Docling include document classification and tagging, automated data extraction, and relationship mapping.

As the name suggests, the Data Prep Kit helps clean, transform, and trace unstructured data for LLMs. It works with both batch and streaming data, making it flexible for different workflows. The goal of the project is to improve data quality and transparency while ensuring AI systems can scale effectively. 

Data Prep Kit can be used for LLM training purposes, enrichment of streaming data environments, standardizing data formats and sources, and other data refinement tasks essential for AI model optimization. 

“Docling, Data Prep Kit, and BeeAI were born from a need to fill critical gaps in AI development tooling and accelerate innovation in the Generative AI space. We’re proud to see them as a catalyst enabling the broader open-source community to build AI applications and agentic workflows,” said Brad Topol, Distinguished Engineer and Director of Open Source at IBM. “We’re excited to collaborate with the open-source community to evolve these technologies and solve real-world challenges together.”

The three projects are now publicly available for exploration and contribution. A key reason IBM turned the projects over to the Linux Foundation is to benefit from the improvements that outside developers bring to the projects. However, the success of these projects will depend on sustained community adoption and ecosystem integration. 

The press release did not specify what open-source licenses these projects use. With recent disputes over AI licensing, like Meta’s legal trouble around its LLaMA models, it is more important than ever to have clear guidelines on how these tools can be used and shared.

The Linux Foundation’s push for open AI isn’t just about competition; it is about making AI more accessible. The foundation claims that by supporting tools like Docling, BeeAI, and Data Prep Kit, it ensures that developers and businesses aren’t dependent on closed, corporate-controlled models. This approach gives innovators the flexibility to shape AI in ways that best suit their needs, rather than being locked into proprietary systems.

BigDATAwire