

In the COVID era, computational biology is having a heyday – and machine learning is playing a massive role. With billions upon billions of compounds to search through for any given therapeutic application, strictly brute-force simulation is wildly infeasible, so researchers need more intelligent methods of whittling down the options. Now, researchers from IRB Barcelona’s Structural Bioinformatics and Network Biology lab have developed a deep learning method that predicts the biological activity of any given molecule – even in the absence of experimental data.
The researchers, led by Patrick Aloy, are applying deep learning to a massive dataset: the Chemical Checker, which is also produced by the Structural Bioinformatics and Network Biology lab and provides processed, harmonized, and integrated bioactivity data on 800,000 small molecules. Any given molecule can be described across 25 bioactivity “spaces,” but for most molecules, data are known for only a few of them – if that.
With the new deep learning tool, that’s changing. The Chemical Checker database organizes its data on those 800,000 molecules into the 25 bioactivity spaces, and the tool, having been trained on that data, can predict all the bioactivity spaces of any molecule with incomplete bioactivity data. “The new tool … allows us to forecast the bioactivity spaces of new molecules, and this is crucial in the drug discovery process as we can select the most suitable candidates and discard those that, for one reason or another, would not work,” explained Aloy.
Of course, the predictions aren’t perfect: the more data available on a molecule, the higher the confidence of the tool’s predictions. Some molecules also simply prove more difficult than others for the tool to assess. “All models are wrong, but some are useful,” said Martino Bertoni, first author on the paper describing the research. “A measure of confidence allows us to better interpret the results and highlight which spaces of bioactivity of a molecule are accurate and in which ones an error rate can be contemplated.”
The researchers chose a challenging case for validation: a cancer-related transcription factor that was broadly considered an “undruggable” target. By predicting bioactivity spaces, the tool identified 131 compounds that fit the target, and their ability to degrade the target was experimentally confirmed.
The research described in this article was published as “Bioactivity descriptors for uncharacterized chemical compounds” in the June 2021 issue of Nature Communications. The article was written by Martino Bertoni, Miquel Duran-Frigola, Pau Badia-i-Mompel, Eduardo Pauls, Modesto Orozco-Ruiz, Oriol Guitart-Pla, Víctor Alcalde, Víctor M. Diaz, Antoni Berenguer-Llergo, Isabelle Brun-Heath, Núria Villegas, Antonio García de Herreros and Patrick Aloy.