

In the COVID era, computational biology is having a heyday, and machine learning is playing a major role. With billions of candidate compounds to search through for any given therapeutic application, brute-force simulation alone is wildly infeasible, so researchers need AI-driven methods to whittle down the options. Now, researchers from IRB Barcelona’s Structural Bioinformatics and Network Biology lab have developed a deep learning method that predicts the biological activity of any given molecule, even in the absence of experimental data.
The researchers, led by Patrick Aloy, applied deep learning to a massive dataset: the Chemical Checker, which provides processed, harmonized, and integrated bioactivity data on 800,000 small molecules and is also produced by the Structural Bioinformatics and Network Biology lab. The Chemical Checker describes each molecule across 25 bioactivity “spaces,” but for most molecules, data are available for only a few of those spaces, if any.
The new deep learning tool is designed to fill those gaps. Trained on the bioactivity data the Chemical Checker does contain, it can predict all 25 bioactivity spaces for molecules whose experimental data are incomplete or missing. “The new tool … allows us to forecast the bioactivity spaces of new molecules, and this is crucial in the drug discovery process as we can select the most suitable candidates and discard those that, for one reason or another, would not work,” explained Aloy.
Of course, the predictions aren’t perfect: the more data available for a molecule, the higher the confidence of the tool’s predictions. Some molecules are also simply harder for the tool to assess than others. “All models are wrong, but some are useful,” said Martino Bertoni, first author on the paper describing the research. “A measure of confidence allows us to better interpret the results and highlight which spaces of bioactivity of a molecule are accurate and in which ones an error rate can be contemplated.”
The researchers chose a challenging case for validation: a cancer-related transcription factor broadly considered an “undruggable” target. By predicting their bioactivity spaces, the tool identified 131 compounds that fit the target, and their ability to degrade the target was then confirmed experimentally.
The research described in this article was published as “Bioactivity descriptors for uncharacterized chemical compounds” in the June 2021 issue of Nature Communications. The article was written by Martino Bertoni, Miquel Duran-Frigola, Pau Badia-i-Mompel, Eduardo Pauls, Modesto Orozco-Ruiz, Oriol Guitart-Pla, Víctor Alcalde, Víctor M. Diaz, Antoni Berenguer-Llergo, Isabelle Brun-Heath, Núria Villegas, Antonio García de Herreros and Patrick Aloy.