Powering drug discovery through AI-ready protein–ligand data

LIGAND-AI set to generate open, standardised, high-quality datasets of protein–ligand interactions at an unprecedented scale

A new multi-sector public-private partnership, LIGAND-AI, will generate large, open, high-quality datasets of protein–ligand interactions. It will then use these data to train artificial intelligence (AI) models capable of predicting which drug-like molecules bind to thousands of human proteins. The project is funded by the Innovative Health Initiative (IHI) and brings together 18 partners, including EMBL’s European Bioinformatics Institute (EMBL-EBI).

Led by Pfizer and the Structural Genomics Consortium (SGC), LIGAND-AI will study thousands of proteins relevant to existing and unmet disease areas, including rare, neurological, and oncological conditions.

Early drug discovery is a long, expensive, and uncertain process. Scientists spend years testing thousands of molecules to find just one that binds to a disease-related protein. LIGAND-AI aims to change this by combining advanced laboratory technologies with computational methods to create a pipeline from experiment to prediction. 

The consortium will generate billions of data records using complementary screening technologies. This will enable researchers worldwide to develop, train, and benchmark AI models that predict molecular interactions.

“This project brings together scientists and companies from across disciplines within an open science ecosystem. It is heartening to see these diverse scientific communities work around a common vision to generate and share valuable chemical data openly with the world,” said Aled Edwards, CEO of the Structural Genomics Consortium and project coordinator.

Target 2035

LIGAND-AI is part of the Target 2035 project, a global open-science initiative in which researchers across sectors cooperate to advance molecular tools for studying human proteins. By driving collaboration between public and private partners, Target 2035 aims to develop a pharmacological modulator for every protein in the human proteome by 2035.

New drug targets for rare diseases 

A large proportion of human proteins remain unexplored because scientists lack the chemical tools needed to investigate their biology. This gap limits research into many areas of human health, including rare diseases. The LIGAND-AI project aims to address this challenge by generating large-scale, openly available datasets of potential chemical probes for some of these overlooked proteins.

EMBL-EBI will play a central role in ensuring open access to the data generated by LIGAND-AI, by archiving these large-scale datasets and integrating validated results into the ChEMBL database.

“Raw data collected from the project will be added to BioStudies, and all validated hits will be deposited in ChEMBL and made openly available to the community. For these understudied targets, these kinds of data just don’t exist at the moment,” said Noel O’Boyle, Chemical Biology Resources Team Leader at EMBL-EBI. 

The full press release was originally published on the SGC website.


Tags: AI, bioinformatics, biostudy, chembl, chemical biology, drug discovery, embl-ebi
