Machine learning finds critical phosphosites

A new resource for identifying functional human phosphosites relevant to diverse biological processes and disease

Artist's impression of phosphosite and machine learning. Credit: Spencer Phillips

Researchers at EMBL’s European Bioinformatics Institute (EMBL-EBI) have created the largest reference phosphoproteome to date, comprising almost 120 000 human phosphosites. To identify those most likely to be critical, they used a machine learning approach that ranks the sites by functional importance.

Proteins are the core molecular machines of the cell, and they can be regulated by protein modifications that act like molecular switches. Protein phosphorylation is one such switch: it can activate a protein, deactivate it or modify its function. Despite decades of work, the number of these modifications that are critical for life remains a mystery.

This research, published in Nature Biotechnology, creates a freely accessible resource that researchers can use to better understand which proteins are phosphorylated and which phosphosites have functional relevance. Access to these data could accelerate research into many different biological processes and diseases.

Machine learning and data sharing

“This new resource would not have been possible if scientists around the world didn’t share their research data and results,” says Pedro Beltrao, Group Leader at EMBL-EBI. “It would take a single machine over 500 days to run the mass spectrometry experiments used to create this database. By applying machine learning to this huge dataset, we created a scoring system to help researchers determine which lesser-known phosphosites to explore next.”

The researchers at EMBL-EBI curated over 100 publicly available phospho-enriched human datasets containing over 6000 mass-spectrometry experiments from EMBL-EBI’s PRoteomics IDEntifications (PRIDE) database. This large-scale project has generated the biggest open access reference phosphoproteome database to date.

Functional human phosphosites

To identify the most critical phosphosites, the researchers used machine learning to integrate diverse annotations for each site, such as the degree of conservation. The resulting phosphosite functional score has enormous potential to help other scientists uncover more about their proteins of interest. It can be used to rank known phosphosites and distinguish those that are functionally relevant for molecular processes and disease.
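For readers who want a feel for how per-site annotations can be combined into a single score and then ranked, here is a minimal sketch. It is not the model used in the study: it fits a small logistic regression by gradient descent, and the site names, the two features (a conservation value and a made-up "evidence" value), the labels and all numbers are invented purely for illustration.

```python
import math

# Toy training data: ((conservation, evidence), known_functional_label).
# Every value here is hypothetical and exists only to make the sketch runnable.
TRAIN = [
    ((0.9, 0.8), 1),
    ((0.8, 0.6), 1),
    ((0.7, 0.9), 1),
    ((0.2, 0.3), 0),
    ((0.1, 0.5), 0),
    ((0.3, 0.1), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def functional_score(features, weights, bias):
    """Combine a site's annotations into one score between 0 and 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(examples, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            err = functional_score(features, weights, bias) - label
            weights = [w - lr * err * x for w, x in zip(weights, features)]
            bias -= lr * err
    return weights, bias


def rank_sites(sites, weights, bias):
    """Return (site, score) pairs sorted from highest to lowest score."""
    scored = [(name, functional_score(f, weights, bias)) for name, f in sites.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


weights, bias = train(TRAIN)

# Hypothetical, unlabelled sites to prioritise.
candidates = {
    "site_A": (0.85, 0.7),
    "site_B": (0.15, 0.2),
    "site_C": (0.5, 0.5),
}
ranking = rank_sites(candidates, weights, bias)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy setting the highly conserved, well-supported site ranks first and the poorly conserved one last. The ranking idea is the same as in the article, but a real model would draw on many more annotations and far more training data.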

The researchers were able to demonstrate the practicality of their functional score model by identifying two high-scoring phosphosites which play a role in regulating neuronal differentiation.

“The functional score model created from this study can be used to uncover an abundance of new, functional phosphosites that may play crucial roles in disease,” says David Ochoa, Project Coordinator at Open Targets. “We already know of several groups using the scoring model. We would like to encourage researchers everywhere to explore the resource and make use of it.”


EMBL-EBI Press Office

Wellcome Genome Campus
Hinxton, Cambridgeshire
CB10 1SD, UK

+44 1223 494369

