An open source database of protein identifications
The European Bioinformatics Institute and Flanders Interuniversity Institute for Biotechnology (VIB) – Ghent University have launched the PRoteomics IDEntifications database (PRIDE). PRIDE allows researchers who work in the field of proteomics – the large-scale study of proteins – to share information much more readily than was previously possible. This will allow them to exploit the growing mass of information on how the body’s complement of proteins is altered in many disease states, paving the way towards new predictive and diagnostic methods in medicine.
Proteomics is the identification and characterization of all the proteins produced by a particular type of cell, tissue or organism under certain conditions. While an individual’s genome remains the same from one moment to the next, proteomes are extremely dynamic. For example, the set of proteins produced by your liver will change in response to eating a meal, and a healthy liver produces a different set of proteins than a diseased liver. Proteomics therefore has great potential, not only for helping us to understand how our environment affects the healthy body, but also for understanding disease mechanisms and developing new ways of diagnosing disease. Large international efforts to document all the proteins produced by several tissues, including liver, brain and blood plasma, are now underway. But although the high-throughput identification of proteins is gathering momentum, until recently there was no straightforward means of sharing or comparing the results.
“Proteomics labs were publishing their protein identifications,” explains Henning Hermjakob, leader of the EBI’s Proteomics Services Team, “but they had no guidelines as to what information should be captured or how the information should be formatted. The proteomics community rapidly realized that researchers would only be able to exploit the results of their endeavours if they had a central repository that would allow them to make their results publicly available using agreed data standards.”
“Once everyone makes their data available in the same format, it becomes possible to use powerful computational techniques to analyse the data,” continues Joël Vanderkerckhove from VIB – Ghent University. “It then becomes trivial to analyse protein identifications from many different sources, or compare the proteins produced by a particular tissue under different conditions.” PRIDE is closely linked to the Human Proteome Organization’s Proteomics Standards Initiative (HUPO-PSI) and will allow users to transfer data using the standards that are currently being developed as part of PSI.
Large sets of data already available in PRIDE include the results of the Human Proteome Organization’s Plasma Proteome Project, and a human platelet proteome set published by Ghent University. The results of other international collaborations, such as the Human Proteome Organization’s Liver Proteome Project, will follow as they are published. PRIDE is completely open source: the PRIDE database, source code, data, and support tools are freely available for web access or download and local installation.
“We hope that proteomics researchers and publishing companies will adopt PRIDE as the method of choice for making proteomics data freely available to and exploitable by the proteomics community. We also hope to collaborate with other providers of protein identification data to maximize the availability of comprehensive and up-to-date protein identification information,” concludes Henning Hermjakob.