New community-led data resource for protein structure and function launches
EMBL’s European Bioinformatics Institute (EMBL-EBI) and collaborators have launched a new data resource, called Protein Data Bank in Europe Knowledge Base (PDBe-KB), which gives researchers a more comprehensive view of publicly available protein structure data. The PDB archive recently passed 150,000 structures and this resource opens up this growing body of knowledge to the wider life sciences community, including drug discovery researchers.
PDBe-KB is a ‘sister resource’ of EMBL-EBI’s PDBe, which gives access to the PDB archive, one of the world’s largest open archives for findable, accessible, interoperable and reusable (FAIR) protein structure data.
“Thousands of labs around the world produce data and annotations for protein structures,” says Sameer Velankar, Team Leader of the Protein Data Bank in Europe. “Today’s challenge is making all the data available in one place and presenting them in an accessible way.
“Right now, there are many databases that contain protein structure data, but they lack the exposure and accessibility to enable them to be used more widely. PDBe-KB aims to integrate protein data and annotations from existing resources and present everything in a useful way.”
One of the innovative elements of PDBe-KB is its set of protein-specific aggregated views. These pages highlight the available information related to structures for specific proteins, including structural and functional annotations, domains, ligand-binding sites and more. The protein pages can save researchers a lot of time by essentially answering the question ‘what do we currently know about this protein?’
PDBe-KB also enables researchers to see what molecules bind to their protein of interest, highlighting sites of protein-protein interactions and small molecule binding. This information is particularly useful for drug development and can help researchers prioritise which drugs could interact in a desired way with a specific protein.
PDBe-KB could prove useful for researchers studying small molecules, pathways or specific diseases and conditions. It aims to open up protein structure data to researchers who don’t necessarily have structural biology training.
The bigger picture
“The difference between PDBe and PDBe-KB is a bit like the difference between satellite images and Google Maps,” says Mihaly Varadi, Scientific Programmer at EMBL-EBI. “Satellite images give us a good idea of what is there, but you need a lot of background knowledge to understand what you’re looking at. By contrast, Google Maps is easy to use and offers additional information from other sources. Just like Google Maps can offer a comprehensive view of a street, PDBe-KB enables scientists to visualise all the available data on a specific protein in an intuitive way.”
PDBe-KB aggregated views are currently based on UniProt accession numbers, which are widely used in the life sciences community. In future, the infrastructure developed by the PDBe-KB resource could enable aggregated views for other entities within PDB data, broadening the scope beyond individual UniProt accessions to focus, for instance, on small molecules or complexes.
PDBe-KB currently collaborates with 24 resources from seven countries that provide predicted and evidence-based annotations of structural data derived from PDB structures. As the coordinator of this new data resource, EMBL-EBI leads on creating the data infrastructure and access mechanisms, and on maintaining community-agreed standards for the new PDBe-KB resource.