Half a million molecular interactions | ChEMBL adopts HELM standard | Protein data to knowledge: PRIDE | Reactome turns 50 | ... and more EMBL-EBI bioinformatics service updates
500,000 binary interactions… and growing
More than half a million experimentally determined protein interactions are freely available in EMBL-EBI’s IntAct database, providing a means to build and visualise the network of protein interactions at play in living things. This public data resource continues to grow thanks to experimental data submitted directly by researchers, and is bolstered by data captured from the scientific literature.
ChEMBL, the database of compound bioactivity data and drug targets, now incorporates the Hierarchical Editing Language for Macromolecules (HELM), a standard recently released by the Pistoia Alliance. HELM can represent macromolecules ranging from peptides to antibodies, as well as complex or conjugated species such as antibody-drug conjugates. Adding HELM notation for ChEMBL’s peptide-derived drugs and compounds will in future let researchers query that content in new ways, for example through sequence- and chemistry-based searches.
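HELM can support sequence-based searching because a HELM string spells out each polymer as an ordered list of monomers. The sketch below shows this by pulling the monomer lists out of the peptide chains of a HELM string. It is a simplified, hypothetical helper (`parse_simple_peptides` is not part of ChEMBL or any HELM toolkit) and rests on several assumptions. It reads only the simple-polymer section before the first `$`. It ignores connections and annotations. It treats each monomer as either a single letter or a bracketed code such as `[meA]`. The example strings are illustrative and are not taken from ChEMBL.

```python
import re

# Matches one simple polymer, e.g. "PEPTIDE1{A.R.[meA]}" -> ("PEPTIDE1", "A.R.[meA]")
POLYMER_RE = re.compile(r"^([A-Z]+\d+)\{(.*)\}$")


def parse_simple_peptides(helm: str) -> dict[str, list[str]]:
    """Return {polymer_id: [monomer, ...]} for the PEPTIDE chains in a HELM string.

    Hypothetical, simplified sketch: only the first (simple-polymer) section is
    read, connections/annotations are ignored, and monomers are assumed to be
    single letters or bracketed multi-letter codes such as [meA].
    """
    polymer_section = helm.split("$", 1)[0]  # text before the first '$'
    chains: dict[str, list[str]] = {}
    for polymer in polymer_section.split("|"):  # polymers are separated by '|'
        match = POLYMER_RE.match(polymer.strip())
        if match is None:
            raise ValueError(f"unrecognised polymer: {polymer!r}")
        polymer_id, body = match.groups()
        if not polymer_id.startswith("PEPTIDE"):
            continue  # skip RNA, CHEM, etc. in this sketch
        # Monomers are separated by '.'; drop the brackets around multi-letter codes
        chains[polymer_id] = [m.strip("[]") for m in body.split(".")]
    return chains


if __name__ == "__main__":
    chains = parse_simple_peptides("PEPTIDE1{A.R.C.A.A.K.T.C.D.A}$$$$")
    print(chains)
    # A chain made only of single-letter monomers can be joined into a plain
    # sequence string and passed to a conventional sequence-search tool.
    print("".join(chains["PEPTIDE1"]))
```

A full HELM parser also has to handle RNA notation, inline SMILES monomers and the connection section. This sketch deliberately skips all of them so that the monomer-list idea stays visible.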
An NIH-funded Big Data to Knowledge (BD2K) centre of excellence, established by UCLA, Scripps and EMBL-EBI, draws on crowdsourcing, cloud technologies and clinical cohorts to turn protein data into knowledge. Using cardiovascular data from two major cohorts, the partners are integrating proteomics, metabolomics, variation and molecular pathway data into an efficient, global digital ecosystem for biomedical research.
Reactome, the database of molecular pathways, issued its 50th release this winter, a major milestone for the resource. With thousands of new additions, Reactome is now one of the world’s largest freely accessible, open-source pathway resources. Since Reactome scientists began curating and exporting pathway and reaction data 10 years ago, the resource has grown to include annotations for over a third of the protein-coding genes in the current human genome assembly in Ensembl.
InterPro, the resource for protein sequence analysis and classification, has moved one of its member databases (PIRSF) to the latest version of the HMMER search software, making PIRSF searches faster and more sensitive. Together with ongoing biocuration, this keeps important annotations flowing into UniProt and gives researchers the most up-to-date functional information about protein families and motifs. InterPro version 50 covers over 80% of the latest release of the UniProt Knowledgebase and predicts Gene Ontology (GO) terms, which describe the biological processes and molecular functions of proteins, for tens of millions of UniProt proteins.
UniProt, the Universal Protein Resource, features new and modified disease entries, improved searching and many other updates. UniProt’s latest headline article focuses on the work of McBride and colleagues, who studied Odorant receptor 4 (Or4) in subspecies of the mosquito Aedes aegypti to understand the genetic basis of the mosquito’s preference for humans.
If you are looking for whole genomes, protein sequences, alignments or other genome-wide data, have a look at the Ensembl FTP site, where you can download data from the current and previous releases of Ensembl in bulk and for free. Updates to Ensembl this spring will include a new version of the GENCODE gene set, patches for the latest human genome assembly and new REST endpoints for sets of variation data that follow the Global Alliance for Genomics and Health (GA4GH) standards.
With the 125th release of ChEBI, this dictionary of ‘small’ chemical compounds now offers data on over 43,000 fully annotated entities. The release is accompanied by a feature on ellagic acid, a polyphenol antioxidant of interest in the study of obesity. Ellagic acid is found in many fruits and nuts, in several species of oak and in the Japanese medicinal mushroom Phellinus linteus.
The European Nucleotide Archive (ENA) has begun coupling the public release of sequence records to the release of study records. Under the new system, all raw read data and assembled/annotated sequence records associated with studies are released into the public domain as soon as the study’s release date has been reached and the study made public.
Have you come across structures in paper figures or on journal covers that you’d like to know more about? EMBL-EBI’s Protein Data Bank in Europe (PDBe) team runs a dynamic blog that helps you start exploring structures by featuring quite interesting protein structures (“Quips”). One recent instalment looks at acetylcholinesterase (AChE), the enzyme that breaks down acetylcholine. Acetylcholine was the first neurotransmitter to be identified, and the work behind that discovery earned Dale and Loewi the 1936 Nobel Prize in Physiology or Medicine. AChE is a target for chemical weapons, pest-control agents, drugs and even snake venoms.