Bioinformatics services: Winter 2015

Half a million molecular interactions | ChEMBL adopts HELM standard | Protein data to knowledge: PRIDE | Reactome turns 50 | ... and more EMBL-EBI bioinformatics service updates

500,000 binary interactions… and growing

More than half a million experimentally determined protein interactions are freely available in EMBL-EBI’s IntAct database, providing a means to build and visualise the network of protein interactions at play in living things. This public data resource continues to grow thanks to experimental data submitted directly by researchers, and is bolstered by data captured from the scientific literature.

Read more about IntAct and the recently published dataset, ‘A proteome-scale map of the human interactome network’.

ChEMBL takes the HELM

ChEMBL, the database of compound bioactivity data and drug targets, now incorporates the Hierarchical Editing Language for Macromolecules (HELM), a standard recently released by the Pistoia Alliance. HELM can represent simple macromolecules such as antibodies, as well as complex entities and conjugated species such as antibody-drug conjugates. Including the HELM notation for ChEMBL’s peptide-derived drugs and compounds will, in future, enable researchers to query that content in new ways, for example in sequence- and chemistry-based searches.
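To give a feel for what a sequence-based query over HELM notation might involve, here is a minimal Python sketch. It assumes the basic HELM layout, in which sections are separated by `$` and the first section lists simple polymers as `ID{monomer.monomer...}` joined by `|`. The example string is a made-up peptide, not a ChEMBL record, and the function names are hypothetical. It does not handle bracketed or inline monomers that themselves contain dots, nor the connection and annotation sections.

```python
def parse_simple_polymers(helm: str) -> dict[str, list[str]]:
    """Return {polymer_id: [monomers]} from the first section of a HELM string."""
    # Sections are separated by '$'; the first one lists the simple polymers.
    polymers_section = helm.split("$")[0]
    polymers = {}
    for polymer in polymers_section.split("|"):
        # Each polymer looks like 'PEPTIDE1{A.G.C}'.
        polymer_id, _, rest = polymer.partition("{")
        polymers[polymer_id] = rest.rstrip("}").split(".")
    return polymers


def contains_motif(monomers: list[str], motif: list[str]) -> bool:
    """True if 'motif' occurs as a contiguous run of monomers."""
    n = len(motif)
    return any(monomers[i:i + n] == motif for i in range(len(monomers) - n + 1))


# Illustrative input only: two short hypothetical peptides.
example = "PEPTIDE1{A.G.C}|PEPTIDE2{K.L}$$$$"
parsed = parse_simple_polymers(example)
hits = [pid for pid, monomers in parsed.items() if contains_motif(monomers, ["G", "C"])]
```

In this sketch `hits` collects the identifiers of polymers containing the motif. A real search service would also need to interpret connections between polymers and non-natural monomers.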

Big (protein) data to knowledge

An NIH-funded Big Data to Knowledge (BD2K) centre of excellence bringing together UCLA, Scripps and EMBL-EBI draws on crowdsourcing, cloud technologies and clinical cohorts to transform protein data into knowledge. Using cardiovascular data from two major cohorts, the partners are integrating proteomics, metabolomics, variation and molecular pathway data as part of an efficient, global digital ecosystem for biomedical research.

Reactome at 50…

Reactome, the database of molecular pathways, issued its 50th release this winter, a major milestone for the resource. With thousands of new additions, Reactome has become one of the world’s largest freely accessible, open-source pathway resources. Since Reactome scientists started curating and exporting pathway and reaction data 10 years ago, the resource has grown to include annotations for over a third of the protein-coding genes in the current human genome assembly in Ensembl.

Read about the latest release of Reactome.

… and InterPro at 50

InterPro, the resource for protein sequence analysis and classification, has upgraded one of its member databases (PIRSF) to the latest version of the HMMER search algorithm, making it faster and more sensitive. Together with on-going biocuration, this helps continue the flow of important annotations into UniProt, and provides researchers with the most up-to-date functional information about protein families and motifs. InterPro version 50 covers over 80% of the latest release of the UniProt Knowledgebase and predicts gene ontology (GO) terms, which indicate biological processes and function, for tens of millions of UniProt proteins.

UniProt: a unique Knowledge-base

UniProt, the Universal Protein Resource, features new and modified disease entries, improved searching and many other updates. UniProt’s latest headline article focuses on the work of McBride and colleagues, who studied Odorant receptor 4 (Or4) in several subspecies of the mosquito Aedes aegypti to understand the genetic basis underlying the mosquito’s preference for humans.

Looking for genomes?

If you are looking for whole genomes, protein sequences, alignments or other genome-wide data, have a look at the Ensembl FTP site. Here, you can download data from the current and previous releases of Ensembl in bulk and for free. Updates to Ensembl this spring will include an updated version of the GENCODE gene set, patches for the latest human genome assembly and new Global Alliance standard REST endpoints for sets of variation data.
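As a rough illustration of bulk access, the Python sketch below builds the address of a per-species FASTA directory on the Ensembl FTP site. It assumes the `pub/release-<N>/fasta/<species>/<type>/` directory layout. The release number and species in the example are illustrative only, and the function name is hypothetical; check the site's own directory listing before relying on a path.

```python
ENSEMBL_FTP_BASE = "https://ftp.ensembl.org/pub"  # assumed base address


def ensembl_fasta_url(release: int, species: str, seq_type: str = "pep") -> str:
    """Build the directory URL for one species' FASTA files in a given release.

    seq_type is assumed to be a subdirectory name such as 'pep', 'cdna' or 'dna'.
    """
    # Species directories are written in lower case, e.g. 'homo_sapiens'.
    return f"{ENSEMBL_FTP_BASE}/release-{release}/fasta/{species.lower()}/{seq_type}/"


# Illustrative values only.
url = ensembl_fasta_url(79, "Homo_sapiens")
```

The resulting address can be passed to any download tool. Nothing here contacts the server, so the sketch only shows how the paths are composed.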

ChEBI: over 43 000 chemical entities

With the 125th release of ChEBI, this dictionary of ‘small’ chemical compounds now offers data on over 43 000 fully annotated entities. This release is accompanied by a feature on ellagic acid, a polyphenol antioxidant found in many fruits, nuts, species of oak and the Japanese medicinal mushroom Phellinus linteus, and an entity of interest in the study of obesity.

ENA simplifies data releases

The European Nucleotide Archive (ENA) has begun coupling the public release of sequence records to the release of study records. Under the new system, all raw read data and assembled/annotated sequence records associated with studies are released into the public domain as soon as the study’s release date has been reached and the study made public.

Quite interesting PDB structures

Have you come across structures in paper figures or on journal covers that you’d like to know more about? EMBL-EBI’s Protein Data Bank in Europe team has a dynamic blog that helps you start exploring structures by featuring quite interesting protein structures (“Quips”). One recent instalment looks at acetylcholinesterase (AChE), the enzyme that breaks down acetylcholine. Acetylcholine was the first neurotransmitter to be identified, a discovery that earned Dale and Loewi the 1936 Nobel Prize in Physiology or Medicine. AChE is a target for chemical weapons, pest-control agents, drugs, and even snake venoms.
