The MANE collaboration: working together to support genome science

Researchers from the MANE collaboration bring you the most comprehensive human genome annotation dataset to date

Combining two sources of the human genome annotation to create one unified resource. Credit: Karen Arnott/EMBL

Researchers working within the Ensembl team at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the RefSeq team at the National Center for Biotechnology Information (NCBI) have reached a milestone in their Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration.

Ensembl and RefSeq provide the most frequently used sources of human genome annotation by making databases of human reference gene and transcript sets freely available to researchers everywhere. However, the two datasets are not identical for every gene.

One solution to this challenge is the MANE collaboration. Research stemming from this collaboration, published in the journal Nature, combines work from EMBL-EBI and NCBI researchers to include in both public annotations one identical transcript for every protein-coding gene in the human reference genome, and to flag it for use as a universal standard. Access to this standard annotation is vital in clinical settings, where precise reporting of small genomic variations helps to streamline the medical interpretation of genomic data.

Working together

“We know that differences between the two human annotation datasets from EMBL-EBI and NCBI can cause problems to researchers so we have been working together to create greater convergence,” said Adam Frankish, Manual Genome Annotation Coordinator at EMBL-EBI. “Creating this combined resource took a great deal of time and collaborative effort. Agreeing on some complex regions in the genome was tricky but we’re now at a stage where we are in agreement on almost all the protein coding genes and aim to get to 100%.”

A transcript made it into the MANE dataset only if both groups agreed on it. Computational pipelines help to speed things up, but much of the annotation work comes down to researchers checking the alignments manually to confirm that they have been interpreted correctly. Because the human genome annotation is so important to researchers and clinicians all over the world, the two institutes felt it was essential to work together on a definitive resource.

“The MANE Select transcript project is a critical resource for both the clinical and research communities, increasing the consistency of variant nomenclature which makes it easier to share variation and build evidence to assess pathogenicity,” said Heidi Rehm, Professor of Pathology at Massachusetts General Hospital and the Broad Institute of MIT and Harvard.

Accessing the MANE data

MANE transcripts are now the default transcripts in the Ensembl genome browser. The annotation is also freely available through RefSeq and the Ensembl Transcript Archive. Many large-scale projects, including DECIPHER, ClinVar and gnomAD, are already adopting the MANE annotation set within their workflows. To make things even simpler, you can search the MANE transcript data using either Ensembl or RefSeq IDs, whichever you are more familiar with.

Improved clinical reporting

“One of the main uses of the human genome annotation is for clinical reporting so it’s important that there is a single standard annotation that medical professionals can reliably use quickly and easily,” said Jane Loveland, Annotation Project Leader at EMBL-EBI. “Before MANE we would suggest that researchers look at everything available to be sure of their analysis but that’s not very helpful in a clinical setting when medical professionals need answers rapidly. Having a standard annotation resource for the human genome will make things much simpler.”

Creating MANE required extensive discussion and careful checks of every feature in the genome. Some of the more complicated cases arise when experimental evidence is lacking. This can happen if a gene is expressed in a tissue that is difficult to access, for example in a very small sub-region of the brain, or only during a narrow window of embryonic development. Cases like these require extra work, plus the knowledge and experience of the annotation teams, to interpret the available data and work out exactly what is going on in the genome.

“It’s also really important for us to make clear that just because we’ve annotated the best representative transcript at a gene based on current data that doesn’t mean that the other transcripts documented at specific loci should be discounted,” added Loveland. “We would still encourage people to look at the other transcripts if they can but we hope that this high-value dataset will promote consistency of reporting and drive improvements in human health and diagnostics.”


