Darwin Tree of Life: first 500 genome assemblies released as project creates a buzz

We look back through some of the 2022 highlights from the Darwin Tree of Life project

A group of people wearing Darwin Tree of Life t-shirts
The Darwin Tree of Life project in 2022. Photo credit: Luke Lythgoe, Wellcome Sanger Institute

As of December 2022, the Darwin Tree of Life (DToL) project – an ambitious effort to sequence all species in the UK and Ireland – has released 500 reference-quality genome assemblies to public databases, ready to be used by researchers around the globe. Researchers at EMBL-EBI are supporting the DToL project by storing and annotating the sequenced genomes, and making these data openly available through Ensembl Rapid Release and the DToL Data Portal.

The successes of this last year have been down to the tireless DToL team – whether in the field or the lab, on computers or taking the DToL science on the road. Below is a selection of highlights chosen by some DToL partners.

Annotations, data portal features and geocaching

EMBL’s European Bioinformatics Institute (EMBL-EBI)

Three cards with pictures of different species
‘Into Nature’ super species cards. Credit: Briony Jackson, EMBL-EBI

In 2022, the EMBL-EBI DToL team reached the first 100 annotated genomes for new species. This number has steadily increased and these new genome annotations are openly available through the Ensembl DToL page and Ensembl Rapid Release.

The team also made some exciting updates to the DToL Data Portal – an open access platform managed by EMBL-EBI, which pulls together data from across the DToL project, making it available all in one place. Users can track the sequencing progress of their species of interest, and this feature now includes detailed status information for monitoring samples at each step of the process. The team also added an interactive sampling map that shows where species samples have been collected.

On the public engagement side of things, the team launched a new multilingual activity to engage migrant communities with the science underpinning nature in local areas. This activity – called ‘Into Nature’ – combines geocaching with collectable cards that participants can track down to learn more about DToL species.

First 500 assemblies

The Wellcome Sanger Institute

Coloured blocks with icons of different species
The diversity of the first 500 Darwin Tree of Life genome assemblies released to public databases. Main colour blocks represent ‘kingdoms’ (animals, plants, fungi, protists); colour shades show different phyla; icons show the orders within each phylum. Credit: Wellcome Sanger Institute

For everyone at the Wellcome Sanger Institute’s Tree of Life Programme, 2022 will be looked back on as the year its genome production pipeline really powered into action. This initial stage of the DToL project has given proof of concept for this ambitious genomics venture. 

Could the team create a series of scientific processes that take an organism in the wild and transform it into a top-quality, chromosomal-level genome assembly representing its entire species? Could those processes be repeated again and again for thousands of species spanning every branch of the tree of life? Could a network of partner organisations focused on ecology, informatics, and analysis be brought together to achieve this goal?

The answer is a resounding yes. In the last 12 months, the team has more than doubled the number of DToL genome assemblies on public databases and tripled its catalogue of published Genome Notes.

The graphic above shows the diversity of the project’s first 500 assemblies across the tree of life. There has undoubtedly been a bias towards arthropods, for a number of reasons. They are easy to collect and sequence – for example, moths fly towards light, arthropod DNA is relatively easy to extract in the lab, and many have smaller genomes. But the team also wanted to showcase the project’s potential for aiding comparative genomics, with DToL scientists already publishing arthropod studies based on these genome assemblies.

There is also a significant breadth of diversity emerging. In 2022, the team published Genome Notes for the first fungi, cnidarians, tunicates, molluscs, and most recently, plants. The first protists are due to be published soon. Getting to grips with this dazzling array of organisms and their genomes is a key achievement in a very successful year.

The first plant genomes are published

Royal Botanic Garden Edinburgh (RBGE)

The European crab apple (Malus sylvestris) along the West Highland Way from which RBGE’s Markus Ruhsam collected DToL’s sample. Credit: Markus Ruhsam, RBGE

You wait for ages to publish some reference genomes for plants, then five come at once. This team sequenced Britain and Ireland’s only native wild apple tree (Malus sylvestris) plus four heritage cultivars of Malus domestica originally grown on these shores. This is just part of a deeper apple-based project botanists at Edinburgh and Kew, bioinformaticians at the Wellcome Sanger Institute, and other collaborators have been involved with since DToL began. They’ve also produced short-read DNA sequences to compare over 40 other varieties of locally grown apples.

These genomes can help answer questions about the UK’s apple history, the species’ evolution, and how to protect the precariously positioned crab apple. Its genetic integrity is being undermined by hybridisation with widely planted domestic relatives. Nearly 30% of the wild apple trees surveyed in a recent study in northern Britain turned out to be of hybrid origin.

Learn more about the DToL apple genomes here. And if you fancy mulling over some genomics with a hot cider this winter, check out the short Scider videos researchers made looking at the science of this apple-based beverage.

Mistletoe: Britain and Ireland’s largest genome

Royal Botanic Gardens, Kew

Mistletoe (Viscum album) growing on the Wellcome Genome Campus wetlands nature reserve. Credit: Wellcome Sanger Institute

Possibly the species which has taken up the most DToL time in 2022 is the European mistletoe. Viscum album boasts the largest genome in Britain and Ireland, clocking in at 90 Gbp – 30 times the size of the human genome. Why is it so large? Nobody is quite sure, but that’s a question which the reference genome will help answer in future.

Samples from a female mistletoe were collected by the Kew Gardens team in September 2020 in southwest London. The next year, 2021, focused on extracting and sequencing the plant’s DNA. In 2022, bioinformaticians assembled the genome along ten colossal chromosomes. A special mention goes to Lucia Campos-Dominguez at the University of Edinburgh, who spent the last three months curating the genome – essentially scrolling through, chromosome by chromosome, checking for errors and inversions.

Barring some checking of the work and a bit of head-scratching over how to upload this massive genome onto public databases, the team can declare that a reference genome assembly of Britain and Ireland’s biggest genome is now complete.

Read more about the trials and triumphs of assembling the mistletoe genome.

Protist genomes ready and DNA barcoding successes

The Earlham Institute (EI)

People in a field
DToL teams from EI and Kew joined Norfolk Fungus Study Group members during a fungi foray at Wheatfen on the Broads. Credit: Sam Rowe, EI

Three protist Genome Notes are on the horizon thanks to the collaboration between the EI, the Culture Collection of Algae and Protozoa (CCAP), the University of Oxford, the Marine Biological Association and the Wellcome Sanger Institute. COPO (Collaborative OPen Omics) brokered the metadata for the three species, as it has done for all of the DToL samples – now making up a significant proportion of all Earth BioGenome Project standard genomes. A paper on generating annotated genomes from single cells will be published early in 2023.

EI’s public engagement initiative, Barcoding the Broads, has trained more than 120 people in DNA barcoding, with four schools in Norfolk and one school in London now conducting independent experiments. Enabling Connections funds supported a number of projects, from a new DNA barcoding hub at the Bayfordbury Field Station in Hertfordshire to work with Kew Gardens and the Norfolk Fungus Study Group. The latter has provided training and resources for community scientists who have identified more than 40 fungi species.

Coastal collections and marine science explored

The Marine Biological Association (MBA)

Person in a stream
MBA’s Patrick Adkins looks for tuning fork weed (Bifurcaria bifurcata) on Lundy. Credit: MBA

In 2022, the MBA collected 1,951 samples representing 274 species, 172 families, and 91 orders. Six species of protists were cultured and successfully harvested at the MBA lab, and 12 species of algae and six species of animals were barcoded in-house. The MBA hosted several collecting visits by experts, including teams from the Sanger Institute, the Natural History Museum, and the University of Bergen. Team members received advanced taxonomic training on expert courses covering macroalgae and planktonic species, both of which require a higher level of taxonomic skill for correct identification.

DNA barcoding

Short sequences of DNA can be used to identify and discriminate between species. This is analogous to the way conventional barcodes distinguish products in a supermarket.

The team attended the Lundy Island Marine Festival where species were sampled and processed in tandem with other DToL partners. The MBA ran a bioblitz which involved both sampling of species from the intertidal zone and educational talks to engage schoolchildren in the project, the wider field of DNA, and the concept of scientific ethics.

To find out more about sequencing our seas, watch this video.

Museum to mountains: epic arthropod collecting

The Natural History Museum (NHM)

Person with a net next to a river
The intense focus of NHM’s Ben Price collecting dragonflies in Thursley, Surrey. Credit: Luke Lythgoe, Wellcome Sanger Institute

2022 was an incredibly successful year of sampling for the NHM DToL team, making up for the 18 lost COVID months. Although the team had sporadic collecting trips in the first few months of the year, the sampling season truly kicked off in late June – prime ‘beetle season’ – when the NHM crew teamed up with entomologists from Natural England for a Bioblitz in the Norfolk Broads. With the help of UK Barcode of Life (UKBoL), the team collected over 550 specimens destined for whole genome sequencing and DNA barcoding.

The DToL team then travelled back to Norfolk for the Dipterists’ Forum field meeting in July. There, 561 specimens were identified and frozen, of which 219 were Diptera – commonly known as flies. A total of 288 species were new to DToL, and 119 specimens were new to UKBoL.

The sampling season finished off with another summer trip in late August to Beinn Eighe, in the Scottish Highlands. Not only was the NHM team accompanied by members of the National Museums of Scotland, the Highland Biological Recording Group, and NatureScot, a Channel 5 film crew was also present. The crew filmed one day of sampling, attempting to capture the frantic behind-the-scenes of DToL entomological work. Despite the midges and the indecisive weather, there were cracks of sun, and team DToL were able to collect 160 species of invertebrates, capping off an incredibly fruitful year of collecting.

A special mention goes to all the external submitters and museum curators who helped with DToL, either by submitting specimens, identifying species, or finalising barcode interpretations.

Wytham’s sunny spring and primary school projects

University of Oxford

Person with a net next to a tree
Liam Crowley finds a lucrative spot for collecting flying insects by a hawthorn. Credit: Luke Lythgoe, Wellcome Sanger Institute

A warm and sunny spring meant that collecting species for genome sequencing at Wytham Woods got off to a rapid start in 2022, although the continued dry weather proved tough for many insects as the hot summer progressed. The year’s work included collecting, processing and mounting specimens, and analysing the data, as well as the BugBlitz project, which engaged hundreds of primary school children.

This year also saw the establishment of a local volunteer collectors group, involving people with a range of experience – from enthusiastic students to national experts.

Among the 750 species collected for genome sequencing in 2022, highlights include:

Moving beyond Wytham, the team spread their metaphorical net wide to explore satellite sites including a local reed bed site, Withymead, and coordinated with Gloucestershire Wildlife Trust and the Royal Entomological Society to collect the rare Large Blue butterfly, a species for which genome data is keenly sought for conservation applications.

DToL goes to the Royal Society

People in Darwin Tree of Life t-shirts
DToL exhibitors at the Royal Society, from left to right: Ilia Leitch (Kew), Gavin Broad, Laura Sivess (NHM), Harriet Johnson (Wellcome Sanger Institute), Patrick Adkins (MBA), Jack Monaghan, Leia Zhao (Wellcome Sanger Institute). Credit: Luke Lythgoe, Wellcome Sanger Institute

Public engagement with DToL reached new heights in 2022, with everyone from school children to citizen science groups getting involved, from the Lizard in Cornwall to the Isle of Skye in the Hebrides. Possibly the best part was being invited to exhibit at the Royal Society’s Summer Science Exhibition, a five-day centrepiece of the public science calendar. We think the stats speak for themselves.

The verdict? The DToL exhibit was one of the top three most popular on the Royal Society website at the end of the event.

The original version of this article was published on the DToL blog where you can find the latest news and updates from the project.

Find out more about Planetary Biology research at EMBL.

About the Darwin Tree of Life Project

The Darwin Tree of Life Project is an ambitious programme to sequence, assemble, and openly publish the genomes of over 70,000 species of animals, plants, fungi, and protists in Britain and Ireland. The Project contributes to the global mission to sequence all life – the Earth BioGenome Project. The genomic data generated will revolutionise bioscience forever, facilitating research into evolution and biology, conservation of biodiversity, and the development of new biomaterials and pharmaceuticals. The Darwin Tree of Life Project is being undertaken by a consortium of ten Partners: the Earlham Institute, EMBL’s European Bioinformatics Institute (EMBL-EBI), Marine Biological Association, Natural History Museum, Royal Botanic Garden Edinburgh, Royal Botanic Gardens, Kew, University of Cambridge, University of Oxford, University of Edinburgh and the Wellcome Sanger Institute.


