Darwin Tree of Life: first 500 genome assemblies released as project creates a buzz
We look back through some of the 2022 highlights from the Darwin Tree of Life project
As of December 2022, the Darwin Tree of Life (DToL) project – an ambitious effort to sequence all species in the UK and Ireland – has released 500 reference-quality genome assemblies to public databases, ready to be used by researchers around the globe. Researchers at EMBL-EBI are supporting the DToL project by storing and annotating the sequenced genomes, and by making these data openly available through Ensembl Rapid Release and the DToL Data Portal.
The successes of this last year have been down to the tireless DToL team – whether in the field or the lab, on computers or taking the DToL science on the road. Below is a selection of highlights chosen by some DToL partners.
Annotations, data portal features and geocaching
EMBL’s European Bioinformatics Institute (EMBL-EBI)
In 2022, the EMBL-EBI DToL team reached the first 100 annotated genomes for new species. This number has steadily increased and these new genome annotations are openly available through the Ensembl DToL page and Ensembl Rapid Release.
The team also made some exciting updates to the DToL Data Portal – an open access platform managed by EMBL-EBI that pulls together data from across the DToL project and makes them available in one place. Users can track the sequencing progress of their species of interest, and this feature was updated to include detailed status updates for monitoring samples at each step of the process. The team also added an interactive sampling map that shows users where species samples have been collected.
On the public engagement side of things, the team launched a new multilingual activity to engage migrant communities with the science underpinning nature in local areas. This activity – called ‘Into Nature’ – combines geocaching with collectable cards that participants can track down to learn more about DToL species.
First 500 assemblies
The Wellcome Sanger Institute
For everyone at the Wellcome Sanger Institute’s Tree of Life Programme, 2022 will be looked back on as the year its genome production pipeline really powered into action. This initial stage of the DToL project has given proof of concept for this ambitious genomics venture.
Could the team create a series of scientific processes that take an organism in the wild and transform it into a top-quality, chromosomal-level genome assembly representing its entire species? Could those processes be repeated again and again for thousands of species spanning every branch of the tree of life? Could a network of partner organisations focused on ecology, informatics, and analysis be brought together to achieve this goal?
The answer is a resounding yes. In the last 12 months, the team has more than doubled the number of DToL genome assemblies on public databases and tripled its catalogue of published Genome Notes.
The graphic above shows the diversity of the project’s first 500 assemblies across the tree of life. There has undoubtedly been a bias towards arthropods, for a number of reasons. They are easy to collect and sequence – for example, moths fly towards light, arthropod DNA is relatively easy to extract in the lab, and many have smaller genomes. But the team also wanted to showcase the project’s potential for aiding comparative genomics, with DToL scientists already publishing arthropod studies based on these genome assemblies.
There is also a significant breadth of diversity emerging. In 2022, the team published Genome Notes for the first fungi, cnidarians, tunicates, molluscs, and most recently, plants. The first protists are due to be published soon. Getting to grips with this dazzling array of organisms and their genomes is a key achievement in a very successful year.
The first plant genomes are published
Royal Botanic Garden Edinburgh (RBGE)
You wait for ages to publish some reference genomes for plants, then five come at once. This team sequenced Britain and Ireland’s only native wild apple tree (Malus sylvestris) plus four heritage cultivars of Malus domestica originally grown on these shores. This is just part of a deeper apple-based project that botanists at Edinburgh and Kew, bioinformaticians at the Wellcome Sanger Institute, and other collaborators have been involved with since DToL began. They’ve also produced short-read DNA sequences to compare over 40 other varieties of locally grown apples.
These genomes can help answer questions about the UK’s apple history, the species’ evolution, and how to protect the precariously positioned crab apple. Its genetic integrity is being undermined by hybridisation with widely planted domestic relatives. Nearly 30% of the wild apple trees surveyed in a recent study in northern Britain turned out to be of hybrid origin.
Possibly the species which has taken up the most DToL time in 2022 is the European mistletoe. Viscum album boasts the largest genome in Britain and Ireland, clocking in at 90 Gbp – 30 times the size of the human genome. Why is it so large? Nobody is quite sure, but that’s a question which the reference genome will help answer in future.
Samples from a female mistletoe were collected by the Kew Gardens team in September 2020 in southwest London. The following year, 2021, was spent extracting and sequencing the plant’s DNA. In 2022, bioinformaticians assembled the genome along ten colossal chromosomes. A special mention goes to Lucia Campos-Dominguez at the University of Edinburgh, who spent the last three months curating the genome – essentially scrolling through, chromosome by chromosome, checking for errors and inversions.
Barring some checking of the work and a bit of head-scratching over how to upload this massive genome onto public databases, the team can declare that a reference genome assembly of Britain and Ireland’s biggest genome is now complete.
Three protist Genome Notes are on the horizon thanks to a collaboration between the Earlham Institute (EI), the Culture Collection of Algae and Protozoa (CCAP), the University of Oxford, the Marine Biological Association and the Wellcome Sanger Institute. COPO brokered the metadata for the three species, as it has done for all of the DToL samples, which now make up a significant proportion of all Earth Biogenome Project-standard genomes. A paper on generating annotated genomes from single cells will be published early in 2023.
EI’s public engagement initiative, Barcoding the Broads, has trained more than 120 people in DNA barcoding, with four schools in Norfolk and one school in London now conducting independent experiments. Enabling Connections funds supported a number of projects, from a new DNA barcoding hub at the Bayfordbury Field Station in Hertfordshire to work with Kew Gardens and the Norfolk Fungus Study Group. The latter has provided training and resources for community scientists who have identified more than 40 fungi species.
Coastal collections and marine science explored
The Marine Biological Association (MBA)
In 2022, the MBA collected 1,951 samples representing 274 species, 172 families, and 91 orders. Six species of protists were cultured and successfully harvested at the MBA lab, and 12 species of algae and six species of animals were barcoded in-house. The MBA hosted several collecting visits by experts from the Sanger Institute, the Natural History Museum, and the University of Bergen. Team members received advanced taxonomic training on expert courses for both macroalgae and planktonic species, groups that require a higher level of taxonomic skill for correct identification.
Short sequences of DNA can be used to identify and discriminate between species. This is analogous to the way conventional barcodes distinguish products in a supermarket.
The team attended the Lundy Island Marine Festival where species were sampled and processed in tandem with other DToL partners. The MBA ran a bioblitz which involved both sampling of species from the intertidal zone and educational talks to engage schoolchildren in the project, the wider field of DNA, and the concept of scientific ethics.
2022 was an incredibly successful year of sampling for the Natural History Museum (NHM) DToL team, making up for the 18 months lost to COVID. Although the team had sporadic collecting trips in the first few months of the year, the sampling season truly kicked off in late June – prime ‘beetle season’ – when the NHM crew teamed up with entomologists from Natural England for a Bioblitz in the Norfolk Broads. With the help of UK Barcode of Life (UKBoL), the team collected over 550 specimens destined for whole genome sequencing and DNA barcoding.
The DToL team then travelled back to Norfolk for the Dipterists’ Forum field meeting in July. There, 561 specimens were identified and frozen, of which 219 were Diptera – commonly known as flies. A total of 288 species were new to DToL and 119 specimens were new to UKBoL.
The sampling season finished off with another summer trip in late August to Beinn Eighe, in the Scottish Highlands. Not only was the NHM team accompanied by members of the National Museums of Scotland, the Highland Biological Recording Group, and NatureScot, but a Channel 5 film crew was also present. The crew filmed one day of sampling, attempting to capture the frantic behind-the-scenes reality of DToL entomological work. Despite the midges and the indecisive weather, there were cracks of sun, and team DToL were able to collect 160 species of invertebrates, capping off an incredibly fruitful year of collecting.
A special mention goes to all the external submitters and museum curators who helped with DToL, either by submitting specimens, identifying species, or finalising barcode interpretations.
Wytham’s sunny spring and primary school projects
University of Oxford
A warm and sunny spring meant that collecting species for genome sequencing at Wytham Woods got off to a rapid start in 2022, although the continued dry weather proved tough for many insects as the hot summer progressed. The year’s work included collecting, processing and mounting specimens, and analysing the data, as well as the BugBlitz project, which engaged hundreds of primary school children.
This year also saw the establishment of a local volunteer collectors group, involving people with a range of experience – from enthusiastic students to national experts.
Among the 750 species collected for genome sequencing in 2022, highlights include:
the rare Brown Spruce Longhorn Beetle found by a school pupil
the first Emperor moth from Wytham, a Rugged Oil Beetle and a Sabre Wasp
The Darwin Tree of Life Project is an ambitious programme to sequence, assemble, and openly publish the genomes of over 70,000 species of animals, plants, fungi, and protists in Britain and Ireland. The Project contributes to the global mission to sequence all life – the Earth Biogenome Project. The genomic data generated will revolutionise bioscience forever, facilitating research into evolution and biology, conservation of biodiversity, and the development of new biomaterials and pharmaceuticals.
The Darwin Tree of Life Project is being undertaken by a consortium of ten Partners: the Earlham Institute, EMBL’s European Bioinformatics Institute (EMBL-EBI), Marine Biological Association, Natural History Museum, Royal Botanic Garden Edinburgh, Royal Botanic Gardens, Kew, University of Cambridge, University of Oxford, University of Edinburgh and the Wellcome Sanger Institute.