SPRTA: a smarter way to measure evolution uncertainty
A new method from EMBL-EBI and collaborators offers fast, easy-to-interpret confidence scores for phylogenetic trees, supporting pandemic preparedness
SPRTA: a smarter way to measure evolution uncertainty. Image credit: Karen Arnott/EMBL-EBI
Summary
Researchers at EMBL-EBI and collaborators have developed SPRTA, a new way to measure confidence in evolutionary family trees at pandemic scale.
SPRTA allows scientists to quickly identify which parts of these trees are reliable, where uncertainty remains, and which alternative evolutionary histories are plausible.
By helping to track how pathogens spread and evolve, SPRTA could improve responses to future pandemics.
When COVID-19 arrived, researchers tried to build evolutionary family trees, known as phylogenetic trees, of the virus. These help scientists understand when new virus strains appear and how they are linked to each other. But with millions of genomes to analyse, checking how reliable those trees were proved impossible with existing methods.
To address this gap, researchers at EMBL’s European Bioinformatics Institute (EMBL-EBI) and colleagues at the Australian National University have developed SPRTA (SPR-based Tree Assessment), an interpretable and efficient way to score the reliability of each branch in a phylogenetic tree. SPRTA is the first such tool that is scalable to pandemic-sized datasets.
What are phylogenetic trees?
Phylogenetic trees are like family trees for genomes. They show how viruses, bacteria, animals, or any other organisms are related, based on their DNA or RNA sequences. Each branch point, or node, represents a potential common ancestor.
By comparing genetic differences, scientists can reconstruct the evolutionary history of a species or pathogen, identifying where it came from, how it has changed, and how it has spread.
During the COVID-19 pandemic, phylogenetic trees helped researchers track new variants, understand how they moved between countries, and predict potential future changes to the virus.
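To make the idea of nodes as common ancestors concrete, here is a minimal Python sketch, assuming a toy rooted tree stored as a map from each node to its parent. The node names and the functions lineage and common_ancestor are hypothetical and purely illustrative; they are not part of SPRTA.

```python
# Hypothetical toy tree: each node maps to its parent (the root maps to None).
# Internal nodes ("anc1", "anc2") stand for inferred common ancestors.
PARENT = {
    "root": None,
    "anc1": "root",
    "anc2": "root",
    "sampleA": "anc1",
    "sampleB": "anc1",
    "sampleC": "anc2",
}


def lineage(node, parent=PARENT):
    """Return the path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def common_ancestor(a, b, parent=PARENT):
    """Most recent common ancestor: the first node on a's lineage that is also on b's."""
    ancestors_of_b = set(lineage(b, parent))
    for node in lineage(a, parent):
        if node in ancestors_of_b:
            return node
    raise ValueError("nodes are not in the same tree")


print(common_ancestor("sampleA", "sampleB"))  # anc1
print(common_ancestor("sampleA", "sampleC"))  # root
```

Samples A and B share the more recent ancestor anc1, while A and C are related only through the root.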
Re-inventing phylogenetic assessment
Since 1985, scientists have relied on a method called Felsenstein’s bootstrap to measure confidence in phylogenetic trees. But because this method works by resampling the data and rebuilding the tree hundreds or even thousands of times, it becomes too slow to handle the millions of viral genomes sequenced during a pandemic.
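Felsenstein’s bootstrap builds each replicate dataset by resampling the columns (sites) of the sequence alignment with replacement, then reruns the full tree search on that replicate. The minimal Python sketch below covers only the resampling step. The function names and the toy alignment are hypothetical, and the tree search each replicate would need is omitted. It shows why the cost scales with the number of replicates: 1,000 replicates means 1,000 separate tree inferences.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One bootstrap replicate: sample alignment columns (sites) with replacement.

    `alignment` is a list of equal-length sequences, one per sample.
    """
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=1000, seed=1):
    """Generate replicate alignments; each would still need its own full tree search."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


alignment = ["ACGTAC", "ACGTTC", "ACCTTC"]  # toy data: 3 samples x 6 sites
reps = bootstrap_replicates(alignment, n_replicates=100)
print(len(reps), len(reps[0][0]))  # 100 6
```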
A recent paper, published in the journal Nature, introduces SPRTA, a modern, scalable alternative capable of handling the huge datasets generated during large disease outbreaks. SPRTA enables researchers to track, reliably and rapidly, how pathogens spread and evolve, informing better decisions during outbreaks and supporting pandemic preparedness.
“For nearly 40 years, scientists have relied on the same method to measure confidence in evolutionary trees, but when faced with the scale of data we saw during the COVID-19 pandemic, the old method simply couldn’t cope,” said Nick Goldman, Group Leader at EMBL-EBI. “SPRTA gives us a fast, reliable way to understand which parts of these massive trees we can trust and to find the most plausible alternatives in regions of low confidence. This is exactly the kind of tool we’ll need to respond faster and smarter in the next pandemic.”
A smarter way to measure confidence
Traditional methods, such as Felsenstein’s bootstrap, focus on whether groups of samples, known as clades, are strongly supported by the data. But for outbreak analysis, that’s not always enough. SPRTA takes a different approach: it estimates how likely it is that a virus strain descends from a particular ancestor, and which alternative evolutionary paths are plausible.
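For contrast, bootstrap-style clade support can be sketched as the fraction of replicate trees in which a given group of samples appears as a clade. In this minimal Python sketch each replicate tree is reduced to its set of clades; the four replicate trees and the function name clade_support are made up for illustration.

```python
def clade_support(clade, replicate_clades):
    """Fraction of replicate trees that contain exactly this clade."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return hits / len(replicate_clades)


# Hypothetical replicate trees, each summarised as the set of clades it contains.
replicate_clades = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
]
print(clade_support({"A", "B"}, replicate_clades))  # 0.75
```

A score like this says how often a group holds together, but not which ancestor a particular sample most plausibly descends from, which is the question SPRTA addresses.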
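To do this, SPRTA tests many possible scenarios by computationally rearranging branches of the phylogenetic tree, detaching a subtree and reattaching it elsewhere (the ‘subtree prune and regraft’, or SPR, operation that gives the method its name), and comparing how well each rearranged tree fits the data. It then assigns a simple probability score showing how confident researchers can be in each connection.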
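The two ingredients described above can be sketched in Python: an SPR move, and a score that compares the original tree with its rearranged alternatives. For the score we assume it is the original tree’s likelihood divided by the summed likelihoods of the original and its SPR alternatives. That normalisation is our assumption for illustration, not necessarily SPRTA’s exact definition. The function names, the toy tree and the log-likelihood values are hypothetical, and computing real likelihoods (which needs a substitution model) is omitted.

```python
import math


def is_descendant(tree, node, ancestor):
    """True if `ancestor` lies on the path from `node` up to the root."""
    while node is not None:
        if node == ancestor:
            return True
        node = tree[node]
    return False


def spr_move(parent, subtree, target, new_node):
    """Prune `subtree` and regraft it onto the branch above `target`.

    Trees are rooted binary trees stored as {node: parent} maps (root -> None).
    Returns a new map and leaves `parent` unmodified. `new_node` names the
    internal node created at the regrafting point.
    """
    tree = dict(parent)
    old_parent = tree[subtree]
    if old_parent is None:
        raise ValueError("cannot prune the root")
    # Prune: drop the subtree's parent and join its sibling to the grandparent.
    sibling = next(n for n, p in tree.items() if p == old_parent and n != subtree)
    tree[sibling] = tree[old_parent]
    del tree[old_parent]
    # The target branch must still exist and lie outside the pruned subtree.
    if target not in tree or is_descendant(tree, target, subtree):
        raise ValueError("invalid regrafting target")
    if new_node in tree:
        raise ValueError("new_node name already in use")
    # Regraft: split the branch above `target` with `new_node`.
    tree[new_node] = tree[target]
    tree[target] = new_node
    tree[subtree] = new_node
    return tree


def placement_support(log_likelihoods, current):
    """Share of the total likelihood held by `current` among all alternatives.

    Uses the log-sum-exp trick so large negative log-likelihoods don't underflow.
    """
    top = max(log_likelihoods.values())
    total = sum(math.exp(ll - top) for ll in log_likelihoods.values())
    return math.exp(log_likelihoods[current] - top) / total


# Toy tree ((A,B),C): prune A and regraft it onto the branch above C.
toy = {"root": None, "anc1": "root", "C": "root", "A": "anc1", "B": "anc1"}
alt = spr_move(toy, "A", "C", "anc2")  # topology becomes ((A,C),B)

# Hypothetical log-likelihoods for the original tree and two SPR alternatives.
scores = {"original": -1000.0, "alt_1": -1002.0, "alt_2": -1005.0}
print(round(placement_support(scores, "original"), 3))  # 0.876
```

Working with log-likelihoods and subtracting the maximum before exponentiating keeps the calculation stable even when likelihoods are far too small to represent directly, as they are for large alignments.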
“With SPRTA, we’re not just making phylogenetic tree-building faster, we’re making it smarter,” said Nicola De Maio, Staff Scientist at EMBL-EBI. “It helps researchers understand which relationships are solid and where they need to be cautious, even when working with millions of genomes.”
Designed for pandemic-scale data
Using more than two million SARS-CoV-2 genomes, the researchers demonstrated that SPRTA can:
highlight which parts of a phylogenetic tree are highly reliable
flag uncertain sample placements, often due to incomplete or noisy data
reveal credible alternative origins for specific branches
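A report of this kind could be sketched as follows, assuming per-branch probabilities over candidate placements are already available. The input format, the function name summarize_placements and the thresholds (0.9 for low support, 0.1 for a credible alternative) are arbitrary assumptions for illustration, not values from the paper.

```python
def summarize_placements(branch_scores, low=0.9, credible=0.1):
    """Flag low-support branches and list their credible alternative placements.

    branch_scores maps each branch to {placement: probability}, where the key
    "current" is the placement in the inferred tree. Thresholds are arbitrary.
    """
    report = {}
    for branch, probs in branch_scores.items():
        support = probs["current"]
        if support < low:
            alternatives = sorted(
                (place for place in probs if place != "current" and probs[place] >= credible),
                key=lambda place: probs[place],
                reverse=True,
            )
            report[branch] = (support, alternatives)
    return report


# Hypothetical scores for two sample placements.
scores = {
    "sampleA": {"current": 0.98, "under anc5": 0.02},
    "sampleB": {"current": 0.55, "under anc3": 0.40, "under anc7": 0.05},
}
print(summarize_placements(scores))
# {'sampleB': (0.55, ['under anc3'])}
```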
Integrating SPRTA into established phylogenetic software makes the method open, accessible, and ready for researchers worldwide to apply in outbreak tracking, genomic surveillance, and evolutionary studies.
Funding
This work was supported by EMBL core funds and the Medical Research Council (MRC). Australian collaborators received support from the Chan Zuckerberg Initiative.