Cancer is a genetic disease caused by changes – or mutations – in a person’s DNA. These mutations can arise in a vast number of ways, which means that each patient’s cancer is different and has a unique constellation of mutations. DNA sequencing is an essential tool for cancer diagnosis and treatment because it offers a snapshot of these mutations.

Researchers at EMBL-EBI are using a new technology, called long-read genome sequencing, to better understand cancer and to support healthcare professionals with cancer diagnosis, monitoring, and treatment. The journal Nature Methods chose long-read sequencing as its method of the year in 2022.

Understanding relapse in childhood cancers

Cancer in children is relatively uncommon compared to adult cancer, but because it’s rare, diverse, and biologically different, its risk factors and relapse mechanisms are poorly understood. Each year, in the UK alone, there are approximately 1,800 new cases of childhood cancer. Although the survival rate is high, almost half of paediatric patients relapse. At the moment, oncologists struggle to predict which patients will become ill again. 

This is one of the challenges that Isidro Cortes-Ciriano and his research group at EMBL-EBI are working on, with funding support from Cancer Research UK (CRUK) and the Stratified Medicine Paediatrics programme, co-led with Louis Chesler from the Institute of Cancer Research and Darren Hargrave from Great Ormond Street Hospital (GOSH).

Using nanopore sequencing – a type of long-read sequencing technology – Cortes-Ciriano is exploring whether a simple blood test could help to anticipate the emergence of cancer relapse in children. 

As part of the project, blood samples are collected from paediatric cancer patients when they are first diagnosed. The patients are regularly monitored, including through blood tests, in the years following diagnosis. The samples are sequenced at the laboratory of Andrew Beggs at the University of Birmingham, and the Cortes-Ciriano group analyses the data with the aim of identifying biomarkers associated with a higher likelihood of relapse. A further goal is to explore whether long-read sequencing could be used to detect the emergence or relapse of cancer earlier than other clinical tests, such as imaging.

Such a test, applied in the clinic, could have the potential to significantly improve the quality of care patients receive, at a relatively low cost for healthcare systems. 

What is long-read sequencing? 

Long-read sequencing, also referred to as third-generation sequencing, is a technology that can read DNA sequences of up to 100,000 base pairs in one go. This makes it easier to assemble the reads into a complete DNA sequence, much as a puzzle with fewer, larger pieces is easier to put together.
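To put rough numbers on the puzzle analogy, the short sketch below estimates how many reads are needed to cover a genome at a given depth. The genome size (about 3.1 billion base pairs, roughly a human genome), the 30-fold coverage, the 150 base pair short-read length, and the function name reads_needed are illustrative assumptions, not figures from the projects described here.

```python
def reads_needed(genome_length_bp: int, read_length_bp: int, coverage: int) -> int:
    """Estimate the number of reads required to cover a genome.

    Uses the simple relation: reads = coverage * genome length / read length,
    rounded up to a whole read. Real experiments vary in read length and
    coverage, so this is only a back-of-the-envelope figure.
    """
    total_bases = genome_length_bp * coverage
    # Integer ceiling division, so a partial read counts as one read.
    return (total_bases + read_length_bp - 1) // read_length_bp


# Assumed values for illustration only.
HUMAN_GENOME_BP = 3_100_000_000   # approximate human genome size
COVERAGE = 30                     # assumed sequencing depth

short_read_count = reads_needed(HUMAN_GENOME_BP, 150, COVERAGE)
long_read_count = reads_needed(HUMAN_GENOME_BP, 100_000, COVERAGE)

print(f"Short reads (150 bp):    {short_read_count:,}")
print(f"Long reads (100,000 bp): {long_read_count:,}")
```

Under these assumptions, the same genome is covered by several hundred times fewer pieces when the reads are 100,000 base pairs long, which is the intuition behind the puzzle comparison.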

There are both advantages and disadvantages to this new technology. For example, sample preparation is much more straightforward for long-read sequencing, and the turnaround time is faster than for second-generation sequencing methods. Both of these features are highly beneficial in a clinical context. However, long-read sequencing requires samples containing substantially more DNA than other methods – and this is difficult to achieve in some contexts, including for paediatric cancers or tumour biopsies. In addition, long-read sequencing has a higher error rate than short-read sequencing, which means the results can be more difficult to analyse.

Long-read sequencing is better at identifying certain types of genomic variation, such as where large sections of DNA are inserted, deleted, or moved around. On the other hand, technologies that use shorter reads are better at finding small variations in DNA such as changes to a single letter of the genetic code.

With robust sampling and analysis, long-read sequencing could be extremely powerful in clinical contexts, for example for the detection and understanding of complex genome rearrangements and structural variations. This is particularly useful in cancer genomics, especially for sarcomas, oesophageal cancer, and ovarian cancer.

Lynch Syndrome – an at-risk population

People with Lynch Syndrome, a rare hereditary condition, have an increased risk of developing cancer. Individuals who have this syndrome are often closely monitored and regularly screened for cancer, frequently using invasive methods.

Cortes-Ciriano and his group, in collaboration with Andrew Beggs from the University of Birmingham, are following 300 individuals with Lynch Syndrome over the course of five years. When an individual develops cancer, a tumour sample is sent to the University of Birmingham for long-read sequencing. As in the paediatric cancer project described above, Cortes-Ciriano and his group are trying to find out whether invasive tests, such as colonoscopies, could be replaced with blood tests. They are also investigating whether blood tests can be used to detect cancer earlier than other methods, such as imaging.

“Nanopore sequencing has the potential to tell you where cancer is in the body in a quick and relatively cheap way through the analysis of methylation patterns in cell-free DNA,” said Cortes-Ciriano. “This could be a real game changer in cancer identification and management.”

What is methylation?

Methylation is a chemical modification of DNA and other molecules that may be retained as cells divide to make more cells. When found in DNA, methylation can alter gene expression. In this process, chemical tags called methyl groups attach to a particular location within DNA where they turn a gene on or off, thereby regulating the production of proteins that the gene encodes. Source: Genome.gov

Improving long-read sequencing data analysis

“For both these projects, we are using Oxford Nanopore sequencing because it’s a method that allows us to simultaneously read out DNA sequence and methylation, even in short DNA fragments like those released by cancer cells into the bloodstream,” explained Cortes-Ciriano. “The methylation in a plasma sample is a very powerful tool for early detection of cancer. When it comes to tumour sample analysis, nanopore sequencing can read very long DNA molecules, and allows us to reconstruct alterations of the genome in cancer cells much better than other methods.”

“For me, the most exciting advantage of long-read sequencing is the different types of data it makes available from the same assay,” added Carolin Sauer, Postdoctoral Fellow in the Cortes-Ciriano group. “This multi-omic approach could really improve sensitivity and specificity for early cancer detection tests. It could even support more accurate diagnosis and disease classification, which could make a difference in the clinic.”  

In parallel, the team is also developing software tools to address some of the main challenges of long-read sequencing technology: dealing with longer reads than usual and the fact that the signal from the sequencer is noisier.

For example, Hillary Elrick, a Predoctoral Fellow in the Cortes-Ciriano group, has developed a tool called SAVANA to detect the structural alterations of the cancer genome, which could help overcome at least some of the read errors of the technology. 

“Long-reads have a lot of potential to detect structural variants. Since they span thousands of base pairs, long-reads can identify mutations in regions where short-reads struggle. By developing a tool to use these long-reads, we hope to identify additional structural variants that contribute to cancer development or progression,” said Elrick.

“There are still challenges to implementing long-read sequencing in the clinic,” explained Cortes-Ciriano. “We need standardised approaches and reliable tools to analyse the data, and this is exactly the kind of work we focus on in our group, and more widely at EMBL-EBI.”

What next?

“Thanks to funding from Cancer Research UK and several charities, we are able to explore the potential of long-read genome sequencing technologies for a range of different cancers,” said Cortes-Ciriano. “We’re only at the beginning but we are excited to see what the future holds and we are keen to develop collaborations with other labs working in this sphere.”

Alongside the three projects outlined above, the Cortes-Ciriano group is also working in collaboration with Genomics England and clinicians in the UK, such as Adrienne Flanagan at UCL, to perform and analyse long-read sequencing of hundreds of sarcomas, brain tumours, ovarian cancer samples, and leukaemias. They are hoping to reveal additional insights into these cancer types, while also further exploring the potential of long-read sequencing in the field. With this aim, the group is contributing expertise and computational tools to help analyse the data generated in collaboration with Genomics England. 

Acknowledgements

This work would not be possible without close collaboration with a range of clinicians, pathologists and technicians across the UK. EMBL-EBI would like to thank Adrienne Flanagan and her team at University College London (UCL) and the Royal National Orthopaedic Hospital (RNOH), especially Katherine Trevers. We are also grateful to Melanie Tanguy and Greg Elgar at Genomics England, who are in charge of the genome sequencing for these projects, as well as clinicians Richard Mair and James Brenton at the University of Cambridge. The paediatric cancer project is a collaboration with a wide range of individuals from the Institute of Cancer Research in London, with special thanks to Michael Hubank, John Anderson, and Louis Chesler, and from the University of Birmingham, with special thanks to the Andrew Beggs lab.

The long-read sequencing work performed by the Cortes-Ciriano group is made possible thanks to funding from Genomics England, CRUK, Sarcoma Foundation of America, NF Research Initiative, and Connective Tissue Oncology Society.
