Scientists use the Pan-Cancer dataset to study how structural variations in the genome can lead to cancer
Using the dataset from the Pan-Cancer project, a team including EMBL scientists has developed methods to group, classify, and describe structural variants – large rearrangements of the genome that are a key driver of cancer. Their findings could help to improve cancer diagnosis and therapy.
The Pan-Cancer project
The Pan-Cancer Analysis of Whole Genomes project is a collaboration involving more than 1300 scientists and clinicians from 37 countries. It involved analysis of more than 2600 genomes of 38 different tumour types, creating a huge resource of primary cancer genomes. This was the starting point for 16 working groups to study multiple aspects of cancer development, causation, progression, and classification.
Structural variations in genomes can arise from deleting, amplifying, or reordering genomic segments ranging from a few thousand letters of the genetic code to whole chromosomes. These variations have previously been difficult to classify and catalogue due to the complex mechanisms of their formation. Jan Korbel, group leader at EMBL and one of the initiators and coordinators of the Pan-Cancer project, explains: “In this study we have uncovered and classified different ways by which the cancer genome can rearrange. We performed the first detailed classification of structural variation mechanisms in cancer genomes.” The researchers uncovered several new processes that can lead to cancer, for instance a complex process in which some fractions of the genome are duplicated more than once. This can lead to cancer genes becoming active, because they are copied in high numbers and then brought to a region in which they can be switched on.
Linking events to mutations
“Most previous studies have been conducted on the coding 1–2% of the genome,” says Joachim Weischenfeldt, group leader in the Biotech Research & Innovation Centre in Copenhagen and a former postdoc in EMBL’s Korbel group. “Structural variations have been largely ignored, because most of them are situated in the non-coding part of the genome and are much more complex to comprehend.”
Along with colleagues, Weischenfeldt developed methods to identify structural variants and the mechanisms of their formation by performing a whole genome sequencing analysis on the Pan-Cancer data. “This paper is one of the first ones to systematically classify very complex types of structural variants that occur in cancer genomes and link them to mechanisms of formation,” says Weischenfeldt. “It now gives us a handle to distinguish the different types of structural variants that occur in cancer genomes. We can potentially use these as biomarkers in different cancers, because mutations in certain very potent driver genes give rise to specific types of structural variants.” Biomarkers are biological indicators, such as specific molecules or genetic sequences, that can be used to identify certain conditions – in this case cancer.
The paper moves researchers closer to answering some of the basic questions about cancer, as they can now explore the genetics behind genome rearrangements. Complex events can be linked back to a specific mutation, which is very important for better diagnosis of patients. The catalogue can be used for both prognosis and therapy.
A new therapeutic tool
Weischenfeldt explains that this sort of analysis is already being implemented as a clinical tool to identify mutation signatures: combinations of mutations with a characteristic pattern. In his current work at the Biotech Research & Innovation Centre in Copenhagen, Weischenfeldt is applying these analysis methods to understand the genetics of cancer patients and to devise better and more tailored treatment options. “We have a programme in which patients can get precision medicine based on genomic findings. The classification method that will be published in this paper will be an important part of our toolbox,” he says.
Weischenfeldt explains that working on the Pan-Cancer project has trained him and many of his colleagues in handling and analysing large and complex datasets to identify recurrent and biologically relevant patterns. “There weren’t a lot of these methods when we started. We had to come up with new, reproducible research methods to analyse the genetic information we had,” he says. “That was a huge challenge, but also an extremely exciting one. That’s why we do research.”
Korbel adds that following up on the research presented in this paper will also be very interesting to EMBL scientists. “For us, a very important next step is to identify the molecular cause of all these separate processes,” says Korbel.