Analysis of whole cancer genomes gives key insights into the role of the non-coding genome in cancer
The discovery of genetic drivers of cancer can have critical implications for the diagnosis and treatment of cancer patients, yet genome analysis has focused primarily on just 1–2% of the genome: the part that contains the code for making proteins. What about the rest? Does it also play a role in driving the disease?
As part of the Pan-Cancer project, scientists analysed whole-genome sequencing data to search for cancer drivers in the non-coding genome. To do this, they had to develop new statistical methods suited to analysing non-coding regions.
The Pan-Cancer project
The Pan-Cancer Analysis of Whole Genomes project is a collaboration involving more than 1300 scientists and clinicians from 37 countries. It involved analysis of more than 2600 genomes of 38 different tumour types, creating a huge resource of primary cancer genomes. This was the starting point for 16 working groups to study multiple aspects of cancer development, causation, progression, and classification.
Joachim Weischenfeldt – now a group leader at the Biotech Research & Innovation Centre at the University of Copenhagen, and Rigshospitalet, Copenhagen – was a postdoc in the Genome Biology Unit at EMBL Heidelberg at the time of the research. He explains the rationale for the investigation: “Decades of work has been focused on identifying the consequences of changes in the protein-coding part of the genome. Many cancers have no important mutations in the protein-coding part, but something is driving the cancer. By inference, we suspect the non-coding part is playing an important role in these unexplained cases.”
The analysis focused on identifying driver point mutations – mutations that affect only one or very few letters of the DNA code – and structural variants, or rearrangements, in the non-coding regions of the genome. In addition to identifying new drivers, the analysis confirmed some previously reported drivers and, importantly, invalidated others. It also identified novel putative driver rearrangements near the AKR1C genes. These rearrangements correlated with increased gene expression across lung and liver cancers.
Mutations and structural variants driving cancer were found to be less frequent in non-coding genes and sequences than in the protein-coding part of the genome, but this could partly be due to the relatively small number of patient datasets available to analyse for some tumour types. “We probably need an order of magnitude more genomes to really have a comprehensive understanding of all the mutations that drive cancer, and the complex mechanisms by which they form,” says Weischenfeldt. “As cancer is a disease of the genome, we ultimately want to be able to explain as many cancers as possible using genetics.”
Thanks in large part to the work carried out during the Pan-Cancer project, about 95% of the cancers studied could be explained genetically by a driver mutation. One of the key outputs of the project is a catalogue that clinicians and researchers can use to look up specific tumour types and identify the drivers of the disease.