Thorough characterisation of structural variants in human genomes
The genetic differences between individuals contribute to our individuality. These differences include millions of single nucleotide variants – where, for example, one person’s genetic code may have a letter A, while another may have a C at a given position. There are also hundreds of thousands of structural variants (SVs). SVs include segments of DNA that are inserted into or deleted from the genome, segments that are duplicated, and segments that are inverted. SVs are more difficult to identify than single nucleotide variants, so it has been unclear just how many SVs really exist in a given human genome.
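As a toy illustration of the four SV types just described, the following Python sketch applies each one to a short, made-up sequence. The function names and the example sequence are hypothetical, chosen only for illustration; real SVs are far larger, and this is not how the study detected them. The sketch assumes an inversion appears as the reverse complement of the segment when read along the same strand.

```python
# Toy models of the four structural variant (SV) types on a DNA string.
# All names and the example sequence are illustrative assumptions.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def delete(seq, start, end):
    """Remove the segment seq[start:end]."""
    return seq[:start] + seq[end:]


def insert(seq, pos, segment):
    """Insert a new segment before position pos."""
    return seq[:pos] + segment + seq[pos:]


def duplicate(seq, start, end):
    """Copy seq[start:end] and place the copy directly after the original."""
    return seq[:end] + seq[start:end] + seq[end:]


def invert(seq, start, end):
    """Replace seq[start:end] with its reverse complement."""
    flipped = seq[start:end].translate(_COMPLEMENT)[::-1]
    return seq[:start] + flipped + seq[end:]


reference = "AACCGGTT"  # made-up reference sequence

print("deletion:   ", delete(reference, 2, 4))
print("insertion:  ", insert(reference, 4, "TTT"))
print("duplication:", duplicate(reference, 2, 4))
print("inversion:  ", invert(reference, 1, 4))
```

Each function changes the length or orientation of a whole segment rather than a single letter, which is what sets SVs apart from single nucleotide variants.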
A large team of researchers from the Human Genome Structural Variation Consortium (HGSVC), led by co-first authors Mark Chaisson, Ashley Sanders (currently at EMBL), and Xuefang Zhao, and including Jan Korbel, Paul Flicek, and colleagues at EMBL, used a full suite of genomic technologies to extensively analyse the genomes of three family trios (parents and child). These included long-read, short-read, and strand-specific (Strand-seq) sequencing, optical mapping, and multiple computer algorithms for SV detection. The result is the most comprehensive catalogue of SVs to date, showing the SVs in the children’s genomes and indicating which set of parental chromosomes each SV was present on.
The team found that more than 100,000 variants per individual are missed by routine sequencing technologies and commonly used computer algorithms. These missed variants included 350 large inversions, amounting to an average of 24 megabases (million bases) of inverted DNA per individual genome, which were uncovered using Strand-seq technology in the analyses led by EMBL’s Ashley Sanders and Jan Korbel. The true number of SVs in a given human genome appears to be three to seven times greater than most studies typically identify. SVs therefore constitute a large amount of genetic variation not commonly captured by current genome sequencing technologies and analytical methods. This implies that the contribution of SVs to human disease has not yet been well quantified. Expanding the use of multiple technologies for SV detection can reveal new genetic associations with diseases, increase sensitivity, and improve diagnostic yields in genetic testing.