First combined map of genetic variation at different scales
In a nutshell:
First map combining human genetic variation at different scales – from single letters to large chunks
Based on genomes of 1092 healthy people from Europe, the Americas and East Asia
Could help identify genetic causes of disease, rather than just links
Data made freely available in formats and tools that make it useful for studies ranging from biomedical research to human evolution
The 1000 Genomes Project today presents a map of normal human genetic variation – everything from tiny changes in the genetic code to major alterations in our chromosomes. In a DNA version of ‘spot-the-difference’, EMBL scientists and their colleagues studied the genomes of 1092 healthy people from Europe, the Americas and East Asia, systematically tracking what makes us different from each other. Their results, published in Nature, open new approaches for research on the genetic causes of disease.
“The 1000 Genomes Project has achieved something truly exceptional in providing this powerful baseline of human variation,” said Paul Flicek of EMBL-EBI, who co-chairs the project’s Data Coordination Centre (DCC). As well as providing that baseline – a clearer picture of which DNA sequences are common and which are rare in people from different areas or ethnic backgrounds – the results could help the ongoing search for genetic links to diseases.
Jan Korbel from EMBL Heidelberg, who co-leads the project’s study of variation in large sections of chromosomes, pointed out the advantages of combining information on such large-scale variations with data on changes at a smaller scale. “This integrated view of genome variation will be extremely useful for understanding cause and consequence, and hence provide an invaluable context for future medical studies,” Korbel said. “When people find a SNP, a single letter change, that’s associated with a disease, they can now see if there’s a change in a larger chunk of the genome that’s always inherited alongside that SNP, and could cause the disease.”
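Korbel is describing linkage disequilibrium: two variants that are "always inherited alongside" each other sit together on the same haplotypes. A standard way to quantify this is the squared correlation r² between the two sites across phased haplotypes. The Python sketch below uses the textbook formula and is not the project's own pipeline. The function name and the toy haplotypes are hypothetical, and it assumes both sites are biallelic and coded 1 for the alternative allele (the SNP's variant letter, or presence of a larger deletion) and 0 otherwise.

```python
from typing import Sequence


def ld_r2(snp: Sequence[int], sv: Sequence[int]) -> float:
    """Squared correlation (r^2) between two biallelic sites on phased haplotypes.

    Hypothetical helper for illustration. Each input lists one allele per
    haplotype: 1 = alternative allele (SNP variant or deletion present), 0 = reference.
    """
    if len(snp) != len(sv) or not snp:
        raise ValueError("need the same non-empty set of haplotypes for both sites")
    n = len(snp)
    p_a = sum(snp) / n  # frequency of the SNP's alternative allele
    p_b = sum(sv) / n   # frequency of the larger variant (e.g. a deletion)
    p_ab = sum(a & b for a, b in zip(snp, sv)) / n  # haplotypes carrying both
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom


# Toy data: eight haplotypes.
snp = [1, 1, 1, 0, 0, 0, 0, 0]
deletion = [1, 1, 1, 0, 0, 0, 0, 0]  # always co-inherited with the SNP
partial = [1, 1, 0, 0, 0, 0, 0, 0]   # found on only some SNP-carrying haplotypes
print(ld_r2(snp, deletion))  # 1.0
print(ld_r2(snp, partial))   # 5/9, about 0.556
```

An r² of 1 matches the case Korbel describes: every haplotype with the SNP also carries the deletion, so an association found at the SNP could equally be caused by the deletion.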
The results also open up new avenues for researchers interested in how different genetic sequences have spread across human populations – taken by European settlers to the Americas, for instance. Ensuring that the project’s results are useful to researchers working in a wide range of fields is the mission of Flicek’s data coordination team. “Like ENCODE and other massive datasets, it is crucial that people working in all areas of human health and biomedical research can make the most of it. Our role has been to make these data not just freely available but truly accessible.”
To that end, the scientists have already made the current results available to the scientific community. “The results of this first phase are in the 1000 Genomes browser, which has a whole suite of Ensembl-based tools that help you make practical use of the data,” said Laura Clarke of EMBL-EBI, Technical Lead for the DCC. “For example it lets you look at shared patterns of variance, which can be a good indicator of whether a particular genetic factor is related to disease. Another very practical tool lets you take just a slice of the data, so you don’t have to download the whole massive dataset.”
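Clarke does not describe how the slicing tool works internally. As an illustration only, "taking a slice" can be sketched as keeping the records of a VCF-like variant file that fall inside one genomic region. The function name, the 1-based inclusive region convention and the sample records are assumptions. Real tools usually rely on an index (such as tabix) so that they can jump straight to the region instead of scanning every line, as this sketch does.

```python
from typing import Iterable, Iterator


def slice_region(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield VCF-like lines whose position lies in chrom:start-end (1-based, inclusive).

    Hypothetical helper. Header lines (starting with '#') are passed through so
    the slice can still be read as a variant file.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # In VCF the first two columns of a record are CHROM and POS.
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield line


records = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "1\t5000\t.\tC\tT",
    "2\t150\t.\tG\tA",
]
for line in slice_region(records, "1", 1, 1000):
    print(line)  # the header, then only the chromosome 1 record at position 100
```

Passing the header through is a design choice: it keeps the slice self-describing, so it can be handled like a small copy of the full file.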
With the help of such tools, and the continuation of the 1000 Genomes Project, scientists are set to keep learning about, and from, the differences between us.
Through characterising the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help understand the genetic contribution to disease. We describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methodologies to integrate information across multiple algorithms and diverse data sources we provide a validated haplotype map of 38 million SNPs, 1.4 million indels and over 14 thousand larger deletions. We show that individuals from different populations carry different profiles of rare and common variants and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways and that each individual harbours hundreds of rare non-coding variants at conserved sites, such as transcription-factor-motif disrupting changes. This resource, which captures up to 98% of accessible SNPs at a frequency of 1% in populations of medical genetics focus, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
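The abstract's distinction between rare, low-frequency and common variants comes down to allele frequency measured within a population. The Python sketch below shows the arithmetic. It assumes diploid genotypes coded as 0, 1 or 2 copies of the alternative allele, and it bins variants by minor-allele frequency. The cut-offs (rare below 0.5%, common above 5%) are assumed conventional thresholds and are exposed as parameters. The two populations are invented to show how one variant can fall into different classes in different groups.

```python
from typing import Sequence


def alt_allele_frequency(genotypes: Sequence[int]) -> float:
    """Alternative-allele frequency from diploid genotypes coded 0, 1 or 2."""
    if not genotypes:
        raise ValueError("no genotypes")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    return sum(genotypes) / (2 * len(genotypes))  # two chromosomes per person


def frequency_class(freq: float, rare_below: float = 0.005, common_above: float = 0.05) -> str:
    """Bin a variant by minor-allele frequency; the thresholds are assumptions."""
    minor = min(freq, 1 - freq)
    if minor < rare_below:
        return "rare"
    if minor > common_above:
        return "common"
    return "low-frequency"


# Hypothetical genotypes for one variant in two populations of 100 people each.
pop_a = [0] * 95 + [1] * 5              # 5 alt alleles out of 200
pop_b = [0] * 60 + [1] * 30 + [2] * 10  # 50 alt alleles out of 200
for name, pop in [("A", pop_a), ("B", pop_b)]:
    f = alt_allele_frequency(pop)
    print(name, f, frequency_class(f))  # A: 0.025 low-frequency, B: 0.25 common
```

The same variant is low-frequency in one population and common in the other. This is the kind of geographic differentiation the abstract reports as especially strong for low-frequency variants.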