Five things you probably didn’t know about the Human Genome Project
Today we celebrate the 20th anniversary of the first draft sequence of the human genome
On 26 June 2000, the UK and US governments simultaneously announced the completion of the first draft of the entire human genome – the first map of the three billion base pairs that make up human DNA.
The international project, led by scientists from across the world, aimed to provide high-quality sequencing data to help the scientific community understand human genetics, health, and disease. The project involved thousands of scientists, including many who worked on the Wellcome Genome Campus, where EMBL’s European Bioinformatics Institute (EMBL-EBI) is located.
The final sequence was declared complete in 2003: the culmination of more than 10 years of effort. The scientists behind the Human Genome Project created a deeply valuable resource, which others could use to perform research that would otherwise have been inconceivable.
Twenty years after the first draft of the human genome, we explore five little-known facts about the project.
1. Public vs private research: stronger together
Sequencing the human genome started out as a public project, funded by a variety of institutes and organisations across the globe. In 1998, a private company called Celera Genomics, headed by American biochemist Craig Venter, began to compete with the public project to get a full sequence first. At the heart of the race was the prospect of gaining control over potential patents on the genome sequence, a real pharmaceutical treasure trove. By the end of the project, public and private stakeholders decided to team up to provide a high-quality sequence freely available to all.
2. The birth of the Wellcome Sanger Institute and EMBL-EBI
One of the early challenges of the Human Genome Project was to find adequate facilities to host the project’s sequencers and personnel in the UK. The geneticists John Sulston and Bob Waterston collaborated to secure funding from the Wellcome Trust and the UK Medical Research Council (MRC), both of which agreed to support a new centre that would start sequencing the human genome. Sulston searched the English countryside for a suitable site for the new sequencing facility and found a hidden gem – an abandoned scientific site at Hinxton Hall, on the outskirts of Cambridge. That’s how the Wellcome Sanger Institute (then called the Sanger Centre) was born in the summer of 1992, paving the way for other institutions like EMBL-EBI to establish themselves on what is now the Wellcome Genome Campus.
At EMBL Heidelberg, Graham Cameron had developed the concept for EMBL-EBI and was responsible for the final proposal, which was accepted by EMBL Council in 1992. Insect geneticist Michael Ashburner, later Joint Director of EMBL-EBI with Graham Cameron, had heard of John Sulston’s Sanger Centre at Hinxton Hall. Seeing it as a perfect opportunity to combine EMBL’s bioinformatics resources with a nascent sequencing institute, he and John Sulston convinced the Wellcome Trust and the MRC that the Wellcome Genome Campus was an ideal location for EMBL’s new site. Several countries put bids in to host EMBL-EBI, but the Wellcome Trust and the MRC made a strong enough case that the UK was the winner. Since then, EMBL-EBI has played a major part in the bioinformatics revolution and collaborates closely with the Wellcome Sanger Institute.
3. Public project, public data
The scientists who contributed to the public efforts to sequence the human genome made their data publicly available as soon as they collected them. The Wellcome Sanger Institute ensured that the results of scientific research would be made accessible to all, to accelerate research and increase transparency. A number of sequencing companies took advantage of the data before the end of the project to offer genetic tests that could show predispositions to various illnesses such as breast cancer or cystic fibrosis.
4. Project spinoff: Ensembl
The Human Genome Project led to revolutionary advances in medicine, and also in sequencing technologies, data resources and other large-scale projects. A major example is Ensembl, a genomics information resource founded as a joint project between EMBL-EBI and the Wellcome Sanger Institute, and since 2014 located only at EMBL-EBI. Ensembl was created in response to the progress of the Human Genome Project and was originally used to browse the human genome. Today, Ensembl continues to serve thousands of researchers, allowing them to search the genomes of thousands of vertebrates, invertebrates, plants, fungi, bacteria, and protists across both the Ensembl and Ensembl Genomes websites. To this day, the Human Genome Project inspires collaborative open science projects across the globe.
5. GeneSweep: Ewan Birney’s betting book
From 2000 to 2003, when the sequencing efforts of the Human Genome Project were reaching their height, Ewan Birney, now Joint Director of EMBL-EBI, organised a bet known as the GeneSweep. Genomics specialists placed bets of 1 to 20 US dollars and tried to guess the total number of genes in the human genome. Birney neatly recorded more than 460 bets in a book and announced the winner in 2003. Most scientists overestimated the number, thinking the human genome would contain between 50 000 and 100 000 protein-coding genes. The Ensembl project at the time of the bet estimated the number of protein-coding genes at only 24 847, and the current estimate is around 20 000. The winner of the competition was Lee Rowen of the Institute for Systems Biology in Seattle, whose estimate was the lowest at 25 947. Birney’s GeneSweep book, now a piece of scientific history, was part of an exhibition on the genome at London’s Science Museum in 2012. The exhibition was stimulated by the Encyclopedia of DNA Elements (ENCODE) project: a collaborative effort to identify all of the functional elements in the human genome, in which EMBL-EBI played an important role.