EMBL-EBI’s data resources are built on a constantly running compute and storage infrastructure. Over the past decade that infrastructure has grown exponentially, keeping pace with the rapid growth of molecular data and the corresponding need for computation. Terabytes of data flow every day on and off our storage systems, making up the hidden life-blood of data and knowledge that permeates much of modern molecular biology.
There is a somewhat bewildering complexity to all of this. We have 57 key resources: everything from low-level, raw DNA storage (ENA) through genome analysis (Ensembl and Ensembl Genomes), complex knowledge systems (UniProt) and 3D protein structures (PDBe). At minimum, over half a million users visit at least one of the EMBL-EBI websites each month, making 12 million web hits and downloading 35 Terabytes each day. Each resource has its own release cycle, with different international collaborations (for example, INSDC, wwPDB, ProteomeXchange) handling the worldwide data flow.
To achieve consistent delivery, we have a complex arrangement of compute hardware distributed around different machine rooms, some at Hinxton and some off site. Around two years ago we started the process to rebid our machine room space, and last year Gyron, a commercial machine-room provider in the southeast of England, won the next five-year contract. This was good news for efficiency (Gyron provided a similar level of service but at a sharper price) but posed an immediate problem for EMBL-EBI’s systems, networking and service teams: to wit, how we were going to move our infrastructure without disruption?
If you’d like to know a bit more about the people behind the move, you might enjoy an article in Nature News and Comment, Not Your Average Technician, featuring EMBL-EBI data centre engineer Dawn Johnson.