How does EMBL-EBI run millions of jobs for its users while moving its two large data centres?
EMBL-EBI, home to the world’s most comprehensive collection of life-science data, ran an unprecedented 11.9 million jobs for its users throughout the world in September 2014. At the same time, it began to move hundreds of its servers from two locations in London to a new data centre at Hemel Hempstead. So how does the institute continue to serve up petabytes of data while on the move? Very, very carefully, says head of Technical Services Steven Newhouse.
“The move to Hemel Hempstead has to be very tightly choreographed,” comments Steven. “We’re continuing to run a huge number of jobs and providing access to all our other data and services, so we have to use every last bit of our capacity to make the move seamless. Fortunately, funding from the UK Government supports all our data centre activities, so we have installed temporary capacity capable of handling twice as many jobs as usual. We’re confident that things will go pretty smoothly.”
Most of the data is duplicated and currently stored in racks of servers at the two London data centres. To ensure uninterrupted service, the contents are also replicated at the Genome Campus in Hinxton, outside Cambridge, while the physical hardware is moved to Hemel Hempstead. By the end of 2014, the public will access all EMBL-EBI data through the new Hemel Hempstead data centre.
“Our technical teams have been working with everyone at EMBL-EBI to ensure our migration plans are right, down to the last detail,” says Steven.
The millions of jobs run every month are supported by EMBL-EBI middleware called Job Dispatcher, which handles utilities for the European Nucleotide Archive, Ensembl Genomes, UniProt, Pfam, PDBe and many others. Job Dispatcher provides analysis functionality for most of the institute’s data resources and for EBI Search, and gives users the means to compare gene or protein sequences to find out how they affect function.
“EMBL-EBI is a regular part of the way lots of different research projects are set up,” explains Rodrigo Lopez, head of Web Development at EMBL-EBI. “Job Dispatcher helps them establish analytical pipelines to query the data automatically and combine it with their own. It also does plenty for users making individual queries, who are exploring the data in a more organic way.”
The growth in demand for biological data has been dramatic. EMBL-EBI ran an average of 5.4 million jobs per month in 2013, when users could search over 60,000 datasets from life-science experiments; this September’s total of almost 12 million is more than double that figure.