EMBL’s IT infrastructure is at the heart of the laboratory’s data-driven biology and must cope with vast, exponentially growing quantities of research data generated by high-throughput experiments. EMBL operates an IT centre at each of its sites, tailored to site-specific requirements. The two largest IT centres are at EMBL Heidelberg, Germany, and at EMBL-EBI in Hinxton, UK.
At EMBL Heidelberg, the IT infrastructure is designed to support data-driven science involving key technologies such as genome sequencing, large-scale imaging, computational biology and modelling. This ensures the enormous amounts of data generated in Heidelberg can be securely stored, analysed and shared within the laboratory or across large international research consortia.
The EMBL IT Services in Heidelberg operate a highly resilient infrastructure spanning three data centres, which builds on state-of-the-art technology and can be scaled flexibly in capacity and performance according to demand. The infrastructure is designed to support an exponentially growing data footprint, both in data storage and in the high-performance computing (HPC) and cloud-based capacities needed for downstream analysis. IT Services operate a multi-tiered data storage platform with a current capacity of 100 PB, presently doubling every 18 months. The platform builds on high-end disk, flash and tape technology and efficiently supports the different needs of EMBL’s scientists in terms of I/O performance, robustness, durability and cost.
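To make the growth figure concrete, the following is a minimal sketch of what a capacity of 100 PB doubling every 18 months implies, assuming the doubling rate stays constant. That is a simplifying assumption for illustration, not a published forecast, and the function names are hypothetical.

```python
import math

CURRENT_PB = 100.0      # capacity quoted in the text
DOUBLING_MONTHS = 18.0  # doubling period quoted in the text


def projected_capacity_pb(months_ahead, current_pb=CURRENT_PB,
                          doubling_months=DOUBLING_MONTHS):
    """Capacity after `months_ahead` months under steady exponential growth."""
    return current_pb * 2 ** (months_ahead / doubling_months)


def months_until(target_pb, current_pb=CURRENT_PB,
                 doubling_months=DOUBLING_MONTHS):
    """Months needed to reach `target_pb` at the same (assumed) growth rate."""
    return doubling_months * math.log2(target_pb / current_pb)


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(f"{years} year(s): {projected_capacity_pb(12 * years):.0f} PB")
    print(f"1 EB (1000 PB) reached after {months_until(1000.0):.0f} months")
```

Under these assumptions the footprint grows tenfold in roughly five years, which is why scalability across disk, flash and tape tiers matters more than any single capacity figure.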
Our team of life science IT professionals is committed to working with scientists and experts at EMBL and beyond to provide future IT solutions for EMBL science and to take data-driven biology at EMBL to the next level. By implementing innovative technology and best practices, we are responding to the ever-growing demands on IT delivery for EMBL’s science.
High Performance Computing
Scientific data analysis at EMBL relies to a large extent on access to high-performance computing. An increasing number of users, presently about 40% of all EMBL scientists, use HPC as part of their data analysis workflows. In 2020 these users consumed about 3300 CPU years across more than 24 million compute jobs on one of the key number-crunching resources, the EMBL Heidelberg HPC cluster. This facility primarily supports the needs of local researchers but also offers capacity to the wider EMBL scientific community. Based on the latest Intel and AMD technology, it presently provides access to more than 10,000 CPU cores and a total of 70 TB of RAM. The cluster also integrates GPU compute power based on recent Nvidia Ampere hardware, adding about 680 TFLOPS of floating-point performance. The underlying parallel file system is designed to offer about 200 GB/s of bandwidth to support the massive I/O needs of data-intensive bioinformatics and computational biology HPC workloads.
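The figures above can be combined into a few rough derived numbers. The sketch below is a back-of-envelope calculation only: it assumes a 365-day year, decimal units (1 TB = 1000 GB), and that the present core count also applied in 2020, which the text does not state. All function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year

# Figures quoted in the text (lower bounds where the text says "more than")
CPU_YEARS_2020 = 3300
JOBS_2020 = 24_000_000
CPU_CORES = 10_000
RAM_TB = 70
FS_BANDWIDTH_GB_S = 200


def mean_cpu_hours_per_job(cpu_years, jobs):
    """Average CPU time consumed by one job, in hours."""
    return cpu_years * HOURS_PER_YEAR / jobs


def implied_utilisation(cpu_years, cores):
    """Fraction of a year's core-hours used, if `cores` were available all year."""
    return cpu_years / cores


def per_core_share(total, cores):
    """Evenly divide a cluster-wide resource across all cores."""
    return total / cores


if __name__ == "__main__":
    hours = mean_cpu_hours_per_job(CPU_YEARS_2020, JOBS_2020)
    print(f"Mean CPU time per job: {hours * 60:.0f} minutes")
    print(f"Implied utilisation:   {implied_utilisation(CPU_YEARS_2020, CPU_CORES):.0%}")
    print(f"RAM per core:          {per_core_share(RAM_TB * 1000, CPU_CORES):.1f} GB")
    print(f"File system bandwidth per core: "
          f"{per_core_share(FS_BANDWIDTH_GB_S * 1000, CPU_CORES):.0f} MB/s")
```

The short mean job length suggests a workload dominated by many small jobs, typical of high-throughput bioinformatics pipelines, although the averages hide what is likely a very skewed distribution.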
IT virtualisation and cloud technology are vital to on-demand IT service delivery at EMBL. With completely virtualised data centres at EMBL Heidelberg, Rome and Barcelona, IT Services are able to provide robust, scalable, flexible and cost-effective IT services to EMBL users at large scale. The IT team operates a growing number of both virtual machines (VMs) and containers (based on Kubernetes) on its platforms, providing core IT services as well as specialised servers and services hosted for scientific and non-scientific groups across EMBL. In addition, there are dedicated cloud areas to support the needs of EMBL Scientific Groups and Core Facilities. For example, the EMBL 3D Cloud has been designed for researchers who work on imaging and Big Data analysis and who routinely use machine learning. Using cloud-based GPU power, it offers remote visualisation and image data analysis capacities on the scale of the highest-end graphics workstations, while fully exploiting the I/O performance of the data centre infrastructure.
In addition to container-based cloud services built on Kubernetes and S3-compatible object storage, OpenStack is used to run workloads for both associated de.NBI projects and EMBL projects. IT Services also support the integration of commercial and public cloud services to broaden their portfolio of innovative IT services.