IT Services teams across EMBL have been focusing on reducing their energy consumption, with impressive results that support both our budgets and our sustainability goals.
Over the years, EMBL has seen an enormous increase in the computational data it manages. Advances in detector technologies, sequencing, and omics fields, as well as the opening of the new Imaging Centre, will all likely lead to continued exponential growth in data volume, approximately doubling every 18 to 24 months.
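The growth rate described above can be made concrete with a short calculation. The sketch below (the starting volume and time horizon are illustrative assumptions, not EMBL projections) shows how a doubling time of 18 to 24 months translates into annual growth factors and projected volumes.

```python
# Projected data growth under a fixed doubling time.
# Starting volume and horizon are illustrative assumptions, not EMBL figures.

def annual_growth_factor(doubling_months: float) -> float:
    """Growth factor per year when volume doubles every `doubling_months` months."""
    return 2 ** (12 / doubling_months)

def projected_volume(start_pb: float, doubling_months: float, years: float) -> float:
    """Data volume (petabytes) after `years`, growing from `start_pb`."""
    return start_pb * annual_growth_factor(doubling_months) ** years

start = 100.0  # illustrative starting volume in petabytes
for months in (18, 24):
    factor = annual_growth_factor(months)
    print(f"doubling every {months} months -> x{factor:.2f} per year, "
          f"{projected_volume(start, months, 5):.0f} PB after 5 years")
```

Even at the slower end of that range, volume grows by roughly 40% per year, which is why storage capacity and its energy footprint compound so quickly.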
As data volume continues to grow, so does the need for increased analysis capacity. Sharing large data quantities across EMBL sites or within international scientific consortia (along the lines of the Data Science transversal theme in the ‘Molecules to Ecosystems’ programme) will continue to pose enormous demands on the scalability of the laboratory’s IT infrastructure. This includes compute power, AI, cloud services, different data storage types, and networking. Ultimately, this will increase our energy requirements.
Until now, all EMBL sites have kept pace with the enormous demands of data-driven science by scaling their individual IT infrastructures accordingly. However, with the ongoing energy crisis, the laboratory now faces a dilemma – having to balance an increasing requirement for energy-intensive data management against the need to significantly reduce energy costs for both economic and environmental reasons.
It is safe to say that large-scale data management in the life sciences, facilitated by technological advances such as AlphaFold or new imaging technology, will be at the heart of much of the exciting research and new service opportunities at EMBL in the years to come. But EMBL is in a unique position to exploit these opportunities while also meeting the enormous associated IT challenges.
Contributors: Rupert Lück, Jessica Klemeier, Jure Pečar, Remi Pinck, Eduard Avetisyan and Tim Dyce
Even before accounting for the expected growth in data services, EMBL Data Centres are already the single biggest consumer of electricity across EMBL. In 2022, the data centres in Heidelberg, Hinxton (on-campus and external), Grenoble, and Hamburg together consumed 10.6 million kWh – the same amount of electricity as that consumed by 2,100 three-person households in Germany.
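The household comparison is a simple ratio. A minimal check, assuming (as the article's figures imply) an annual consumption of roughly 5,000 kWh for a three-person household in Germany:

```python
# Sanity-check the household comparison from the text.
# The per-household figure is an assumption implied by the article's numbers.

total_kwh = 10_600_000          # EMBL data centre consumption in 2022
kwh_per_household = 5_000       # assumed annual use, three-person German household

households = total_kwh / kwh_per_household
print(f"Equivalent to about {households:,.0f} households")
```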
With these numbers in mind, it is vital that EMBL operates its scientific data management in the most efficient way possible, to support the organisation’s immediate cost-saving measures and our longer-term ambitions to be a more sustainable organisation. The good news is that the IT Services teams across our sites have been working on this for some time, and in 2022 we saw some really impressive results, including an 11% reduction in electricity use compared to 2021, as shown in this chart.
In Heidelberg, our data centre supports research on campus as well as at our other sites. The high-performance computing (HPC) services support the many different workloads of 400 researchers from 90 groups across EMBL. This can include, for example, piecing together data from imaging services to turn hundreds of thousands of 2D image slices into a beautiful 3D model of a protein complex. Such complex scientific datasets require a lot of storage, which in Heidelberg currently totals over 100 petabytes of scientific data – a continuously growing figure.
To reduce energy costs, Jure Pečar from the IT team in Heidelberg has implemented two effective initiatives. First, Jure observed that the dynamic nature of compute workloads associated with EMBL research means that not all of the available resources are always in use. However, even while idle, compute nodes still consume power to stay awake and available.

Our researchers don’t typically require these nodes to be available within milliseconds and can afford to wait a minute or two for them. So, Jure implemented a simple but effective strategy – automatically powering off these nodes after an idle period and powering them back up when demand reappears.

In October, the team tackled the next challenge. Inspired by the latest generation of CPUs, which have an ‘ECO-mode’, the team mimicked this on our current machines, which run on previous-generation CPUs. Part of this change was setting a limit on the maximum frequency at which these CPUs can operate. This reduces the electricity consumed by the CPUs, but also means the units run at a lower temperature, lowering the demand on the data centre cooling systems. While previously we had to cool the data centre’s cold air supply to 21°C, the team can now operate the room at 24°C. Heat is removed from the data centre via chilled water, previously held at a temperature of 12°C. Thanks to the mimicked CPU ECO-mode, the water temperature can now be raised to 16°C even during summertime, further reducing the load on the data centre cooling system and the amount of electrical power it consumes.

Overall, these strategic changes have resulted in immediate and significant energy savings in the data centre, as highlighted in the charts below, which show that significantly less electricity is consumed per billion seconds of CPU work. Jure and the IT Services team expect that a similar strategy could be implemented for the 200 GPUs in the Heidelberg data centre (specialised graphics processors critical to supporting EMBL’s massive AI workloads) after testing the possible impact.
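The auto power-off strategy boils down to a simple policy: track how long each node has been idle and power it down past a threshold, bringing it back when demand reappears. Everything below – node names, the threshold, and the decision logic – is a hypothetical illustration, not EMBL’s actual implementation.

```python
# Hypothetical sketch of an idle-node power-off policy.
# Node names and the threshold are illustrative assumptions.

IDLE_THRESHOLD_S = 600  # power off after 10 minutes idle (assumed value)

def should_power_off(idle_seconds: float, has_pending_jobs: bool) -> bool:
    """Power a node down only if it has been idle long enough and no work is queued."""
    return idle_seconds >= IDLE_THRESHOLD_S and not has_pending_jobs

def plan_actions(node_idle_times: dict, pending_jobs: bool) -> list:
    """Return the names of nodes that should be powered off this cycle."""
    return [name for name, idle in node_idle_times.items()
            if should_power_off(idle, pending_jobs)]

# Example cycle: two nodes long idle, one busy, no jobs waiting.
idle_times = {"node01": 900.0, "node02": 1200.0, "node03": 0.0}
print(plan_actions(idle_times, pending_jobs=False))  # ['node01', 'node02']
```

In practice, a site would usually not run a separate daemon for this: batch schedulers such as Slurm ship power-saving hooks (suspend/resume programs triggered after a configurable idle time) that implement exactly this policy.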
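The mimicked ECO-mode amounts to capping the CPU’s maximum frequency. On Linux this is exposed through the cpufreq interface in sysfs; the sketch below computes a capped value and shows where it would be written. The 70% cap is an illustrative assumption (not EMBL’s chosen setting), and the sysfs write is guarded because it requires root and real hardware.

```python
# Sketch of capping CPU frequency to mimic an 'ECO-mode'.
# The 70% cap is an illustrative assumption; writing to sysfs needs root.
from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def capped_frequency(max_khz: int, cap_fraction: float) -> int:
    """Return the frequency cap in kHz, e.g. 70% of the hardware maximum."""
    return round(max_khz * cap_fraction)

def apply_cap(cap_fraction: float = 0.7, dry_run: bool = True) -> int:
    """Compute (and, if not a dry run, write) the scaling_max_freq cap."""
    max_khz = int((CPUFREQ / "cpuinfo_max_freq").read_text())
    target = capped_frequency(max_khz, cap_fraction)
    if not dry_run:  # requires root privileges
        (CPUFREQ / "scaling_max_freq").write_text(str(target))
    return target

print(capped_frequency(3_600_000, 0.7))  # 2520000 kHz for a 3.6 GHz part
```

Because power rises faster than linearly with frequency and voltage, a modest frequency cap can cut electricity use disproportionately while only slightly lengthening compute jobs – which is what makes the trade-off attractive for throughput-oriented HPC workloads.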
EMBL Hamburg’s IT team, led by Eduard Avetisyan, has implemented measures similar to Heidelberg’s, including a workflow that automatically shuts down servers in the High-Performance Computing (HPC) cluster when they are not occupied by calculations, and automatically turns them back on once user jobs are submitted. These optimisations result in energy savings of up to 100 kWh per day. Further optimisation is planned through the replacement of old batch nodes with more efficient high-performance nodes. The shutdown of idling nodes is also being implemented on a second HPC cluster in Hamburg, dedicated to processing beamline data. Alongside the node shutdowns, the cooling of the server rooms has been adjusted to allow higher air temperatures.
The EMBL-EBI data services are by far the biggest data service at EMBL, responsible for over 8,000,000 kWh of electricity consumption and over 1,500 tonnes of CO2 per year. In Hinxton, the team led by Tim Dyce has been reducing the environmental impact of EMBL-EBI’s data services for the last three years, with impressive results. The team uses the services of three third-party data centres – the original one on the Wellcome Genome Campus, the Virtus centre in Slough, and most recently, Kao in Harlow. The Kao data centre is extremely energy efficient, with a PUE of 1.25, which compares favourably to a global average of 1.57. (PUE, or Power Usage Effectiveness, is the ratio of a facility’s total power consumption to the power used by the IT equipment itself; the difference goes to non-IT equipment such as cooling, lighting, and security. The lower the PUE, the more efficient the data centre.)
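PUE itself is just a ratio, which makes the comparison above easy to reproduce. A minimal sketch – the facility and IT power figures are made-up illustrations; only the 1.25 and 1.57 reference values come from the text:

```python
# PUE = total facility power / power drawn by the IT equipment.
# The lower the PUE, the less power goes to cooling, lighting, etc.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    return total_facility_kw / it_equipment_kw

# Illustrative figures: the same 1,000 kW IT load in each facility.
kao = pue(total_facility_kw=1250.0, it_equipment_kw=1000.0)
global_avg = pue(total_facility_kw=1570.0, it_equipment_kw=1000.0)
print(f"Kao: {kao:.2f} vs global average: {global_avg:.2f}")
```

Put the other way round: for the same IT load, a PUE of 1.25 instead of 1.57 means drawing 1,250 kW rather than 1,570 kW from the grid – roughly a 20% saving before any of the IT equipment itself is made more efficient.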
Along with moving data to more efficient centres, IT Services teams have also worked directly with users and research groups to migrate and consolidate their data and computational activity onto newer, more power-efficient platforms. Through this migration, many petabytes of data have been identified as no longer required, and the deletion of this data has resulted in immediate and ongoing reductions in electricity use.
Moving our services from the less efficient centres to Kao and consolidating data is a huge ongoing task for Tim’s team, but the energy savings are already clear, as shown in the chart below. The steady reduction in energy use resulted in a 10% drop in electricity consumption in 2022.
In Grenoble, reducing the environmental impact of IT Services was on the agenda back in 2017 when all of the HPC compute systems were transferred into a container and placed outside the centre, in the open. According to Remi Pinck, who leads the Grenoble IT service, this had two major environmental benefits. First, when the weather is suitable (as it often is in Grenoble), large fans blow fresh air from outside through the container, keeping the servers cool enough without the need for any additional cooling systems.
This ‘free-cooling’ is the lowest impact cooling system available. When outside temperatures are not suitable, like in the summer, the container uses an adiabatic cooling system, which uses the thermodynamic characteristics of evaporating water to cool down the hot summer air. This system is less energy intensive than traditional systems and importantly, doesn’t rely on refrigerants. Many refrigerants, if leaked, have a much higher global warming potential than carbon dioxide.
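The limit of adiabatic cooling is the air’s wet-bulb temperature: evaporating water can cool the incoming air towards, but never below, that point. The sketch below uses Stull’s empirical wet-bulb approximation (valid for roughly 5–99% relative humidity at sea-level pressure); the example conditions are illustrative, not Grenoble measurements.

```python
# Estimate the adiabatic cooling limit via Stull's (2011) wet-bulb approximation.
# Example conditions are illustrative; formula valid for ~5-99% RH near sea level.
import math

def wet_bulb_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Stull's empirical wet-bulb temperature (degrees Celsius)."""
    t, rh = temp_c, rel_humidity_pct
    return (t * math.atan(0.151977 * math.sqrt(rh + 8.313659))
            + math.atan(t + rh) - math.atan(rh - 1.676331)
            + 0.00391838 * rh ** 1.5 * math.atan(0.023101 * rh)
            - 4.686035)

# Hot, fairly dry summer day: 32 C air at 40% relative humidity.
limit = wet_bulb_c(32.0, 40.0)
print(f"Evaporative cooling can approach about {limit:.1f} C")
```

The drier the summer air, the larger the gap between dry-bulb and wet-bulb temperature, and so the more cooling an adiabatic system can deliver without refrigerants.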
Using the computational and storage resources of EMBL as efficiently as possible aids Tim, Jure, Remi, Eduard and all our colleagues at EMBL who are working on reducing our IT services’ energy demand. We must all be cognisant of the scale of the demand we put on our data centres and the impact this has in terms of energy, carbon, and operating costs.
If you are looking for helpful tips to make your own computing more sustainable, EMBL-EBI’s very own Dr. Alex Bateman co-authored this paper, which lists ten simple steps towards the same goal.