EMBL’s Rupert Lück is engaged in developing the European Open Science Cloud: the infrastructure that will support the future of data sharing and analysis in Europe
Facing the global scientific and societal challenges of the 21st century increasingly requires the ability to access, assemble, analyse, and share research data. Often this involves working with extremely large datasets and increasingly with data originating from different scientific disciplines. Researchers need to access an array of digital tools and services, and must have innovative technologies to exchange data, software, and other research outputs.
Research institutions can address these challenges by developing IT infrastructures that help collaborators share data or analysis tools. They often make use of cloud technology that allows users to access data and computational power remotely. EMBL has well-established systems of this kind, such as the Embassy Cloud at EMBL’s European Bioinformatics Institute (EMBL-EBI) and the EMBL 3D Cloud. EMBL is very active in advancing data science and data management strategies across the organisation: it now has a Head of Data Science, and a variety of activities are under way to implement and pilot software, services, and training that support data science across EMBL. These include, for example, the data management tools being developed by EMBL’s IT Services and the Genome Biology Unit’s Computational Support, which are continuously being refined in collaboration with EMBL scientists. EMBL is also involved in an even more ambitious, pan-European project: the European Open Science Cloud (EOSC).
An open environment
EOSC is a joint initiative between the European Commission, the EU member states and associated countries, and other stakeholders, which aims to solve the above-mentioned data challenges by building a trusted and open virtual environment for the scientific community and others. Not restricted to one institution or scientific discipline, EOSC is expected to provide access to research data across scientific domains and geographical boundaries, including services addressing the whole research data life cycle. The EOSC will be developed iteratively over the coming years, with the aim that it will ultimately encompass finding, accessing, combining, analysing, processing, and storing data in line with open science and FAIR principles (see box).
“The EOSC is about increasing the value of data,” says Rupert Lück, EMBL’s Head of IT Services and one of 11 expert members of the EOSC Executive Board. “Europe is the largest producer of publicly funded research data in the world, and the EOSC is about bringing it all together to maximise its impact.”
A lightweight, scalable framework, the EOSC is a practical application of the ideas of open science and open data. It will be governed by a minimal number of rules to uphold these ideas, while allowing the freedom and flexibility to suit a wide user base across national borders and disciplines.
A meeting place for knowledge
EOSC is expected to play a major role in implementing FAIR data and open science – both areas in which EMBL has been driving progress over the past few decades, for example by providing the open access data resources at EMBL-EBI. With its experience and long history of providing open access to FAIR data on a large scale, EMBL is well placed to share its expertise and best practices with countries in Europe and beyond, and to take a leading role in developing and shaping the EOSC and its strategic direction.
“I’ve spoken to a number of researchers, for example those involved in handling and analysing the Tara Oceans expedition data, stored at EMBL-EBI,” says Rupert. “They’re very interested in the EOSC and are keen to know when they will be able to use its services and to combine their data with those from other domains. They want to jump on this service as soon as they can. It’s very exciting to understand how useful the EOSC will be for researchers.”
Participation in the EOSC will be voluntary and will require that the data and services made available comply with FAIR principles and European laws on the protection of sensitive personal information.
Making research FAIR
There is a growing movement across disciplines to unlock the full potential of research data. One way to help achieve this is to make data as FAIR as possible. FAIR principles encourage researchers to make data:
Findable: data are assigned a unique and persistent identifier, and are described by rich, searchable metadata.
Accessible: data can be accessed with a standard protocol that is free, open, and usable by anyone with a computer and an internet connection.
Interoperable: there must be formal, accessible, shared, and broadly applicable standards for representing and structuring data.
Reusable: data and metadata must meet relevant community standards and must be released with a clear usage licence, enabling them to be applied or combined in different settings.
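As a rough illustration of how the four principles above might translate into practice, the following Python sketch checks a dataset description for obvious gaps. It is a minimal sketch, not an EOSC or EMBL tool: the record layout, the field names, the list of open protocols, and the example identifier are all assumptions invented for this example, and a real FAIR assessment is considerably richer.

```python
from dataclasses import dataclass, field

# Hypothetical record layout, invented for illustration only.
@dataclass
class DatasetRecord:
    identifier: str = ""        # persistent identifier, e.g. a DOI or accession number
    metadata: dict = field(default_factory=dict)  # searchable descriptive metadata
    access_protocol: str = ""   # protocol used to retrieve the data
    data_format: str = ""       # shared standard used to structure the data
    licence: str = ""           # usage licence attached to the data

# Assumption: a small set of protocols treated as free and open.
OPEN_PROTOCOLS = {"http", "https", "ftp"}

def fair_gaps(record: DatasetRecord) -> list:
    """Return the FAIR principles that the record visibly fails to meet."""
    gaps = []
    # Findable: needs both an identifier and descriptive metadata.
    if not record.identifier or not record.metadata:
        gaps.append("Findable")
    # Accessible: needs a standard, open retrieval protocol.
    if record.access_protocol.lower() not in OPEN_PROTOCOLS:
        gaps.append("Accessible")
    # Interoperable: needs a declared shared data standard.
    if not record.data_format:
        gaps.append("Interoperable")
    # Reusable: needs a clear usage licence.
    if not record.licence:
        gaps.append("Reusable")
    return gaps

if __name__ == "__main__":
    # Placeholder values; the identifier is not a real DOI.
    record = DatasetRecord(
        identifier="doi:10.0000/example",
        metadata={"title": "Example dataset"},
        access_protocol="https",
        data_format="",
        licence="CC-BY-4.0",
    )
    print(fair_gaps(record))  # ['Interoperable']
```

A check like this only flags missing fields. It cannot judge whether the metadata are rich enough or whether the chosen standard is the right one for a given community.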
European funding agencies already encourage well-structured data management plans to help assure FAIRness across the whole research life cycle. This will become an even stronger component of future funding schemes such as Horizon Europe, the successor to Horizon 2020.
The EOSC will also help bring recognition to those involved in the production and management of FAIR data at European institutions. FAIR data in the EOSC will be described by associated metadata that directly cites the body that helped to create or maintain it, such as EMBL’s Core Facilities, IT Services, the EMBL-EBI service teams, or the European Research Infrastructure ELIXIR, of which EMBL-EBI is one node. A facility that would normally see itself listed in a paper’s acknowledgements section will be able to demonstrate the quality of its services more formally by showing how often they are cited in publications.
From the ground up
Making all scientific data FAIR and compatible is a Herculean task. Different disciplines have varying levels of experience in handling large volumes of data, and it will take time and continued effort to establish best practices. “Implementation of FAIR principles will not happen everywhere automatically, but once they are established, they will become part of research culture,” says Rupert.
There are currently more than 40 projects taking place as part of the initial phase of EOSC implementation, including five so-called cluster projects. These represent thematic clusters of European Strategy Forum on Research Infrastructures (ESFRI) organisations in environmental science, life sciences, high-energy physics and astronomy, photon and neutron science, and social sciences and humanities. “The EOSC-Life project exists within our domain of life sciences to create an open, digital, and collaborative space for biological and medical research,” Rupert explains. “The ESFRI cluster projects bring together the expertise to identify the needs and capabilities of each domain and so shape the landscape of services the EOSC will include.”
EMBL and the EOSC
As a large producer and consumer of life science data and services, EMBL’s involvement in the EOSC includes an active role in EOSC governance, as well as coordination of and participation in major projects around developing infrastructure or large-scale scientific demonstrators for the EOSC. EMBL also plays a key role in two of the biological and medical research infrastructures involved in EOSC-Life: ELIXIR and Euro-BioImaging. These different levels of active engagement have also significantly increased EMBL’s visibility beyond the life science domain. A recent and widely visible example of synergies between EMBL and the EOSC is the COVID-19 Data Platform. The platform is a priority pilot project, aimed at realising the objectives of the EOSC and building upon established networks between EMBL-EBI and national public health data infrastructures.
The EOSC initiative also ties in with one of EMBL’s core missions: training. Helping researchers handle data according to FAIR principles and to utilise data management systems throughout their work will seed good practice across Europe and beyond. “We hope that we can position EMBL as a role model in scientific data management and let others benefit from our experience of running the big data infrastructure at EMBL, where the open data and services provided through EMBL-EBI, in particular, could be seen as a small model for EOSC,” says Rupert.
Built to last
Rupert was selected as EMBL’s representative on the EOSC Executive Board with the support of EIROforum: an association of eight European intergovernmental research organisations, including EMBL. “For many years, I’ve been deeply involved in the activities of EIROforum,” he says. “Our IT working group there is very actively involved in developments and policymaking related to European IT infrastructures, and in joint IT projects of the EIROforum labs.”
As one of his key activities on the EOSC Executive Board, Rupert has co-chaired the EOSC Sustainability Working Group for the past year and a half. This group is tasked with researching the possible legal, financial, and governance structures the EOSC could implement, and with examining how existing and future data initiatives across Europe could be federated. The working group’s findings and recommendations have actively shaped the transition to EOSC’s second phase of implementation after 2020. The group’s work on future financial and legal models has contributed significantly to the recent creation of the EOSC Association, which is growing rapidly and bringing together the various stakeholders of the EOSC under a new governance structure from 2021 onwards. In addition, the EOSC Association is expected to enter into a European Strategic Partnership with the European Commission at the end of this year, to enable substantial funding for EOSC activities during the seven-year Horizon Europe framework programme.
Balancing the needs of the many different actors and stakeholders involved, including European and national research funders, EU member states, research infrastructures, universities, and public and commercial IT service providers, while also dealing with the variety of existing national and European data initiatives, is a complex task. However, Rupert believes it’s worth it. “I’m incredibly excited to be a part of the EOSC and to be able to shape it together with other colleagues from EMBL, EIROforum, and the many, many other organisations involved. Nothing like this exists anywhere else in the world, with this level of richness and ambition to bring together the different sciences, linking datasets, federating infrastructures, aligning policies, and involving a multitude of stakeholders throughout the data life cycle and across the European research ecosystem. Although this major challenge is associated with risks, due to the underlying complexity, the EOSC is expected to open up new scientific perspectives for EMBL and, moreover, to offer a real opportunity to further raise our profile and leverage our strengths here in Europe.”