The SARS-CoV-2 Data Hubs are a set of tools coupled with infrastructure that support the submission, analysis, presentation and visualisation of SARS-CoV-2 raw read data and the analyses derived from it. What makes Data Hubs attractive is a unique set of features:

  1. Collaborative: Data Hubs enable data sharing amongst a group of collaborators in a private, pre-publication mode before the data are publicly released.
  2. Configurable: Data Hubs enable researchers to use preferred tools for data submission, search and retrieval.
  3. Sustainable: Data Hubs are built on top of European Nucleotide Archive (ENA) infrastructure in order to reuse the same storage, data and metadata models.

The SARS-CoV-2 Data Hubs consist of four main components: submission services, analysis infrastructure and workflows, presentation services and tools, and visualisation services.
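
Because the Data Hubs reuse ENA storage and metadata models, publicly released SARS-CoV-2 read records can be searched with ENA's general-purpose Portal API. The following is a minimal sketch of such a search, not a description of how the Data Hubs tooling itself works. The helper names `build_search_url` and `fetch_runs` are hypothetical, and the chosen fields and the limit are illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public ENA Portal API search endpoint.
ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"
# NCBI Taxonomy ID for SARS-CoV-2.
SARS_COV_2_TAXID = 2697049


def build_search_url(limit=5, fields=("run_accession", "collection_date",
                                      "country", "fastq_ftp")):
    """Build a search URL for raw read runs under the SARS-CoV-2 taxon.

    Hypothetical helper: the field selection and limit are illustrative.
    """
    params = {
        "result": "read_run",                         # raw read records
        "query": f"tax_tree({SARS_COV_2_TAXID})",     # taxon and descendants
        "fields": ",".join(fields),
        "format": "json",
        "limit": limit,
    }
    return f"{ENA_SEARCH}?{urlencode(params)}"


def fetch_runs(limit=5):
    """Fetch matching run records as a list of dicts (needs network access)."""
    with urlopen(build_search_url(limit), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    for run in fetch_runs(limit=3):
        print(run["run_accession"], run.get("country", ""))
```

Keeping URL construction separate from the network call makes the query easy to inspect or adapt, for example to request a different set of metadata fields.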

A new publication in Microbial Genomics describes the functionality of the SARS-CoV-2 Data Hubs and, in particular, a public SARS-CoV-2 Data Hub in detail.

The publication covers:

  1. Open data sharing through the European COVID-19 Data Platform
  2. Tools for submission, analysis, visualisation and data claiming (e.g. ORCID)
  3. The systematic analysis of these datasets at scale via the SARS-CoV-2 Data Hubs
  4. Lessons learnt

Using the COVID-19 Sequence Analysis Pipeline, teams from Eötvös Loránd University (ELTE) in Hungary, Erasmus Medical Centre (EMC) in the Netherlands, the Technical University of Denmark (DTU) and EMBL-EBI systematically analysed public raw read datasets, generating consensus sequences and variant calls. These analysis products were then shared back to the public Data Hub where they are:

What’s next?

With the support of the VEO and BY-COVID projects, the SARS-CoV-2 Data Hubs are being expanded to support pandemic preparedness, building on the Data Hubs foundations originally laid under the COMPARE project. To this end, the project team has broadened its scope to cover more pathogens, and therefore more diseases, creating the new Pathogen Data Hubs, a component of the recently launched Pathogens Portal.

Based on the same concept of four ‘mix and match’ components, the Pathogen Data Hubs are designed to support data sharing for pathogens beyond SARS-CoV-2. Visit the dedicated Pathogen Data Hubs page on the Pathogens Portal.

Acknowledgements

This work would not have been possible without:

A full list of funders and partners can be found here.
