Protecting data in the cloud

The Pan-Cancer project addresses the challenges of working with datasets across national boundaries

An artistic representation of cloud computing, with data stored in different data centres around the world. Credit: Rayne Zaayman-Gallant/EMBL

Cloud computing offers unprecedented opportunities for global-scale research collaborations such as the Pan-Cancer Analysis of Whole Genomes project. It also presents a unique set of challenges in terms of data protection and the ethics of data sharing. The rules are far from clear, especially when data are to be shared across the globe.

The Pan-Cancer project has involved more than 1300 scientists and clinicians from 37 countries, and the analysis of more than 2600 genomes, to provide new insights into the development of cancer, one of humankind’s deadliest diseases. The capacity to share genomic data internationally on such a massive scale, however, comes with the responsibility to ensure that the data are subject to appropriate security and privacy safeguards. This is an ongoing challenge, which continues to evolve alongside advances in technology and changes in the regulatory landscape, such as the introduction of the General Data Protection Regulation (GDPR) in the EU.

The United Nations of cancer genome projects

It’s a challenge that the project leaders addressed proactively from the outset. Jan Korbel, senior scientist at EMBL and co-director of the Molecular Medicine Partnership Unit, explains: “We consulted ethicists and lawyers to work out how we could operate within the boundaries of existing rules and regulations, in an international context and also in the context of country-specific legislation. I think of the Pan-Cancer project as the United Nations of cancer genome projects.”

There’s increasing public concern over a lack of transparency in how personal data are shared and used, and Korbel acknowledges the importance of public engagement and open dialogue. “It’s our duty as scientists to be open to society, and to inform society about the risks, benefits, and opportunities of this type of research. It’s also important to bring patient representatives into the discussion to inform policy decisions for the good of society.”

Code of conduct

There’s broad agreement in the research community that an international code of conduct on genomic data is the best mechanism for addressing privacy and data protection while ensuring that genomic research data can continue to be used internationally. International use of these data is considered the most promising means of uncovering the basis of human diseases. A code of this kind would enshrine best practices and establish clear guidelines for researchers. It’s currently a work in progress, and a complex one at that, as scientists and policymakers navigate the diverse national laws that need to be accommodated.


