The Pan-Cancer project addresses the challenges of working with datasets across national boundaries
Cloud computing offers unprecedented opportunities for global-scale research collaborations such as the Pan-Cancer Analysis of Whole Genomes project. It also presents a unique set of challenges in terms of data protection and the ethics of data sharing. The rules are far from clear, especially when data are to be shared across the globe.
The Pan-Cancer project has involved more than 1300 scientists and clinicians from 37 countries, and the analysis of more than 2600 genomes to provide new insights into the development of cancer, one of humankind's deadliest diseases. The capacity to share genomic data internationally on such a massive scale, however, also comes with the responsibility to ensure that the data are subject to appropriate security and privacy safeguards. This is an ongoing challenge, which continues to evolve alongside advances in technology and changes in the regulatory landscape, such as the introduction of the General Data Protection Regulation (GDPR) in the EU.
The United Nations of cancer genome projects
It’s a challenge that the project leaders addressed proactively from the outset. Jan Korbel, senior scientist at EMBL and co-director of the Molecular Medicine Partnership Unit, explains: “We consulted ethicists and lawyers to work out how we could operate within the boundaries of existing rules and regulations, in an international context and also in the context of country-specific legislation. I think of the Pan-Cancer project as the United Nations of cancer genome projects.”
There’s increasing public concern over a lack of transparency in how personal data are shared and used, and Korbel acknowledges the importance of public engagement and open dialogue. “It’s our duty as scientists to be open to society, and to inform society about the risks, benefits, and opportunities of this type of research. It’s also important to bring patient representatives into the discussion to inform policy decisions for the good of society.”
Code of conduct
There’s broad agreement in the research community that an international code of conduct on genomic data is the best mechanism for addressing privacy and data protection while ensuring that genomic research data can continue to be used internationally. Such international use is considered the most promising means of uncovering the basis of human diseases. The code would enshrine best practices and establish clear guidelines for researchers. It’s currently a work in progress, and a complex one at that, as scientists and policymakers navigate the diverse national laws that need to be accommodated.