In the opinion piece below, EMBL-EBI's Guy Cochrane explores how open scientific databases can contribute to Access and Benefit Sharing systems under agreements such as the Nagoya Protocol.

The remarkable diversity of the natural world can serve as an endless source of inspiration for science and industry alike. Accessing genetic resources from across the world helps us develop products, knowledge and services that benefit humankind, from medicines to cosmetics, agricultural practices and beyond.

By Guy Cochrane, Team Leader, European Nucleotide Archive

Open data sits at the very heart of these innovations, but because genetic resources are not evenly distributed around the world, systems for fair use of these resources are required. As these biodiversity-related Access and Benefit Sharing (ABS) systems are established under agreements such as the Nagoya Protocol, two key contributions from the open scientific databases must take their place as a part of any eventual implementation.

ABS systems stem from a need for fair and equitable access to products, technologies and knowledge derived from discoveries within the living world. Targeting situations in which an element of biodiversity (such as a specimen, a natural product or an enzyme) is discovered within one nation but developed in another, ABS systems are intended to ensure that the benefits, commercial and otherwise, are shared equitably with the originating nation. Examples of such developments include a drug developed from a screen of natural products derived from indigenous plants, or a biotechnological process based on observations of novel biochemistries discovered in soil fungi. Sharing the benefit with the originating nation might include reduced cost of access to the new drug and opportunities for academic collaboration.

The implementation of ABS requires many components, systems and actors, including: stakeholders from diverse backgrounds, such as scientists, lawyers and government officers; systems to reconstruct transparent value chains; compliance assessment procedures; and much more.

Open data and the bioinformatics infrastructure effect

Scientific databases, such as the European Nucleotide Archive (ENA) for DNA sequence data, SILVA, a database of reference taxonomic genetic markers, and PRIDE, the repository for proteomics identifications, have open science, and hence open data, at their hearts. ENA, for example, has driven open sequence data publication for almost four decades as part of an international partnership across three continents that has open data as its ethos.

The work of such databases is to collate, integrate, curate and make freely available to the public the world’s scientific data. Adding no constraints on the use and reuse of the data that they serve up, these databases provide international access through a multitude of web, programmatic and FTP interfaces and offer training and user support programmes to facilitate their use.

Because the databases are open, their data are available to other open databases. Indeed, an intricate and sophisticated ecosystem of services, tools and data resources exists, into which open data are propagated immediately and without human intervention.

Example of the propagation of open data through the bioinformatics data infrastructure.
An annotated sequence from a newly isolated species autonomously triggers a flow of protein-coding genes into UniProtKB, which in turn propagates data to build sequence family models in Pfam for use in InterPro, which provides open tools for the functional exploration of further sequences.

This means that data from an open scientific database create benefit through this rapid propagation to yield a wealth of added-value data supporting discovery, analysis and interpretation.
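For readers who think in code, the propagation in the example above can be pictured as a walk over a small directed graph. The sketch below is purely illustrative: the `DOWNSTREAM` table and the `propagate` function are hypothetical names, the links are a simplification of the example, and ENA is assumed here to be the entry point for the newly submitted sequence. It does not describe how any of these resources actually exchange data.

```python
from collections import deque

# Hypothetical, simplified map of which resource feeds which.
# The ENA entry point is an assumption made for this illustration.
DOWNSTREAM = {
    "ENA": ["UniProtKB"],
    "UniProtKB": ["Pfam"],
    "Pfam": ["InterPro"],
    "InterPro": [],
}


def propagate(source, downstream=DOWNSTREAM):
    """Return every resource reached from `source`, in breadth-first order."""
    reached = [source]
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for target in downstream.get(current, []):
            if target not in reached:  # visit each resource only once
                reached.append(target)
                queue.append(target)
    return reached
```

Calling `propagate("ENA")` lists every resource that a new open record would eventually reach in this toy network. The point is that no step requires a person to intervene.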

Contribution number one:

Through open data, scientific databases enable free access to data, creating and making available benefit through integration into the network of bioinformatics databases and tools.

Scientific databases are science-led

The open scientific databases exist to support the work of the scientific community. Driven by scientific needs and led by scientists, their expertise lies in the world of scientific data. When asked, as has happened in the context of ABS, to track material transfer agreements, sampling licenses and other legal documents alongside their data, open scientific databases are quick to resist. This is for good reason; not only does the expertise of their staff lie in scientific - rather than legal - data, but their infrastructure is at its heart designed for the prosecution of science, not law.

The focal point of a scientific database is a scientific dataset, with formally defined technical structures and surrounding software tools, data life-cycles based on data generation, analysis processes and publication in the scientific literature, as well as trust-based agreements around data quality and integrity.

On the other hand, the focal point of a legal tracking system is the document, expressed in language requiring expert legal interpretation, with a life-cycle relating to dates of signature, coming into force and termination, and governed by formal agreements within defined jurisdictions. Tracking legal documents relating to biodiversity datasets is required in an ABS system, but it doesn’t fit well with scientific databases.

However, there is one common interest: the need for both to track provenance. This refers to recording and declaring the source of a biological sample, a dataset derived from a sample, a publication, a patent, a commercial product and so on. Scientific databases concern themselves deeply with provenance: reproducing experiments, verifying findings, reusing data, aggregating data and meta-analysing - key elements of good scientific practice - all require transparent data provenance. The legal world must track provenance to map out value chains, to define the original source from which knowledge was derived, to link the development and application of products and technologies from this knowledge and, ultimately, to enable the sharing of benefits between stakeholders.
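A provenance trail of this kind can be sketched as a chain of records, each pointing back to the item it was derived from. The example below is a minimal illustration under stated assumptions: the `Record` class, the `derived_from` field, the identifiers in `RECORDS` and the `trace_origin` function are all hypothetical, and real databases model provenance in far richer ways.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    identifier: str
    kind: str  # e.g. "sample", "dataset", "publication", "product"
    derived_from: Optional[str] = None  # identifier of the parent record, if any


# Hypothetical records forming one chain from a product back to a sample.
RECORDS: Dict[str, Record] = {
    "sample-1": Record("sample-1", "sample"),
    "dataset-1": Record("dataset-1", "dataset", derived_from="sample-1"),
    "paper-1": Record("paper-1", "publication", derived_from="dataset-1"),
    "product-1": Record("product-1", "product", derived_from="paper-1"),
}


def trace_origin(identifier: str, records: Dict[str, Record]) -> List[Record]:
    """Follow derived_from links from `identifier` back to the original source."""
    trail: List[Record] = []
    visited = set()
    current: Optional[str] = identifier
    while current is not None:
        if current in visited:
            raise ValueError(f"circular provenance at {current!r}")
        visited.add(current)
        record = records[current]  # raises KeyError if a link is missing
        trail.append(record)
        current = record.derived_from
    return trail
```

Calling `trace_origin("product-1", RECORDS)` walks from the product back to the sample it ultimately came from. The same trail that lets a scientist verify a finding could, in principle, let a legal system map a value chain.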

Contribution number two:

Open scientific databases provide transparency in data provenance that can serve as a key input to the discovery and description of ABS value chains. 

A role for open scientific databases in ABS

So what form might a successful ABS implementation take? The contributions from open scientific databases, namely freely accessible data and transparent data provenance trails, are essential, but they are only two of the many components that must be in place.

Beyond these must lie tracking systems for legal agreements, systems to educate users, compliance checking and, perhaps, sanctioning systems. These components will come not from the scientific database community, but from a host of other professionals implicated in ABS.

As with any complex system, cooperation between components, and the people that look after them, is key. How best to connect the components? The first steps are all about mutual understanding - what the components offer and how their interfaces work. On the open scientific database side, what content exactly is on offer and how can data provenance be accessed? Now is the time to talk.

Working together to connect these components into viable holistic systems will help ensure that ABS can genuinely and sustainably be delivered. Although it may take some effort to achieve this, it is an essential step for understanding and responsibly using genetic resources in the future.
