An ambitious new project by the Wellcome Trust, Chan Zuckerberg Initiative, and DataCite aims to build the Open Global Data Citation Corpus – a central aggregate of research data citations from diverse sources that will help the scholarly community monitor impact, inform future funding, and improve the dissemination of research. The community will be able to access the aggregated data citations via a user dashboard and an openly available API.

Europe PMC – EMBL-EBI’s open access life science database – will contribute life science accession numbers from EMBL-EBI and ELIXIR core resources to populate the corpus. These will be supplemented by other citations, such as DOIs from articles, preprints, government documents, and other outputs, drawn from third-party sources that aggregate or discover citations and from data sources that collect citations as part of the deposit workflow.

Contributing to the Open Global Data Citation Corpus

Europe PMC runs a text and data mining pipeline that identifies accession numbers for over 45 databases in the abstracts and full text of life science preprints and journal articles. Each accession number is then linked to the corresponding data record in the relevant database, such as the European Nucleotide Archive (ENA), UniProt, the PRoteomics IDEntifications Database (PRIDE), Pfam, Ensembl, and many more. For the Open Global Data Citation Corpus, the accession numbers from the Europe PMC full text corpus will be aggregated into a seed datafile and linked to the relevant databases. Linking citations to data records in this way makes research data easier to discover.
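
To give a rough sense of what identifying accession numbers in text can involve, here is a minimal Python sketch based on regular expressions. It is not Europe PMC's actual pipeline: the patterns are simplified versions of the publicly documented identifier formats for four of the databases named above, and the names ACCESSION_PATTERNS and find_accessions, as well as the sample sentence, are hypothetical and chosen only for illustration.

```python
import re

# Simplified, illustrative patterns (an assumption, not Europe PMC's rules).
# A production pipeline would also use context to rule out false positives.
ACCESSION_PATTERNS = {
    "pride": re.compile(r"\bPXD\d{6}\b"),
    "pfam": re.compile(r"\bPF\d{5}\b"),
    "ensembl_gene": re.compile(r"\bENSG\d{11}\b"),
    "uniprot": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
}


def find_accessions(text):
    """Return sorted, de-duplicated (database, accession) pairs found in text."""
    hits = set()
    for database, pattern in ACCESSION_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.add((database, match.group(0)))
    return sorted(hits)


# Hypothetical sentence of the kind found in a data availability statement.
sample = (
    "Spectra are available in PRIDE (PXD000001); the protein P12345 "
    "contains domain PF00069 and is encoded by ENSG00000139618."
)

for database, accession in find_accessions(sample):
    print(database, accession)
```

Each (database, accession) pair could then be resolved to a record in the relevant database, which is the linking step described above.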

The number of accession numbers obtained from different EMBL-EBI databases (green) and ELIXIR databases (orange). Credit: Aravind Venkatesan, Senior Data Scientist at EMBL-EBI

Europe PMC’s contribution to the corpus will enable further understanding of data usage patterns and increase the reusability and transparency of data, whilst highlighting the need to properly cite the use and reuse of data in publications and preprints.

Europe PMC joins community discussions on building the Open Global Data Citation Corpus

Europe PMC Team Leader Melissa Harrison recently participated in a discussion dedicated to the Open Global Data Citation Corpus. Harrison’s key takeaways were: 

Watch the full webinar discussion, including talks from Cristine Ferguson (Wellcome Trust), Matthew Buys (DataCite), Ana-Maria Istrate (Chan Zuckerberg Initiative), and a variety of community representatives.
