RNAcentral, the first unified resource for all types of non-coding RNA data, has been launched by the RNAcentral Consortium. It aggregates information from a federation of expert databases, and provides tools for easy browsing. The initial release of RNAcentral contains approximately 8 million sequences.
Since the 1950s, scientists have thought of RNA as an intermediate molecule that provides a link between stable DNA and proteins. However, over recent decades it has become clear that RNA plays a much wider range of roles in living organisms. Researchers have discovered a lot about different types of RNA, but until now these data have not been put in one place.
Before RNAcentral, finding the RNAs encoded by a specific genome required fetching information from several independent resources, for example miRBase for microRNAs and HAVANA for lncRNAs.
“There is plenty of published data on non-coding RNAs, but each subtype is maintained separately,” explains Alex Bateman, head of Protein Sequence Resources at EMBL-EBI. “This is the first time we have a central place where you can find it all: piRNAs, ribosomal RNAs, everything. A lot of that information has typically been locked up in supplementary materials, or referred to only by a non-standard gene name. RNAcentral is a big step towards making RNA sequence as easy to access for research as protein sequence.”
RNAcentral 1.0 offers access to data from ten different expert databases and provides stable accession numbers that can be used consistently in the literature, other molecular databases and search engines. The RNAcentral website features a faceted search, which lets users explore different RNA sequences according to source, species and molecular function. Further expert databases are expected to be included in future releases.
The RNAcentral consortium has its roots in a workshop held on the Wellcome Genome Campus in 2010, where members of the RNA community came together to discuss the lack of centralised access to RNA data.
“It is really satisfying to see this project come to fruition,” explains Sam Griffiths-Jones of The University of Manchester. “The growth in non-coding RNA sequence and functional information is phenomenal and shows no signs of slowing, and there has never been a greater demand for a universal resource for these data. The collaboration of RNAcentral consortium members to produce this resource represents an enormous step forward for the RNA field.”
Thanks to funding from the UK’s Biotechnology and Biological Sciences Research Council (BBSRC), partner institutes throughout the world were able to come together and build a practical solution to a shared problem.
BBSRC Chief Executive Professor Jackie Hunter said: “Fundamental research into non-coding RNAs has many potential applications, including disease diagnostics, new therapies and biotechnology. With the abundance of data now available due to next-generation DNA sequencing, there is an urgent need for informatics tools to decipher it. RNAcentral is a vital resource that will aggregate and integrate information to unify the data landscape and improve the discoverability and use of data by researchers worldwide.”
The resource uses EMBL-EBI infrastructure, notably data-submission and cross-reference services provided by the European Nucleotide Archive (ENA). It takes advantage of the nightly, global synchronisation of data from the International Nucleotide Sequence Database Collaboration (INSDC).
Future versions of RNAcentral will include additional data types and information about RNA structure, modifications, molecular interactions and function. A paper describing RNAcentral tools and features in detail has been accepted for publication in the journal Nucleic Acids Research.
RNAcentral expert databases
The RNAcentral consortium currently includes 24 RNA database resources. Ten of these are present in the first release: European Nucleotide Archive, Rfam, RefSeq, VEGA, gtRNAdb, RDP, miRBase, tmRNA Website, sRNAmap, SRPDB and lncRNAdb, with many others planned for coming releases. See the up-to-date list.
European Bioinformatics Institute (EMBL-EBI), UK; University of Manchester, UK; Wellcome Trust Sanger Institute, UK; University of California Santa Cruz, US; University of Texas, US; Auburn University, US; Sandia National Laboratory, US; University of Oxford, UK; Garvan Institute of Medical Research, Australia; International Institute of Molecular and Cell Biology Warsaw and Adam Mickiewicz University, Poland; Rockefeller University, US; Chinese Academy of Sciences, China; Peking Union Medical College and Taicang Institute of Life Sciences Information, China; Michigan State University, US; National Chiao Tung University, China; Stanford University, US; University of Thessaly, Greece; Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Chiao Tung University, HsinChu, Taiwan; National Center for Biotechnology Information (NCBI), US.