ArrayExpress database doubles in size to 100,000 hybridisations
ArrayExpress, the publicly available database of transcriptomics data at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), has doubled in size in 2007, reaching the 100,000-hybridisation milestone. The database now holds snapshots of gene expression (identifying which genes are specifically expressed in a particular tissue or in response to a drug, for example) for more than 180 species under thousands of experimental conditions.
The latest acceleration in growth reflects not only the increased number of direct submissions, but also the mass import of data from the Gene Expression Omnibus (GEO), which is produced by the US National Center for Biotechnology Information (NCBI). The import of data from GEO is the first step towards the regular exchange of data among public repositories for transcriptomics data. Similar data exchange agreements among biological data providers are widely recognised as the most effective way of maintaining and quality-controlling the public record of biological information.
GEO data entering ArrayExpress are curated and scored for compliance with MIAME (Minimum Information About a Microarray Experiment), the microarray community’s minimum information standard. Users can therefore search for experiments that have been submitted to either database, annotated with common terms, and can download them in MAGE-TAB, a user-friendly tab-delimited format that simplifies meta-analyses of experiments from different labs. MAGE-TAB was developed under the direction of the MGED Society, which works to simplify data sharing for microarray researchers.
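As a rough illustration of why a tab-delimited layout makes this kind of cross-experiment work easier, the sketch below reads a small sample-annotation table with Python's standard csv module and groups data files by an annotation column. The rows are invented for illustration and are not real ArrayExpress records; the column headings are only modelled on the style of sample-annotation tables, and the helper name group_files_by_column is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented rows in a tab-delimited layout (assumption: NOT real ArrayExpress data;
# the column headings are illustrative, not taken from a specific experiment).
SAMPLE = (
    "Source Name\tCharacteristics[organism]\tArray Data File\n"
    "sample_1\tHomo sapiens\tfile_1.txt\n"
    "sample_2\tMus musculus\tfile_2.txt\n"
    "sample_3\tHomo sapiens\tfile_3.txt\n"
)


def group_files_by_column(text, key_column, value_column="Array Data File"):
    """Group the values of one column by the values of another (hypothetical helper)."""
    # DictReader uses the first line as the header and splits each row on tabs.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row[key_column]].append(row[value_column])
    return dict(groups)


files_by_organism = group_files_by_column(SAMPLE, "Characteristics[organism]")
```

Because every row carries its annotations in named columns, tables from different labs could in principle be concatenated and grouped in the same way, provided they use the same headings and annotation terms.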
The imported GEO data, like all the data in ArrayExpress, have also been integrated with other EBI resources. For example, users can now hop from an ArrayExpress entry straight to the relevant genes in Ensembl or proteins in UniProt, simplifying the process of data analysis and download for biomedical researchers.
The growth rate of ArrayExpress seems set to increase further, as new high-throughput sequencing-based transcriptomics applications are already generating huge amounts of data. Dealing with this barrage of new data will be the next challenge for ArrayExpress and its collaborators.