The UniProt Consortium, which includes the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI), has added a new repository for metagenomic and environmental data to its family of protein sequence databases. Metagenomics is the large-scale genomic analysis of microbes recovered from environmental samples, as opposed to laboratory-grown organisms, which represent only a small proportion of the microbial world.
The UniProt Metagenomic and Environmental Sequences (UniMES) database currently contains the data from the Global Ocean Sampling Expedition (GOS), which were originally submitted to the International Nucleotide Sequence Database Collaboration (INSDC). The initial GOS dataset is composed of 28 million DNA sequences from oceanic microbes and predicts nearly 6 million proteins. By combining the predicted protein sequences with automatic classification by InterPro, EMBL-EBI’s integrated resource for protein families, domains and functional sites, UniMES uniquely provides free access to the array of genomic information gathered from the sampling expeditions, enhanced by links to further analytical resources.
Genomics holds the key to understanding the world around us, and the metagenomic and environmental data represent a step forward in charting genomic diversity. Rolf Apweiler of EMBL-EBI, one of the leaders of the UniProt Consortium, said, “Throughout the ages, biological events have been the basis and source of medical therapies and industrially important processes. Analysing the genomes of diverse and novel species continues this adaptation of biological innovation for beneficial application.”