One of the world’s largest microbiome data resources, MGnify, has announced another major increase in size through the addition of 500 million proteins. This brings the total to 3 billion non-redundant protein sequences, grouped into 729 million clusters.

“MGnify’s high-quality, carefully-annotated data are perfect for training AI algorithms for the life sciences, not least because they’re open, carefully organised and freely available to all,” said Rob Finn, Microbiome Informatics Team Leader at EMBL-EBI. “We’ve already worked with industrial and academic groups to support the development of new AI tools for the life sciences, and are always open to new collaborations.” 

For example, in 2021 DeepMind’s AlphaFold AI used MGnify data to help predict the structures of all catalogued proteins available in UniProt. These 200 million predictions are now openly available in the AlphaFold Database, jointly developed by DeepMind and EMBL-EBI.

Meta AI also used MGnify data to develop its ESMAtlas, which contains over 600 million AI-predicted structural models for metagenomic proteins. The ESMAtlas now also contains the additional proteins included in the latest MGnify release. These structure predictions provide new clues to the potential functions of these proteins, demonstrating that many protein clusters represent previously unknown subfamilies of existing functional families.

“MGnify is an incredible resource for the scientific community, cataloguing billions of unknown protein sequences. This allowed us to predict structures for hundreds of millions of proteins with AI, and can help us to see deep into the immense diversity of natural proteins at a scale that has not been possible before.”

– Alex Rives, Research Scientist at Meta

“The new MGnify release contains vastly more data than before,” said Martin Beracochea, MGnify production Project Leader. “It’s a data gold mine for researchers because it contains sequences that were previously unknown to science. Thanks to new AI tools, mining the data to gain new insights is finally becoming a reality.”
