findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM
IUCrJ 9 December 2021
A neural-network-based program developed at EMBL Hamburg identified an unknown protein in snake venom
The lancehead snake Bothrops atrox is responsible for the majority of snakebites in the Amazon region of South America. Snakebites, which can cause serious harm or even death, are particularly frequent in poor rural areas and count among the neglected tropical diseases. Studying the composition of venoms can help scientists develop new antivenoms to treat snakebites. For this, it is critical to understand the structure of the proteins that are the active components of snake venom.
Scientists at EMBL Hamburg have now created a tool that will make structural biology studies on snake venoms and other naturally extracted samples significantly easier. Such samples often contain unknown proteins that are hard to identify without additional experiments. Grzegorz Chojnowski from the Wilmanns Group developed a neural-network-based program called findMySequence, which identifies the amino-acid sequences of proteins based only on electron cryo-microscopy and X-ray crystallography data. An unknown protein found in B. atrox venom was the perfect sample for testing the software.
“The program is really useful when you study samples from natural sources, such as snake venom, because you can never be sure what you will find there,” said Chojnowski. “It’s also great for cryo-EM experiments in which you study a molecule in its cellular environment, surrounded by a molecular crowd.”
findMySequence can complement the ground-breaking AI-based AlphaFold2 algorithm, which predicts a protein’s structure from its amino-acid sequence. The program has already contributed to several projects, including determining the structure of a molecular complex involved in tuberculosis and studying protein communities in cell extracts.
This work is an example of how the complexity of biology in nature can be studied at the molecular level – an important concept explored in the new EMBL Programme ‘Molecules to Ecosystems’, launched in 2022.
The study was performed in collaboration with scientists from EMBL Heidelberg, the University of São Paulo in Brazil, the Universidad Nacional Mayor de San Marcos in Peru, and the University of Liverpool and the Rutherford Appleton Laboratory in the UK. Crystals of the venom protein were analysed at the Brazilian National Laboratory of Synchrotron Light, part of the National Center for Energy and Materials.