Solving the protein structure puzzle

Proteins are beautiful molecular structures and understanding what they look like has been a goal for scientists for more than half a century. After years of arduous work and frustratingly slow progress, a game-changing artificial intelligence method is poised to disrupt the field.

Haemoglobin protein structure shown over a matrix symbolising artificial intelligence
Credit: Spencer Phillips/EMBL-EBI

By Janet Thornton, Director Emeritus of EMBL-EBI

We call proteins the building blocks of life because they make up all living things, from the smallest virus or bacterium to plants, animals, and humans. But, in reality, proteins don't look anything like blocks. They are beautifully complex structures and every single one of them is unique. A protein's shape, also called its structure, is linked to its function: the shape determines what the protein does. For example, haemoglobin transports oxygen around the body, while insulin maintains the delicate balance of sugar in the blood.

Simple question, complex answer

Studying protein structure means facing a very simple question that requires a very complex answer. A protein is a chain of small organic molecules called amino acids, linked together a bit like beads on a string. This chain spontaneously folds up to create a unique and beautiful structure. The simple question is: what does that structure look like?

This problem has been around now for at least 50 years and, after many failed attempts, I came to believe that the only way to make progress was to gather more data to make better predictions. I was proven wrong.

Figuring out the structure of just one protein can take years of experimental work, using expensive equipment and incredibly complex methodology. One method is X-ray crystallography, which fires a beam of X-rays at a crystal of the molecule. The beam is diffracted in many directions and, by measuring the angles and intensities of the diffracted rays, crystallographers can produce a 3D picture of the density of electrons within the crystal. This reveals the structure of complex biological molecules, including proteins. One of the main difficulties of the method is growing suitable crystals, and sadly, for some proteins, this has simply not been possible.

Experimental meets computational

Luckily, there is an incredibly active and tenacious community of scientists who have dedicated their lives to predicting protein structures, that is, how the chain folds, from amino acid sequences. Their work builds on experimental data: all newly determined structures are stored in the Protein Data Bank (and its European node, PDBe) and are freely available for anyone in the world to look at.

In the mid-1990s, the need to coordinate efforts and assess progress became clearer than ever, so the community embarked on a worldwide experiment called Critical Assessment of protein Structure Prediction (CASP). Every two years, the organisers challenge participants to predict the structures of a set of proteins whose structures have recently been determined experimentally but not yet made public. The objective is to test and independently assess new computational methods for structure prediction: methods that use computers, not lab experiments, to predict protein structure. These methods, increasingly powered by artificial intelligence (AI), had been improving over the past few years, but a solution still seemed a long way off.

That is, until this week, when, during the latest CASP conference, the assessors announced that DeepMind's AlphaFold, an AI system, had achieved unparalleled levels of accuracy. This approach built on our extensive knowledge of protein structures obtained in the lab over the past 60 years. But this was the first time a computational model was deemed competitive with experimental methods. A structure that might have taken years of experimental work to determine can now be predicted within days using a new type of neural network.

Why does it matter?

There are millions of proteins that make up the living world, but we know the structures of only a tiny number of them. In fact, we have experimental structures (or even partial structures) for only 10% of the roughly 20 000 proteins that make up the human body. A powerful AI model could reveal the structures of the other 90%. This matters not only because it improves our understanding of human biology, health, and disease, but also because, in the longer term, it could open new avenues of research, for example in designing new drugs.

Most existing drugs are designed using 3D structures, but they currently target only about a quarter of human proteins. AlphaFold could help unlock more proteins as potential drug targets and open up new approaches to therapy. Furthermore, being able to predict the structures of viral proteins easily can help us understand how viruses work and the diseases they cause. There may also be significant opportunities to understand and treat neglected tropical diseases, where research is currently under-resourced.

The potential goes beyond human health. Understanding plant and animal proteins (as well as their genomes) could help us improve crop yields or breeding procedures. This would hold significant potential for feeding a growing population.

Finally, at a more scientific level, being able to predict structure from sequence is the first real step towards protein design: building proteins that fulfil a specific function. From protein therapeutics to biofuels or enzymes that eat plastic, the possibilities are endless. 

A fine time for protein science

Understanding proteins is a bit like putting together a large 3D jigsaw puzzle in a dark room. You know what some of the pieces look like and you can sometimes match a few together in clusters, but it's incredibly arduous and a complete solution is rarely found. A fast and accessible computational method for determining the whole structure solves the puzzle automatically.

As a lover of everything protein, the most exciting thing for me is that this breakthrough is not an end but a whole new beginning, bringing with it electrifying opportunities and follow-on questions. The structures will help us better understand how proteins function and, in turn, could enable us to fine-tune that function for the benefit of people and the planet. Just as the Human Genome Project facilitated the birth of new scientific disciplines, such as genomics, solving the protein structure question could bring about new and exciting fields of research. One thing is for sure: it's a fine time to be a protein scientist!

This post was originally published on EMBL-EBI News.

