Most of what happens in cells is the work of machines that contain dozens of molecules, chiefly proteins. With the completion of human and other genomes, researchers now have a nearly complete ‘parts list’ of such machines; what’s lacking is the manual showing where all the pieces go. A new study by scientists at the European Molecular Biology Laboratory (EMBL) promises to answer this question for some of the smallest and trickiest components in the cellular toolbox. Their work appears in the current issue of the Public Library of Science’s online journal, PLoS Biology.
A protein consists of a sticky string of amino acids that usually folds up because of attractions between some of its atoms. The result is a compact bundle called a globular domain, whose shape and chemistry determine which other molecules can bind to it.
“If we could look at the chemical ‘spelling’ of a protein and guess what machines it fits into, we’d know a lot more about what happens in cells,” says Rob Russell, head of the Heidelberg lab that carried out the current study. “We’ve made a lot of progress in predicting how globular domains interact with each other. But sometimes a surface on one globular domain will grab a tiny, string-like region of another protein called a linear motif. Finding such motifs and predicting where they fit in is like looking for needles in haystacks.”
Or like looking at a line of automobiles and trying to decide which one a bulky motor fits into, as opposed to trying to find where a tiny screw goes. Linear motifs are so small that it is hard to tell which of their features allow them to bind to other molecules. Now Victor Neduva, a PhD student in Russell’s group, has developed a method that scans protein sequences and teases out new linear motifs.
“If two or more different proteins share a binding partner, there is often a common motif,” Neduva says. “The hard part is finding a 3-to-8 ‘letter’ pattern in a protein sequence that may be thousands of amino acids long.”
The method Neduva and his colleagues invented draws on large-scale studies of protein binding in the cells of yeast, flies, worms and humans. Those studies have produced parts lists of molecular machines. And the data holds a wealth of information about linear motifs – if it can be mined.
The scientists distilled all of this information in a series of steps. First they discarded the parts of the proteins likely to dock via large surfaces; then they zoomed in on small regions of the remaining sequences that might hold motifs. Finally, a computer scanned those regions for short patterns shared by proteins with a common binding partner. The approach worked: in the fly data, for example, 26 sets of proteins appeared to interact through linear motifs.
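For readers curious about the computational core of that last step, here is a toy Python sketch of the general idea: count which short ‘words’ (3 to 8 letters) turn up in several proteins that share a binding partner. It is a simplified illustration under stated assumptions, not the EMBL group’s actual method. It assumes the regions judged to be globular have already been masked with the letter `x` (a hypothetical convention), it counts only exact matches rather than the flexible patterns real motifs show, and it applies no statistical test for chance matches, which is exactly where the red herrings mentioned below creep in. The function name `shared_patterns` and its parameters are illustrative.

```python
from collections import defaultdict


def shared_patterns(sequences, min_len=3, max_len=8, min_support=3):
    """Return exact patterns of min_len..max_len letters found in at least
    min_support of the given sequences, most widely shared (then longest) first.

    Assumption (hypothetical convention): positions inside regions judged to be
    globular domains have been replaced by a lowercase 'x' and are never used.
    """
    support = defaultdict(set)  # pattern -> indices of sequences containing it
    for idx, seq in enumerate(sequences):
        for k in range(min_len, max_len + 1):
            for start in range(len(seq) - k + 1):
                word = seq[start:start + k]
                if "x" in word:
                    continue  # skip windows that overlap a masked region
                support[word].add(idx)

    hits = [(word, len(ids)) for word, ids in support.items() if len(ids) >= min_support]
    # Rank by how many proteins share the pattern, then by length, then alphabetically;
    # under a random model, longer patterns shared by more proteins are rarer by chance.
    hits.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return hits


if __name__ == "__main__":
    # Made-up toy sequences for proteins assumed to share one binding partner.
    toy = [
        "MSTPPLPRKxxxxAG",
        "GGAPPLPRDQ",
        "xxxxKPPLPRE",
        "MNQWERTY",
    ]
    for word, count in shared_patterns(toy):
        print(word, count)
```

On these made-up sequences the top hit is `PPLPR`, the longest pattern shared by three of the four proteins, followed by its shorter fragments. With real interaction data, many short words would be shared purely by chance, so a practical pipeline has to score candidates against a background model rather than simply counting them.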
“One challenge was to eliminate red herrings, which crop up everywhere when you look for very small patterns,” Russell says. “The fact that nine of these motifs were already known was a sign we were on the right trail; we then did follow-up experiments in collaboration with Luis Serrano’s group at EMBL to test some of the others.”
One prediction, for example, suggested that a linear motif would bind to the fly protein translin. The scientists confirmed that the binding took place, then made subtle changes to the motif’s sequence. When these changes stopped the molecules from binding, they knew they had found a new linear motif.
The lab now plans to extend the method; Russell predicts that hundreds of linear motifs remain to be discovered. This has important implications for the study of genetic diseases. “A lot of work has gone into discovering mutations that affect protein binding,” he says. “Because linear motifs are so small, every bit of information is crucial, and any change is likely to be disruptive. But so far, because of their size, these motifs have been below the radar of most methods to tie protein structures to disease.”