‘A Million Peptide Motifs for the Molecular Biologist,’ a paper from Toby Gibson and like-minded colleagues in Molecular Cell boldly claims.
Those not steeped in structural and computational biology will likely miss the joke; the new review completes a trilogy of papers published over the last two decades, each one upping the numerical ante on its predecessor (‘One thousand families for the molecular biologist’ by Chothia in 1992 and ‘Ten thousand interactions for the molecular biologist’ by EMBL alumni Aloy and Russell in 2004). The upward trend in the literature reveals that as technology allows scientists to investigate smaller and smaller cellular structures with increasing accuracy, they are discovering a level of molecular complexity that is far beyond what earlier generations had predicted. Molecular biologists often focus years of their research on individual molecules or pathways. “You’ve got to be reductionist. You’ve got to tease apart the systems and then rebuild them to understand them”, says Gibson, a team leader at EMBL Heidelberg. “What happens is that you then underestimate the complexity in the system.” It’s that complexity that he hopes to bring to the fore and reestablish as a primary concern for the field going forward.
In their review, Gibson and colleagues focus on specific elements called peptide motifs, which are short sites of molecular interaction within larger ‘intrinsically disordered’ regions of proteins. Being disordered doesn’t mean those regions misbehave; they simply don’t fold into a defined shape on their own. They need to be bound by another molecule to become more rigidly shaped, often by a so-called “induced fit mechanism.” (The natively ordered regions, called ‘globular domains,’ are the best-known protein modules because their always-folded shape makes them amenable to techniques such as X-ray crystallography, and their structure often provides insight into the protein’s function.) Peptide motifs far outnumber ordered domains in the human proteome, and the number of potential interactions those motifs engage in is expected to be correspondingly high. Not only can peptide motifs bind to the ordered domains of other proteins, but many of them can also be modified by a variety of enzymes at post-translational modification (PTM) sites contained within the motif. Phosphorylation, the most common type of protein modification, is predicted to occur at a maximum of about 400,000 PTM sites. But phosphorylation is just one type of site-specific protein modification; there are over 300 different kinds of PTMs, a number which contributes toward Gibson’s estimate of one million peptide motifs.
While acknowledging the possibility of large numbers of PTMs, some scientists argue that not all of them are functional. PTMs seem to vary quite a bit between species and accumulate changes quickly (on evolutionary timescales), traits that might imply that they aren’t vital for survival. By contrast, molecules called piRNAs that protect DNA in reproductive cells from certain mutations take the same form in just about every animal. In addition (though there are exceptions, e.g. in the cell cycle), a single PTM might have so small an effect that mutating it wouldn’t change the observable cell function. But, says Gibson, the sheer number of possible protein interactions means that a PTM might be active only under a set of very specific conditions. “You could knock out a single phosphorylation site in a mouse and never be able to see any effects. But that [site] might only work in combination with other PTMs when an animal experiences starvation plus dehydration or some other combination of stressful conditions, and you’re never going to see that, because to do so would require an unethical experiment,” he says.
Gibson expects mixed responses to this openly opinionated article. The estimate that there are so many peptide motifs potentially worth exploring should, at the very least, be encouraging to younger molecular biologists, as it means that they have a certain level of job security over the next few decades. However, he’s keenly aware that accounting for the complexity presented by transient and diverse PTMs is in direct conflict with the assertive language typical of scientific claims that garner press coverage and accolades. “A paper might say, ‘We found out how [the protein] P53 works’, but if you change conditions or change cell type, P53 works very differently,” says Gibson. “All these dogmatic titles that you see in prominent articles about how the cell works are never true, except in a very limited sense,” he continues. “There’s a kind of unholy alliance of editors and big signalling labs who want to keep on making these easy-to-sell statements.” It’s not just cell signalling biologists, either – he believes that many systems biologists and others whose work depends on models of cellular components are also not fond of complexity, because modelling requires that the object or process being studied be stripped to its simplest form just so that the computations are feasible. Gibson thinks this is a fundamental problem because it ignores the outcomes that could arise under different circumstances, which in the biomedical context could, for example, be the difference between a drug that helps patients and one that inadvertently harms them.
Complexity in context
How would Gibson and his colleagues recommend a molecular biologist proceed when facing the daunting prospect of a million peptide motifs? The answer is best summarised by the conclusion of their paper itself: “A careful choice of experimental design to test motifs is crucial, as their sensitive functionality […] may lead to incorrect inferences. Therefore, an important challenge for the community is to not only identify binding motifs and PTM sites, but also functionally characterize these peptide motifs by investigating them in the right biological context.”