This is the single most frequent question we get, and the hardest one to answer. The boring (but correct) answer is that it depends on the sample and on the question you want to address. If you want to identify a single protein from a gel, an amount that produces a Coomassie-stainable band will almost always give you a protein identification (as long as your protein is in a database that we can search); this should be in the range of 10–20 ng. In general, always send us as much as you can spare. If you are thinking of sending half of a sample that gave you a faint Coomassie band, consider what you would do with the remaining half: it might be better to send us everything from the start. For identification of proteins in complex mixtures (e.g. separated over a gel lane), you should send 10 µg of protein or more.
To determine the molecular weight of intact proteins, you should send at least 10 µg at a concentration of no less than 1 mg/ml. This is usually sufficient to determine the molecular weight under denaturing conditions. Another critical factor is the buffer composition (see below).
In both cases, there is no need to add large volumes of water or buffer to the gel pieces; keeping them moist is sufficient. There is also no need to ship them on ice. For MW determination of intact proteins, the exact buffer composition is not highly critical, as long as the buffer contains no detergents and no more than 5% glycerol.
Protein identification by mass spectrometry is probabilistic: the search looks for the best match between an experimental spectrum and the theoretical spectra of peptides in the database. The score assigned to this match, and therefore the probability that the match is correct, depends on a number of parameters, such as spectral quality, mass accuracy, the size of the database, and the algorithm used for database searching. In addition, the more peptides are assigned to a given protein, the higher the protein score will be. For the identification of single proteins, we report an E-value, which estimates how many matches with an equal or better score would be expected by chance alone. For large datasets, we report a false discovery rate, the estimated proportion of wrong identifications in the entire dataset (usually <1%).
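A false discovery rate for a large dataset is commonly estimated with a target-decoy search: spectra are searched against the real (target) sequences plus reversed or shuffled (decoy) sequences, and decoy hits above a score cutoff estimate how many target hits are wrong. The sketch below assumes this approach and the simple decoys/targets estimator; the scores are made up, and the facility's search software may estimate FDR differently.

```python
from typing import List, Optional, Tuple

# Each PSM (peptide-spectrum match) is (score, is_decoy); a higher score is a better match.
PSM = Tuple[float, bool]


def fdr_at_threshold(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among matches scoring >= threshold as decoys / targets."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # Only decoys (or nothing) above the threshold: nothing trustworthy to accept.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR is <= max_fdr (None if none)."""
    cutoff = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            cutoff = threshold  # keep lowering the cutoff while the FDR stays acceptable
    return cutoff


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    psms = [(50, False), (45, False), (40, False), (35, True), (30, False), (25, True)]
    print(score_cutoff(psms, max_fdr=0.01))  # 40
    print(score_cutoff(psms, max_fdr=0.30))  # 30
```

Accepting every match above the lowest qualifying cutoff is equivalent to filtering on q-values, which is why the loop does not stop at the first threshold that fails.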
We identify proteins by matching spectra to protein sequences in a database. Thus, in principle, a protein that is not in the database will not be identified. Full genome sequences are therefore very helpful for protein identification, even if the protein as such has never been observed before. For organisms with unsequenced genomes, the catalogue of known proteins (or genes) can be far from complete, which seriously hampers the identification of novel proteins. We can also search DNA databases, so any additional information (partial genome sequences, draft assemblies such as scaffolds, or early attempts at genome annotation) may improve our chances. Remember that protein identification by MS is NOT a BLAST search, so proteins cannot be identified ‘by homology’: even a single amino acid substitution changes the mass of a peptide (the isobaric pair leucine/isoleucine is a rare exception), which prevents its identification. In some cases, we can perform de novo peptide sequencing, either alone or in combination with a database search for higher confidence. Please remember that de novo sequencing cannot reconstruct the positions of the peptides within the protein: without a database sequence, this information is lost once the protein is digested into peptides.
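To see why a homologous sequence does not help, it is enough to compute peptide masses. The sketch below sums standard monoisotopic residue masses for unmodified peptides (cysteine without carbamidomethylation, no charge); the peptide `PEPTIDE` and its variants are hypothetical examples, not sequences from this FAQ.

```python
# Monoisotopic residue masses in daltons for the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    original = "PEPTIDE"
    print(f"{peptide_mass(original):.4f}")  # 799.3599
    # A single D -> E substitution shifts the mass by one CH2 group.
    print(f"{peptide_mass('PEPTIEE') - peptide_mass(original):.3f}")  # 14.016
    # I -> L is the exception: leucine and isoleucine have identical masses.
    print(f"{peptide_mass('PEPTLDE') - peptide_mass(original):.3f}")  # 0.000
```

A 14 Da shift is far outside the mass tolerance of a modern instrument, so the spectrum of the variant peptide simply does not match the database entry.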
In principle we are set up to do this, although detecting phosphopeptides and localising the modification site is never straightforward. First, phosphorylation is often substoichiometric, meaning that phosphopeptides can be of very low abundance and may even go unnoticed among non-phosphorylated peptides from the same or other proteins. Second, phosphopeptides ionize less efficiently than ‘normal’ peptides, which makes them harder to detect. Third, phosphopeptides do not fragment according to the same rules as normal peptides (for example, the phosphate group is readily lost as phosphoric acid during collision-induced fragmentation), which sometimes makes spectra hard to interpret. Finally, the phosphorylation site may lie in a peptide that is too large or too small to be detected and fragmented efficiently. If you know your protein and the expected modification site, choosing a different protease may be a good way to map the domain of interest. As a consequence of all of this, it is not uncommon that we can tell a peptide is phosphorylated but find it difficult to pinpoint exactly which residue carries the phosphate. Enrichment of phosphopeptides (in practice, the process depletes the sample of non-phosphorylated peptides) by techniques such as IMAC or TiO2 is often helpful, especially for complex samples.
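The difficulty of site localisation becomes clear when all possible placements of a phosphate on a peptide are enumerated: every positional isomer has exactly the same precursor mass, so only fragment ions that happen to cover the residues between candidate sites can tell them apart. The sketch below assumes phosphorylation on S, T or Y, a +79.96633 Da (HPO3) mass shift, and uses a hypothetical peptide `ASTYR`; lowercase letters mark the phosphorylated residue.

```python
from itertools import combinations

# Monoisotopic residue masses in daltons for the 20 standard, unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276
PHOSPHO = 79.96633  # HPO3 added per phosphorylated residue


def phospho_isoforms(peptide, n_phospho=1):
    """All placements of n_phospho phosphates on S/T/Y; lowercase marks a phosphosite."""
    sites = [i for i, aa in enumerate(peptide) if aa in "STY"]
    return [
        "".join(aa.lower() if i in chosen else aa for i, aa in enumerate(peptide))
        for chosen in combinations(sites, n_phospho)
    ]


def precursor_mz(isoform, charge):
    """m/z of the protonated isoform; each lowercase residue adds one phosphate."""
    n_phospho = sum(1 for aa in isoform if aa.islower())
    mass = sum(RESIDUE_MASS[aa.upper()] for aa in isoform) + WATER + n_phospho * PHOSPHO
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    # Three candidate sites, one phosphate: three isoforms with identical precursor m/z.
    for iso in phospho_isoforms("ASTYR"):
        print(iso, round(precursor_mz(iso, charge=2), 4))
```

Because the MS1 measurement cannot separate these isoforms, the site assignment rests entirely on the fragment spectrum, which is why a peptide can be confidently called phosphorylated while its exact site remains ambiguous.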
We can identify other PTMs as well. It helps if you tell us which one(s) you expect, so that we can look for them specifically. Some are easier to detect than phosphorylation because they tend to be more stable in the mass spectrometer (e.g. acetylation, methylation). Searching for all of the ~300 known PTMs at once creates a combinatorial problem: every additional modification multiplies the number of candidate peptides, which usually decreases the likelihood of finding any one of them with high confidence. Nevertheless, an iterative approach may make it possible to identify unexpected modifications.
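The combinatorial problem can be illustrated by counting how many modification states a single peptide can take. The sketch assumes a small, hypothetical set of variable modifications, at most one modification per residue, and treats methylation as one state (in reality mono-, di- and trimethylation have distinct masses, which makes the growth even steeper).

```python
from math import prod

# Hypothetical subset of variable modifications and the residues each can occur on.
VARIABLE_MODS = {
    "phospho": "STY",
    "acetyl": "K",
    "methyl": "KR",
}


def count_variants(peptide, mods):
    """Number of modification states of a peptide, assuming at most one mod per residue."""
    # Each residue is either unmodified or carries one of the mods allowed on it.
    return prod(1 + sum(aa in residues for residues in mods.values()) for aa in peptide)


if __name__ == "__main__":
    peptide = "SAKTKR"  # hypothetical peptide
    active = {}
    for name, residues in VARIABLE_MODS.items():
        active[name] = residues
        print(name, count_variants(peptide, active))
    # phospho 4
    # acetyl 16
    # methyl 72
```

Each candidate is one more chance for a random spectrum to match by accident, so a search that allows many modifications needs stronger evidence to reach the same confidence.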
If you excised a single band from a gel, it quite likely contains several proteins of (almost) the same size, some of which may be below the detection limit of the staining method used. Mass spectrometers are far more sensitive than Coomassie or silver staining.
Most likely they were introduced during sample preparation or (if you used gels) during staining or cutting. Dust in the lab is the most common source, so make sure you work in a clean area. The User Guide contains some suggestions on how to minimise contamination.
Right, that’s embarrassing, or maybe not. There are actually several possible reasons why this can happen. The most likely one is that there was simply not enough starting material. Another is that the protein does not contain (a sufficient number of) cleavage sites for trypsin, our work-horse protease, so that no detectable peptides are generated. This can be the case for some small proteins, but also for ‘exotic’ (e.g. highly acidic) proteins that contain fewer lysines and arginines than the average protein. The reverse can also happen: if a protein contains many closely spaced cleavage sites, it will be digested into peptides that are too small to be detected or sequenced with high confidence (e.g. histone tails). If you know which protein you are working with, switching to another protease may be a viable alternative. Another reason for not finding a protein may be that it is not present in the database, which is not unusual for proteins from poorly characterized organisms. Finally, if you Coomassie-stained your gel in a microwave or scanned it on overhead projector foil, these are the most likely reasons for not finding anything: microwaving bakes the proteins into the gel, and there is no way to get them out, however intense the stain, while polymers in overhead projector foil prevent trypsin from working. Therefore, do not use a microwave to speed up staining or destaining, and only scan on glass plates (see also the User Guide).
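Both trypsin failure modes can be checked in silico before an experiment. The sketch below uses the simplified textbook trypsin rule (cleave after K or R unless the next residue is P, no missed cleavages) and an assumed detectable length window of 7 to 30 residues; that window is a rule of thumb, not a facility specification. The histone example uses the N-terminal tail of histone H3 (residues 1 to 40); the acidic sequence is hypothetical.

```python
def tryptic_digest(sequence):
    """In silico trypsin digest: cleave after K or R unless the next residue is P.

    Simplified rule; missed cleavages and other exceptions are ignored.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def detectable(peptides, min_len=7, max_len=30):
    """Keep peptides inside an assumed length window suitable for LC-MS/MS."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    # Many closely spaced K/R: mostly peptides that are too short.
    h3_tail = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"
    print(detectable(tryptic_digest(h3_tail)))  # ['SAPATGGVK']

    # Hypothetical acidic protein without K/R: one peptide that is far too long.
    acidic = "M" + "DEESG" * 10
    print(detectable(tryptic_digest(acidic)))  # []
```

Running a candidate protein through a check like this, and repeating it with the cleavage rule of an alternative protease, is a quick way to judge whether a different enzyme is worth trying.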
Yes, we support full workflows, including data analysis, for various stable isotope labeling strategies such as SILAC, TMT and dimethyl labeling. TMT and dimethyl labeling are applied at the peptide level and are offered as a service by the Facility. SILAC labels are introduced during cell culture, which is typically carried out by the user before submitting samples to the Facility. If you are considering SILAC labeling, we are happy to advise on setting up and optimising the experiment, e.g. verifying by mass spectrometry that your heavy amino acids are fully incorporated before you carry out a mixing experiment with different conditions.
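An incorporation check is usually done on a heavy-only lysate: for each identified peptide, the residual light signal reveals how much of the protein is still unlabeled. The sketch assumes the commonly used 13C6,15N2-lysine (Lys8) and 13C6,15N4-arginine (Arg10) labels, estimates incorporation as H / (H + L), and uses made-up intensities; the acceptance threshold you aim for (often quoted as 95% or higher) is an assumption to discuss with the Facility.

```python
from statistics import median

# Mass shifts of commonly used heavy SILAC amino acids (Da).
LYS8 = 8.014199   # 13C6,15N2-lysine
ARG10 = 10.008269  # 13C6,15N4-arginine


def heavy_mz_offset(peptide, charge):
    """m/z spacing between the light and fully heavy forms of a peptide."""
    return (peptide.count("K") * LYS8 + peptide.count("R") * ARG10) / charge


def incorporation(heavy, light):
    """Fraction of a peptide carrying the heavy label: H / (H + L)."""
    return heavy / (heavy + light)


def median_incorporation(pairs):
    """Median incorporation over (heavy, light) intensity pairs from a heavy-only lysate."""
    return median(incorporation(heavy, light) for heavy, light in pairs)


if __name__ == "__main__":
    # Made-up (heavy, light) intensities for three peptides.
    pairs = [(95, 5), (98, 2), (97, 3)]
    print(f"{median_incorporation(pairs):.2f}")  # 0.97
    # A doubly charged peptide ending in K: heavy partner expected ~4.007 m/z higher.
    print(f"{heavy_mz_offset('AAAK', 2):.4f}")  # 4.0071
```

Using the median rather than the mean keeps a few poorly quantified peptides from dominating the estimate.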
The user can expect a full and detailed analysis of the raw MS data. A typical analysis starts with assessing batch effects and normalizing the data properly to ensure that the samples are comparable. Next, a differential expression analysis is carried out to identify proteins that are significantly up- or downregulated between two conditions. Depending on the project, the hits can also be analyzed by GO-term enrichment or by network analysis with Cytoscape. Furthermore, multiple graphical and numeric displays are provided to communicate findings and archive results. Finally, the whole analysis is documented in an R Markdown file to ensure reproducibility and transparency. We always try to provide a personalized data analysis tailored to the individual requirements of each project.
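The Facility's analysis is written in R; the Python sketch below is not that pipeline, only a minimal illustration of two steps that commonly appear in such workflows. It assumes log2-transformed intensities (rows are proteins, columns are samples), median centering as the normalization, and per-protein p-values computed elsewhere (for example by a t-test or a moderated t-test), which are then adjusted with the Benjamini-Hochberg procedure. All numbers are made up.

```python
from statistics import mean, median


def median_normalize(log2_matrix):
    """Subtract each sample's median log2 intensity (rows = proteins, columns = samples)."""
    n_samples = len(log2_matrix[0])
    medians = [median(row[j] for row in log2_matrix) for j in range(n_samples)]
    return [[row[j] - medians[j] for j in range(n_samples)] for row in log2_matrix]


def log2_fold_changes(normalized, group_a, group_b):
    """Mean log2 intensity in group B minus group A, per protein (column indices per group)."""
    return [mean(row[j] for j in group_b) - mean(row[j] for j in group_a) for row in normalized]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min  # enforce monotonicity of adjusted values
    return adjusted


if __name__ == "__main__":
    log2_intensities = [[10, 12], [12, 13], [14, 14]]  # 3 proteins x 2 samples
    normalized = median_normalize(log2_intensities)
    print(log2_fold_changes(normalized, group_a=[0], group_b=[1]))  # [1, 0, 0]
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Median centering assumes most proteins do not change between samples; if that assumption is doubtful for a given experiment, other normalization strategies are more appropriate.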