The Global Proteome Machine Organization
   GPM Blog
Data set of the week: (2014/7/19)
Development and performance evaluation of an ultra-low flow nano liquid chromatography-tandem mass spectrometry set-up.
Overall rating: excellent data (leading the field)
This data set consisted of 39 results, exploring the role of HPLC flow rate and analysis duration in LC/MS/MS measurements. The data files were made available through ProteomeXchange, PXD000396. It has been published by Köcher T, Pichler P, Pra MD, Rieux L, Swart R, and K Mechtler, Proteomics. 2014 Jun 11 (PubMed).
This data set represents a tour de force exploring the relationship between chromatographic methods and proteomics results. This group has achieved a degree of reproducibility and quality control using their nanoLC system that clearly leads the field. From a technical point of view, many of the LC/MS/MS runs (e.g., 120312QEx2_RS1_20nl-min_0k1HeLa_14h_01.msf) were simply the best we've ever seen in 10 years of operation. Anyone interested in studying the relationship between quality parameters — dynamic range, LOD or LOQ — and the number of spectra acquired should examine this data set carefully.
GO annotation listing build complete for three model species (2014/7/9)
Following the usual quarterly update of the human and mouse proteome guides (v. 15), the listings for human, mouse and yeast GO code annotations were also rebuilt. The resulting files contain the proteins associated with a specific GO code, the number of times the protein has been observed and the usual GPMDB evidence code for these proteins. The GO codes available are indexed and the individual listings are available from the main GPMDB site in three text formats:
  1. H. sapiens, 11,958 GO categories;
  2. M. musculus, 11,339 GO categories; and
  3. S. cerevisiae, 4,309 GO categories.
The text files and HTML indexes are also available for download via FTP.
Data set of the week: (2014/7/7)
Proteomic analysis of the multimeric nuclear egress complex of human cytomegalovirus
Overall rating: very good data (general interest)
This data set consisted of 24 results, using LC/MS/MS to probe the consequences of siRNA gene silencing experiments. The data files were made available through ProteomeXchange, PXD000536. It has been published by Milbradt J, Kraut A, Hutterer C, Sonntag E, Schmeiser C, Ferro M, Wagner S, Lenac T, Claus C, Pinkert S, Hamilton ST, Rawlinson WD, Sticht H, Coute Y and Marschall M, Mol Cell Proteomics. 2014 Jun 26 (PubMed).
Human cytomegalovirus (a.k.a. HCMV, CMV and Human herpesvirus 5) infections are extremely common (> 50% of the population). The virus does not produce clinical symptoms in most infected people, but it remains dormant for long periods of time and can result in serious disease in immunocompromised individuals. It can also be passed from mother to fetus and give rise to developmental abnormalities. This study does a good job of demonstrating the utility of combining proteomics and siRNA techniques for the study of viral protein production. The sample preparation, chromatography and mass spectrometry are well done. Any group interested in studying viral dynamics in host cells using proteomics should take a look at the methods used to generate this data set and the results obtained from these studies.
Data set of the week: (2014/6/29)
Dynamic readers for 5-(hydroxy)methylcytosine and its oxidized derivatives.
Overall rating: very good data (general interest)
This data set consisted of 249 results, using affinity pull-down sample preparation and LC/MS/MS analysis with SILAC quantitation. The data files were made available through ProteomeXchange, PXD000143. It has been published by Spruijt CG, Gnerlich F, Smits AH, Pfaffeneder T, Jansen PW, Bauer C, Münzel M, Wagner M, Müller M, Khan F, Eberl HC, Mensinga A, Brinkman AB, Lephikov K, Müller U, Walter J, Boelens R, van Ingen H, Leonhardt H, Carell T and Vermeulen M, Cell. 2013 Feb 28;152(5):1146-59 (PubMed).
This study utilizes the same methods commonly used to determine protein-protein interactions to determine which proteins have a special affinity for DNA containing 5-methylcytosine, 5-(hydroxy)methylcytosine, 5-formylcytosine and 5-carboxylcytosine. The experiments were performed using mouse embryonic stem cells as the source of potential interactor proteins. The experiments were consistently done and the analysis was of very good quality. The proteins selected showed considerable enrichment of those known to be part of the nucleolus, nucleus, ribonucleoprotein complex, ribosome and spliceosome. This method of sample preparation produced many of the best observations of comparatively rare gene products, such as Pcgf1:p, Aurkc:p and Gm5590:p.
Data set of the week: (2014/6/23)
The Global Phosphoproteome of Chlamydomonas reinhardtii Reveals Complex Organellar Phosphorylation in the Flagella and Thylakoid Membrane.
Overall rating: excellent data (worth study)
This data set consisted of 6 results, using a multi-step phosphopeptide enrichment strategy followed by a multidimensional chromatography separation using a HILIC initial separation and subsequent reversed-phase HPLC. The data files were made available through ProteomeXchange, PXD000783. It has been published by Wang H, Gau B, Slade WO, Juergens M, Li P and Hicks LM, Mol Cell Proteomics. 2014 Jun 10 (PubMed).
Chlamydomonas reinhardtii is a widely used model algal species. It is unicellular with two flagella and it is capable of photosynthesis. The organism is commonly found in the environment and it can be grown under very minimal conditions compared to most eukaryotes. This study used global phosphoproteomics methods to determine how the organism utilizes protein phosphorylation in its metabolic processes. The results showed good enrichment of phosphopeptides (> 70% of identified spectra). The ratio of S:T phosphorylation was a little lower than in many other eukaryotes (about 4:1), but the degree of proline-directed phosphorylation detected was noticeably less than normally found in mammalian studies. The data quality was excellent and these spectra would be suitable for developing algorithms or testing computational biology methods for phospho-protein biology.
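For readers who want to compute the same two summary figures on their own results, the arithmetic behind an enrichment fraction and an S:T ratio can be sketched as follows. This is a minimal illustration only: the function name `phospho_summary`, the input layout (one list of phosphorylated residue letters per identified spectrum) and the toy values are our assumptions for the example, not the study's data or its analysis method.

```python
from collections import Counter


def phospho_summary(identifications):
    """Summarize a list of identified spectra.

    `identifications` is a hypothetical layout: one entry per identified
    spectrum, each entry a list of the phosphorylated residues ("S", "T"
    or "Y") assigned to that spectrum. An empty list means the peptide
    carried no phosphorylation.
    """
    total = len(identifications)
    # Spectra with at least one phosphosite count toward enrichment.
    phospho = sum(1 for sites in identifications if sites)
    # Tally phosphosites by residue across all spectra.
    counts = Counter(res for sites in identifications for res in sites)
    enrichment = phospho / total if total else 0.0
    st_ratio = counts["S"] / counts["T"] if counts["T"] else float("inf")
    return enrichment, st_ratio


# Invented toy input: 10 identified spectra, 8 of them phosphorylated,
# carrying 8 phosphoserines and 2 phosphothreonines in total.
toy = [["S"], ["S"], ["S", "T"], ["S"], ["S"], ["S"], ["S", "T"], ["S"], [], []]
enrichment, st_ratio = phospho_summary(toy)
print(f"enrichment = {enrichment:.0%}, S:T = {st_ratio:.0f}:1")
```

With the toy input the enrichment is 80% and the S:T ratio is 4:1. Real results would of course need site localization to be settled before counting residues.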
Data set of the week: (2014/6/15)
A Candida albicans PeptideAtlas.
Overall rating: excellent data (leading the field)
This data set consisted of 148 results, from 16 distinct experiments. The data files were made available through PeptideAtlas, PASS00402, PASS00408, PASS00476, and PASS00447. It has been published by Vialas V, Sun Z, Loureiro y Penha CV, Carrascal M, Abián J, Monteoliva L, Deutsch EW, Aebersold R, Moritz RL, and Gil C, J Proteomics, 2014 Jan 31;97:62-8 (PubMed).
Candida albicans is a fungus that can exist either as single cells or filaments. It is a commensal organism in H. sapiens, occupying the oral cavity and gastrointestinal tract in most of the population. C. albicans can also cause a variety of infections (particularly in oral and genital tissues) in immunocompromised individuals. It belongs to a large group of fungi, the mitosporic Saccharomycetales, that contains many human pathogenic organisms. Unfortunately, these fungi have not had much attention from the proteomics community. This dataset starts to correct this problem, defining the observable peptides and proteins from C. albicans samples under a variety of experimental conditions. The sample preparation and separations were very well done and the mass spectrometry was state-of-the-art.
Data set of the week: (2014/6/7)
Functional annotation of proteome encoded by human chromosome 22; and
A draft map of the human proteome.
Overall rating: excellent data (general interest)
This data set consisted of 84 results, each one a summary of individual LC/MS/MS runs associated with multidimensional chromatography analyses of individual tissue samples. The data files were made available through ProteomeXchange, PXD000561. It has been published by Pinto SM, Manda SS, Kim MS, Taylor K, Selvan LD, Balakrishnan L, Subbannayya T, Yan F, Prasad TS, Gowda H, Lee C, Hancock WS, and Pandey A, J Proteome Res. 2014 Jun 6;13(6):2749-60 (PubMed).
This set of data was one of the first attempts to broadly sample human tissues using similar experimental methods for each sample. It contained some of the first publicly available data for several tissues, in some cases from both fetal and adult samples. Analysis of the data produced numbers of protein identifications typical for the methods used, although the results for some tissues (e.g., liver, heart) were surprisingly variable. Overall, the chromatography and mass spectrometry were well done and consistent between samples. There was considerable variability between the samples with respect to the presence of detectable experimental artifacts caused by the modification of free peptide amines: both N-terminal and lysine side chain amines were either carbamylated or carboxyamidomethylated to a significant extent. These artifacts made the data of limited use for detecting some modifications (particularly acetylation or ubiquitination) or amino acid polymorphisms. Other modifications that were not easily confused with these artifacts were present and available for interpretation. For example, differences in the hydroxyproline distributions on many collagen subunits could be readily observed in different tissues. The phosphorylation states of some common proteins could also be readily observed across multiple tissues.
The Contest: testing large-scale proteomics information systems
In addition to the publication listed above, this data was also the basis for "A draft map of the human proteome", which describes an associated web site. The purpose of the web site was to allow researchers to enter a list of gene symbols and then display the relative amount of the associated protein that was detected in each of the tissues examined. The data was analyzed by applying methods commonly used for small, single LC/MS/MS runs to these much larger data sets.
One of the best ways to evaluate this type of informatics system is to perform "sanity" tests to see how well the output of the system corresponds to known patterns of protein expression. Since evaluating this type of system is an important skill for anyone who wants to be involved in large-scale proteomics, we thought it would be an excellent subject for a contest. Two lists of genes were selected to probe the quality and utility of the proteomics information available and the results of querying with these lists were downloaded as PDF files:
Everyone with an interest in the subject is invited to take a look at these two results and write a 250 word essay on their implications for the biological, technical and biomedical utility of the web site's information. The best essay will be published on this blog (anonymously if you prefer) and the author will receive a beautiful GPMDB T-shirt. Submit your entries by email, using the subject line "GPMDB T-Shirt contest". Please stick to the facts as much as possible: sarcasm, irony or ad hominem comments will count against any entry. Entries may be submitted until midnight July 1, 2014. Multiple entries from the same individual are allowed, but the author must clearly identify themselves in the email. The winner will be announced July 7, 2014.
Data set of the week: (2014/5/25)
Virion proteome of Cafeteria roenbergensis virus strain BV-PW1.
Overall rating: excellent data (general interest)
This data set consisted of 10 results: 9 gel bands and a summary set of identifications. The data files were made available through ProteomeXchange, PXD000993. It has not yet been published, but was submitted by Matthias Fischer (Max Planck Institute for Medical Research) and Leonard Foster (University of British Columbia).
This elegant dataset neatly wraps up the preliminary work on the proteome of a recently discovered nucleocytoplasmic large DNA virus, the Cafeteria roenbergensis virus. The host species, Cafeteria roenbergensis, is a marine flagellate that consumes bacteria in coastal water. The virus has a very large genome of about 730,000 base pairs of dsDNA and 1,096 predicted proteins. The virus is also large enough that it can be infected by a virophage, the Mavirus. Not only is the virus biologically interesting, but the data is one of the best we've run across for testing peptide identification algorithms and the theory behind them. The chromatography and mass spectrometry were both very well done and the spectra are ideal for detecting common artifactual modifications that can be masked by dodgy experimental technique, such as deamidation and peptide N-terminus cyclization. It is also useful for trying to understand how to think about the problem of balancing sensitivity versus selectivity and false positive versus false negative assignments.
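The trade-off mentioned above can be made concrete with a small sketch. It assumes a situation that only exists in benchmarking exercises: every peptide assignment has a score and a known right-or-wrong label. The function name `confusion_at`, the score scale and the toy values are all invented for illustration; this is not how any particular search engine reports or validates its results.

```python
def confusion_at(scored, threshold):
    """Count outcomes when assignments scoring >= threshold are accepted.

    `scored` is a hypothetical list of (score, is_correct) pairs, where
    is_correct is the known truth for a benchmark assignment.
    Returns (sensitivity, selectivity), with selectivity taken here to
    mean specificity: the fraction of wrong assignments rejected.
    """
    tp = sum(1 for s, ok in scored if s >= threshold and ok)      # accepted, right
    fp = sum(1 for s, ok in scored if s >= threshold and not ok)  # accepted, wrong
    fn = sum(1 for s, ok in scored if s < threshold and ok)       # rejected, right
    tn = sum(1 for s, ok in scored if s < threshold and not ok)   # rejected, wrong
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    selectivity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, selectivity


# Invented toy benchmark: higher score means a more confident assignment.
toy = [(9.1, True), (8.4, True), (7.7, False), (6.9, True),
       (5.2, False), (4.8, True), (3.3, False), (2.0, False)]

for cutoff in (7.0, 4.0):
    sens, sel = confusion_at(toy, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, selectivity {sel:.2f}")
```

With the toy values, the strict cutoff of 7.0 gives a sensitivity of 0.50 and a selectivity of 0.75, while the relaxed cutoff of 4.0 recovers every correct assignment (sensitivity 1.00) at the cost of letting in more false positives (selectivity 0.50).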
Copyright © 2013, The Global Proteome Machine Organization.