The GPM Data Set of the Week, 2011

The Global Proteome Machine Organization

GPMDB Data set of the week

The GPMDB contains tens of thousands of data sets contributed by researchers around the world. Every week, we select a data set because of its technical excellence, biological interest or simply because we think it is of general interest to the proteomics community.

Data sets of the year (2011/12/31)
Technical, Biological and Clinical.

This week we are highlighting the three finest examples of proteomics data made public in 2011. As we did last year, we are naming the best data in three categories.

Technical data: Frese CK, et al.
Improved Peptide Identification by Targeted Fragmentation Using CID, HCD and ETD on an LTQ-Orbitrap Velos.
This study did an excellent job of comparing and contrasting the utility of the three most popular parent ion fragmentation mechanisms in proteomics. Examining the results is a must for anyone interested in making the right choice for their experiments.
Biological data: Kim W, et al.
Systematic and quantitative assessment of the ubiquitin-modified proteome.
Many groups have tried to investigate ubiquitination, but this study was the first to really get it right. All of the aspects of the experiments were well done and the data was truly first rate.
Clinical data: Marimuthu A, et al.
A comprehensive map of the human urinary proteome.
This study really provides a basis for the proteomics analysis of human urine. The proteins and peptides found here demonstrate the state-of-the-art in what can be detected in this important clinical sample.

Data set of the week: (2011/12/19)
Virus-induced dilated cardiomyopathy is characterized by increased levels of fibrotic extracellular matrix proteins and reduced amounts of energy-producing enzymes.
Overall rating: two stars - very good data (general interest)

This data set consisted of 91 LC/MS/MS runs from two dimensional SDS-PAGE spots. The data was published by Nishtala K, Phong TQ, Steil L, Sauter M, Salazar MG, Kandolf R, Kroemer HK, Felix SB, Völker U, Klingel K and Hammer E in Proteomics 2011 11:4310-20 (PubMed).

This data is a good example of what can be done using 2D SDS-PAGE DIGE methods when coupled with high resolution mass spectrometry-based protein identifications. The analysis showed a small number of proteins per spot, with good clustering of predicted molecular masses (from the protein sequence) in each sample spot. There was very significant contamination of all of the samples with common adventitious proteins (H. sapiens KRT1, KRT2, KRT9 and KRT10; B. taurus α- & κ-casein; and S. scrofa trypsin). The high levels of these proteins made some of the data analysis a bit tricky: the porcine trypsin in particular contained one peptide that was consistently identified as being from mouse Try10 when it clearly came from the porcine reagent instead. It would be helpful to the entire field if more effort were put into preventing the contamination of polyacrylamide gels.

Data set of the week: (2011/12/12)
Selected reaction monitoring mass spectrometry reveals the dynamics of signaling through the GRB2 adaptor.
Overall rating: two stars - very good data (general interest)

This data set consisted of 5 LC/MS/MS runs from affinity purification experiments. The data was published by Bisson N, James DA, Ivosev G, Tate SA, Bonner R, Taylor L, Pawson T in Nat Biotechnol. 2011 29:653-8 (PubMed).

The five analyses presented here are a good example of the type of MS/MS identification work that is necessary when setting up a solid SRM/MRM assay for quantitation. There are several good replicates to establish reproducibility and the MS/MS spectra were generated on the same type of instrument used to perform the quantitative analysis. The group also paid careful attention to the chromatography used, which is an under-appreciated necessity for this type of quantitation.

Data set of the week: (2011/12/05)
Phosphoproteomic analysis of Salmonella-infected cells identifies key kinase regulators and SopB-dependent host phosphorylation events.
Overall rating: four stars - excellent data (leading the field)

This data set consisted of 9 LC/MS/MS runs collected using metal oxide capture methods. The data was published by Rogers LD, Brown NF, Fang Y, Pelech S, Foster LJ in Sci Signal. 2011 4:rs9 (PubMed).

The results derived from this data really show the state-of-the-art when using an Orbitrap with CID and SILAC quantitation to follow the changes in phosphorylation patterns that occur during a biological event (in this case Salmonella infection in human cells). All aspects of the measurement (sample preparation, phosphopeptide enrichment, HPLC and mass spectrometry) were performed with excellent attention to detail and quality. Anyone interested in developing new ways of handling quantitative proteomics data while simultaneously following a post-translational modification should use these experiments as a model system for testing their methods.

Data set of the week: (2011/11/27)
A pipeline that integrates the discovery and verification of plasma protein biomarkers reveals candidate markers for cardiovascular disease.
Overall rating: three stars - excellent data (worth study)

This data set consisted of 269 LC/MS/MS runs collected from multiple replicate runs of human plasma samples. The data was published by Addona TA, Shi X, Keshishian H, Mani DR, Burgess M, Gillette MA, Clauser KR, Shen D, Lewis GD, Farrell LA, Fifer MA, Sabatine MS, Gerszten RE, and Carr SA. in Nat Biotechnol. 2011 29:635-43 (PubMed).

This data represents the maturing of proteomics measurements into a clinical tool. The experiments were performed using state-of-the-art techniques and allow the in-depth profiling of the proteins present in clinically-derived plasma samples for the differential diagnosis of cardiovascular events. The pairing of good, solid experimental technique in the plasma measurements with SRM/MRM methods for more routine monitoring is probably the pattern many clinically-oriented studies will follow for the next few years.

Data set of the week: (2011/11/20)
Systematic and quantitative assessment of the ubiquitin-modified proteome.
Overall rating: four stars - excellent data (leading the field)

This data set consisted of 90 LC/MS/MS runs collected from a series of multidimensional chromatography experiments, using SILAC methods for quantitation. The data was published by Kim W, Bennett EJ, Huttlin EL, Guo A, Li J, Possemato A, Sowa ME, Rad R, Rush J, Comb MJ, Harper JW, and Gygi SP. in Mol Cell. 2011 44(2):325-40 (PubMed).

The experiments that generated this data used affinity purification to select peptides that had been modified by ubiquitination. The antibody used recognized the unusual addition of Gly-Gly to the sidechain of lysine, which only occurs in tryptic peptides generated from ubiquitinated proteins. There have been many studies that used this modification (+114 Da) to identify ubiquitination sites, but these particular experiments have the largest (and most broadly distributed) set of identified modified lysines in human proteins currently available. The use of the proteasome inhibitor bortezomib created significantly higher concentrations of these modified peptides in the cell culture, allowing the antibody pull-down method to be much more effective than it would have been in untreated cells.
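
The +114 Da shift quoted above follows from the elemental composition of two glycine residues. The following is a minimal sketch in Python, assuming standard monoisotopic atomic masses; the names `composition_mass`, `GLYGLY_SHIFT` and `mz_shift` are illustrative and are not part of any GPM software.

```python
# Monoisotopic atomic masses in Da (standard values, rounded).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def composition_mass(formula):
    """Sum monoisotopic masses for a composition such as {"C": 2, "H": 3}."""
    return sum(MONO[element] * count for element, count in formula.items())


# One glycine residue is C2H3NO; the remnant left on a lysine side chain
# is two of them (Gly-Gly).
GLY_RESIDUE = {"C": 2, "H": 3, "N": 1, "O": 1}
GLYGLY_SHIFT = 2 * composition_mass(GLY_RESIDUE)


def mz_shift(charge):
    """m/z shift seen for a Gly-Gly modified peptide at a given charge state."""
    return GLYGLY_SHIFT / charge


if __name__ == "__main__":
    print(f"Gly-Gly remnant: +{GLYGLY_SHIFT:.4f} Da")
    for z in (1, 2, 3):
        print(f"  charge {z}+: m/z shift {mz_shift(z):.4f}")
```

The nominal value rounds to 114 Da; at higher charge states the observed m/z shift is divided by the charge, which matters when reading the spectra directly.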

Data set of the week: (2011/11/14)
Comparative phosphoproteome profiling reveals a function of the STN8 kinase in fine-tuning of cyclic electron flow (CEF).
Overall rating: three stars - excellent data (worth study)

This data set consisted of 8 result sets, collected from IMAC/TiO₂ affinity measurements. The data was published by Reiland S, Finazzi G, Endler A, Willig A, Baerenfaller K, Grossmann J, Gerrits B, Rutishauser D, Gruissem W, Rochaix JD, and Baginsky S. in Proc Natl Acad Sci U S A. 2011 108:12955-60 (PubMed).

These results contain some of the best plant phosphorylation information available. The experiments were very well planned and the analysis was done carefully. Many of the phospho-domains were previously undocumented and the data was analyzed in a reasonable manner for the resulting manuscript.

Data set of the week: (2011/11/07)
A protein epitope signature Tag (PrEST) library allows SILAC-based absolute quantification and multiplexed determination of protein copy numbers in cell lines.
Overall rating: three stars - excellent data (worth study)

This data set consisted of 138 result sets. The data was published by Zeiler M, Straube WL, Lundberg E, Uhlen M, and Mann M. in Mol Cell Proteomics. 2011 Sep 30 (PubMed).

The data provided by these experiments is a tremendous resource for anyone interested in proteomics search engine development, testing or statistical analysis. The first 107 LC/MS/MS runs were generated using individual SILAC-labelled PrEST peptides. There are effectively no contaminants, making these spectra excellent examples to use for determining algorithm sensitivity and noise rejection. The remaining sets were large, high quality measurements of mixtures of either normal PrESTs and SILAC heavy HeLa proteins or SILAC heavy PrESTs and normal HeLa proteins. The multiple replicates and well-characterized samples make these runs perfect for determining statistical error rates and comparing predictions from theoretical distributions to laboratory data.

Data set of the week: (2011/10/30)
Proteome-wide mapping of the Drosophila acetylome demonstrates a high degree of conservation of lysine acetylation.
Overall rating: one star - very good data (specialist interest)

Department of Proteomics, The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen

This data set consisted of 46 LC/MS/MS runs from samples enriched in acetylated lysine. The data was published by Weinert BT, Wagner SA, Horn H, Henriksen P, Liu WR, Olsen JV, Jensen LJ, and Choudhary C. in Sci Signal. 2011 4:ra48 (PubMed).

The MS/MS data generated for this paper was first-rate, using Higher-energy Collisional Dissociation (HCD) and high accuracy fragment ion mass measurement to produce a large set of excellent Drosophila melanogaster peptide identifications. This sort of data would normally receive a better rating than a single étoile. However, for some reason the investigators chose to use urea as part of their experiment sample workup, leading to an observable amount of lysine carbamylation in their proteins. The presence of these carbamylations (Lys +43 Da) makes unambiguously determining acetylation (Lys +42 Da) much more difficult than would have been necessary if a urea-free sample workup protocol had been utilized. Given this proviso, if sufficient care is taken the high accuracy mass measurements used to create the data allow the assignment of a large number of acetylated domains in the proteins identified.
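
To see why the two modifications are hard to separate, it helps to work out the exact mass shifts. The sketch below is only an illustration, assuming standard monoisotopic masses, the usual compositions for acetylation (C2H2O) and carbamylation (CHNO), and a hypothetical 1500 Da peptide. It shows that a carbamylated peptide lands within roughly 0.008 Da of the first 13C isotope peak of the acetylated form of the same peptide, which is why high accuracy measurements are needed.

```python
# Monoisotopic atomic masses in Da (standard values, rounded).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
C13_SPACING = 1.0033548  # mass difference between 13C and 12C


def composition_mass(formula):
    """Sum monoisotopic masses for a composition such as {"C": 2, "H": 2}."""
    return sum(MONO[element] * count for element, count in formula.items())


ACETYL = composition_mass({"C": 2, "H": 2, "O": 1})            # Lys +42
CARBAMYL = composition_mass({"C": 1, "H": 1, "N": 1, "O": 1})  # Lys +43

# Gap between the carbamylated monoisotopic peak and the first 13C isotope
# peak of the acetylated form of the same peptide.
ISOTOPE_GAP = C13_SPACING - (CARBAMYL - ACETYL)


def gap_in_ppm(peptide_mass):
    """Mass accuracy (ppm) needed to resolve the two at a given peptide mass."""
    return ISOTOPE_GAP / peptide_mass * 1e6


if __name__ == "__main__":
    print(f"acetylation:   +{ACETYL:.5f} Da")
    print(f"carbamylation: +{CARBAMYL:.5f} Da")
    print(f"gap to the 13C peak: {ISOTOPE_GAP:.5f} Da")
    # 1500 Da is a hypothetical peptide mass chosen for illustration.
    print(f"required accuracy at 1500 Da: {gap_in_ppm(1500.0):.1f} ppm")
```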

Data set of the week: (2011/10/23)
A phospho-proteomic screen identifies substrates of the checkpoint kinase Chk1.
Overall rating: three stars - excellent data (leading the field)

This data set consisted of 2 LC/MS/MS runs, using a covalent phosphopeptide capture method. The data was published by Blasius M, Forment JV, Thakkar N, Wagner SA, Choudhary C, and Jackson SP in Genome Biol. 2011 12:R78 (PubMed).

Anyone interested in targeted phosphopeptide analysis should look at this data carefully. The methods used here generated identifications that were > 99% phosphopeptides, for the very specific proteins of interest in the cell-cycle checkpoint kinase Chk1 system. Every aspect of the measurements was done well, while collecting a very small number of spectra compared to other techniques. Even though there are relatively few spectra, there were a surprising number that were either unique or the best obtained for that particular sequence.

Data set of the week: (2011/10/16)
Global network analysis of drug tolerance, mode of action and virulence in methicillin-resistant S. aureus.
Overall rating: one star - very good data (specialist interest)

This data set consisted of 10 LC/MS/MS runs, using iTRAQ quantitation. The data was published by Overton IM, Graham S, Gould KA, Hinds J, Botting CH, Shirran S, Barton GJ, and Coote PJ in BMC Syst Biol. 2011 5:68 (PubMed).

The data collected here was for a focussed study which was well suited to analysis using a QQ-TOF style instrument and isobaric tags for relative and absolute quantitation. Using the results, the authors were able to draw some conclusions about changes in the concentrations of the most abundant proteins in S. aureus, caused by their specific experimental conditions. The protein concentration limit of detection was significantly higher than might be expected for a survey-style proteomics study, but in this case it was the perturbations in metabolic proteins that were the desired measurement, rather than a thorough catalogue of all proteins present.

Data set of the week: (2011/10/9)
DNA affects the composition of lipoplex protein corona: A proteomics approach.
Overall rating: two stars - very good data (general interest)

This data set consisted of 2 LC/MS/MS runs, using label-free quantitation. The data was published by Capriotti AL, Caracciolo G, Caruso G, Foglia P, Pozzi D, Samperi R, and Laganà A in Proteomics. 2011 11:3349-58 (PubMed).

This data was a nice demonstration of the use of protein isolation methods to generate a much-reduced set of proteins (compared to blood plasma) associated with a very specific biomedically-relevant stimulus. The identifications were sound and the overall experimental setup produced a good set of appropriate peptides for the proteins found in this study, all of which are well-known plasma proteins.

Data set of the week: (2011/09/18)
Shotgun proteomic analysis of the unicellular alga Ostreococcus tauri.
Overall rating: exceptional achievement (leading the field)

This data set consisted of 235 result sets, corresponding to normal peptides, phosphopeptides and 15N labelled SILAC experiments. The data was published by Le Bihan T, Martin SF, Chirnside ES, van Ooijen G, Barrios-Llerena ME, O'Neill JS, Shliaha PV, Kerr LE, and Millar AJ. in J Proteomics. 2011 74:2060-70 (PubMed).

This paper does an excellent job of characterizing the proteome of a very unusual eukaryote, Ostreococcus tauri. Discovered in 1994, it is still the smallest known eukaryote in size: at 0.8 microns in diameter, 1000 O. tauri cells would fit in a HeLa cell, with plenty of room left over. This data set thoroughly examines the proteome of the organism, which has significant sequence divergence from the model eukaryotes commonly used in proteomics experiments. Any group interested in the molecular evolution of phosphorylation signalling should find their phosphopeptide isolations instructive. This data holds the modern record for the sheer volume of tryptic peptide sequences that had never been observed before these spectra became publicly available. The methods used here should serve as a guide for anyone interested in characterizing the proteome of a novel, single-celled eukaryote.

Data set of the week: (2011/09/11)
Quantitative phospho-proteomics to investigate the Polo-like kinase 1-dependent phospho-proteome.
Overall rating: three stars - excellent data (worth studying)

This data set consisted of 27 LC/MS/MS runs, each corresponding to an SCX fraction from an IMAC enrichment of acidic peptides. The data was published by Grosstessner-Hain K, Hegemann B, Novatchkova M, Rameseder J, Joughin BA, Hudecz O, Roitinger E, Pichler P, Kraut N, Yaffe MB, Peters JM, and Mechtler K. in Mol Cell Proteomics. 2011 Aug 21 (PubMed).

What separated this study from other surveys of HeLa cell phosphopeptides was the use of a stable isotope labelling approach that has significant benefits. Rather than relying on metabolic incorporation of heavy amino acids (as in SILAC), this study used light and heavy methyl groups, added to the acidic groups of the cleaved peptides (Glu, Asp and C-terminus). This treatment blocked all of the acidic groups in these peptides, except for the phosphorylated Ser, Thr and Tyr residues. Because of this protocol, the IMAC enrichment produced an unusually pure set of phosphopeptides that were not dominated by peptides containing additional acidic side chains, as is often the case with IMAC experiments. It also generated particularly simple, accurate peptide quantitation.
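
The labelling arithmetic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the light label is taken to be a CH3 methyl ester and the heavy label a CD3 methyl ester (the text does not say which isotope was used), and the peptide `AEDSLK` is hypothetical. Each Asp, each Glu and the C-terminus is counted as one labelled site.

```python
# Monoisotopic masses in Da (standard values, rounded).
MONO_C = 12.0
MONO_H = 1.00782503
MONO_D = 2.01410178  # deuterium; assumed heavy isotope for this sketch

# -COOH -> -COOCH3 adds CH2 to the peptide.
LIGHT_METHYL = MONO_C + 2 * MONO_H
# -COOH -> -COOCD3 removes one H and adds C plus three D (assumed label).
HEAVY_METHYL = MONO_C + 3 * MONO_D - MONO_H


def esterified_sites(peptide):
    """Count acidic groups: every Asp (D), every Glu (E) and the C-terminus."""
    return peptide.count("D") + peptide.count("E") + 1


def label_shift(peptide, heavy=False):
    """Total mass added to a peptide by methyl esterification."""
    per_site = HEAVY_METHYL if heavy else LIGHT_METHYL
    return esterified_sites(peptide) * per_site


def heavy_light_spacing(peptide):
    """Mass difference between the heavy and light forms of one peptide."""
    return label_shift(peptide, heavy=True) - label_shift(peptide)


if __name__ == "__main__":
    peptide = "AEDSLK"  # hypothetical sequence, for illustration only
    print(esterified_sites(peptide), "labelled sites")
    print(f"light shift: +{label_shift(peptide):.4f} Da")
    print(f"heavy/light spacing: {heavy_light_spacing(peptide):.4f} Da")
```

Because the spacing scales with the number of acidic groups, the heavy and light partners of a peptide are separated by a predictable amount that depends only on its sequence.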

Data set of the week: (2011/09/04)
Proteomic analysis of outer membrane vesicles derived from Pseudomonas aeruginosa.
Overall rating: one star - very good data (specialist interest)

This data set consisted of 4 groups of spectra, one large scale survey run and three small separate analyses. The data was published by Choi DS, Kim DK, Choi SJ, Lee J, Choi JP, Rho S, Park SH, Kim YK, Hwang D, Gho YS. in Proteomics 2011 11:3424-9 (PubMed).

The data reported here gives a first look at the outer membrane proteins of this important pathogenic species. The proteins discovered and the techniques used provide an excellent comparison with the proteins found for the related species, Pseudomonas syringae, in a previously featured data set. The results would have been more broadly applicable at the peptide level if the chromatography had been better, but the proteins identified were based on very good ion-trap spectra and the data analysis used in the manuscript was appropriate.

Data set of the week: (2011/08/29)
A tissue-specific atlas of mouse protein phosphorylation and expression.
Overall rating: one star - very good data (specialist interest)

This data set was made available in TRANCHE as 312 LC/MS/MS runs using metal oxide affinity to enrich fractions with phosphopeptides from mouse tissue samples. The data was published by Huttlin EL, Jedrychowski MP, Elias JE, Goswami T, Rad R, Beausoleil SA, Villén J, Haas W, Sowa ME, and Gygi SP. in Cell. 2010 143:1174-89 (PubMed).

The data gives a general survey of the most abundant phosphopeptides that were found in nine different mouse tissue samples. The phosphopeptide enrichment was lower than in other, more specific studies and the chromatography was somewhat less consistently performed than has become best-practice in the field. The study did, however, provide many good observations of phosphorylation sites in proteins that are not well-represented in cell culture studies.

Data set of the week: (2011/08/21)
Quantitative phosphoproteomics identifies substrates and functional modules of Aurora and Polo-like kinase activities in mitotic cells.
Overall rating: three stars - excellent data (worth studying)

This data set was made available in TRANCHE as 100 LC/MS/MS runs that use a combination of SILAC and metal oxide affinity purification methods. The data was published by Kettenbach AN, Schweppe DK, Faherty BK, Pechenick D, Pletnev AA, and Gerber SA in Sci Signal. 2011 Jun 28, 4(179):rs5 (PubMed).

This paper provides a good survey of the phosphopeptides present in HeLa cells and should be viewed as a model for further study of quantitative phosphoproteomics in cell culture. The experimental analysis used CID fragmentation and it demonstrates very clearly that it is not necessary (or desirable) to use ETD when looking for sensitive, reproducible phosphopeptide quantitation. The data analysis in the paper has some flaws, but the conclusions were reasonable and within the limitations of the analytical approach that was used.

Data set of the week: (2011/08/14)
Proteome profiling of wild type and lumican-deficient mouse corneas.
Overall rating: three stars - excellent data (worth studying)

This data set was made available as 48 LC/MS/MS runs from a series of MudPIT experiments. The data was published by Shao H, Chaerkady R, Chen S, Pinto SM, Sharma R, Delanghe B, Birk DE, Pandey A, and Chakravarti S in J Proteomics. 2011 May 17 (PubMed).

These experiments truly answered the question: "What proteins are present in mouse corneas?" The data contains excellent observations of many not-so-common collagens, keratins and a variety of other proteins associated with intermediate filaments, such as desmoplakin, periplakin, envoplakin and uroplakin. The original data analysis presented in the paper was very deeply flawed: it should not be considered reliable. The data itself, though, was an excellent example of the benefits of using an Orbitrap-LTQ hybrid instrument with a sensitive HCD collision cell.

Data set of the week: (2011/08/08)
Proteomic analysis of microvesicles derived from human colorectal cancer ascites.
Overall rating: two stars - very good data (general interest)

This data set was made available as 3 summary sets created from a combination of 1-D SDS-PAGE gel bands and LC/MS/MS runs. The data was published by Choi DS, Park JO, Jang SC, Yoon YJ, Jung JW, Choi DY, Kim JW, Kang JS, Park J, Hwang D, Lee KH, Park SH, Kim YK, Desiderio DM, Kim KP, and Gho YS in Proteomics 2011 11:2745-51 (PubMed).

The experiments performed here provide about as much information as can be obtained from a clinically obtained sample (in this case ascites from human colorectal cancer patients) using gel band analysis and an LTQ mass spectrometer. The identifications were good quality and they provide a good template for the proteins to be expected in the micro-vesicular fraction of this class of clinical isolates. The results were relatively free of artifacts and comparison of the three isolates provides an interesting example of the variability that can be expected from real samples related only by their method of isolation.

For anyone interested, these three result sets can be used to compare the utility of a purely web-based system (GPMDB) with a local client computer app (PRIDE's new PRIDE Inspector utility). To use PRIDE Inspector, click on the "PRIDE" link for any of the three data sets and then click on the red "PRIDE Inspector" link on the resulting page. You will need to have Java installed on your computer (this will not work on most smart phones or iPad tablets).

Data set of the week: (2011/07/31)
Global profiling of proteolysis during rupture of Plasmodium falciparum from the host erythrocyte.
Overall rating: one star - very good data (specialist interest)

This data set was made available as 760 gel band identifications, where each GPM model is the analysis of an individual gel band. The data was published by Bowyer PW, Simon GM, Cravatt BF, and Bogyo M. in Mol Cell Proteomics. 2011, 10:M110.001636 (PubMed).

This study generated a large number of gel bands from a critical point in the life cycle of the protozoan parasite Plasmodium falciparum, the causative agent of malaria, within its normal home for that part of its life cycle, the human erythrocyte. The results provide insights into the organism's metabolism as it exists as a schizont containing multiple merozoites (inside an erythrocyte) and the subsequent rupturing of the infected erythrocyte. The data provides an excellent example of the bioinformatics challenges associated with the analysis of multi-proteome samples, even when they are nicely isolated into gel bands and the proteomes have little sequence overlap.

Data set of the week: (2011/07/24)
in vivo versus in vitro protein abundance analysis of Shigella dysenteriae type 1 reveals changes in the expression of proteins involved in virulence, stress and energy metabolism.
Overall rating: one star - very good data (specialist interest)

This data set was made available as 19 MudPIT experiments, where each GPM model is a summary of all the individual LC/MS/MS runs. The data was published by Kuntumalla S, Zhang Q, Braisted JC, Fleischmann RD, Peterson SN, Donohue-Rolfe A, Tzipori S, and Pieper R in BMC Microbiol. 2011 11:147 (PubMed).

These experiments provided the most comprehensive collection of peptide identifications for the important pathogenic enterobacterial species Shigella dysenteriae, a close relative of the common Escherichia coli. Type 1 S. dysenteriae causes a severe form of dysentery referred to as shigellosis. The experiments reported here use whole cell lysates to try to understand protein abundances using label-free methods. The proteins found showed significant cleavage at non-tryptic sites (up to 10% of identified peptides), probably caused by endogenous proteases in the lysate itself rather than simple chymotryptic activity in the cleavage reagent used. The peptide identifications also revealed extensive deamidation of both Q and N residues.

Data set of the week: (2011/07/17)
Glycoprotein capture and quantitative phosphoproteomics indicate coordinated regulation of cell migration upon lysophosphatidic acid stimulation.
Overall rating: two stars - very good data (general interest)

This data set was made available as 70 LC/MS/MS runs, corresponding to various affinity purification and quantitation schemes. The data was published by Mäusbacher N, Schreiber TB, and Daub H. in Mol Cell Proteomics. 2010 9:2337-53 (PubMed).

These experiments demonstrate the value of using a multiple-step affinity purification strategy to investigate molecules of interest. Here the authors use a combination of lectins to capture glycoproteins and titanium oxide to capture highly acidic peptides. These peptides allowed them to investigate cell surface protein responses to lysophosphatidic acid treatment. The set of peptides captured was quite different from a typical metal-oxide pulldown experiment, as the intracellular proteins with large numbers of high occupancy phospho-domains that tend to dominate the results were mainly absent (such as the usual suspects SRRM2, P53BP1, TRIM28, MAP1A, NPM, et fratres eorum). These high abundance phosphoproteins do not have the necessary glycosylation to have been pulled down in the first step and therefore they were almost completely removed. This simple purification procedure allowed the reliable detection and quantitation of relatively low occupancy phospho-domains, such as those in WNK1, PTPRK and DTX3L.

Data set of the week: (2011/07/10)
A high-quality catalog of the Drosophila melanogaster proteome.
Overall rating: one star - very good data (specialist interest)

This data set was made available as 1,907 LC/MS/MS runs, through the PeptideAtlas data repository. The data was published by Brunner E, Ahrens CH, Mohanty S, Baetschmann H, Loevenich S, Potthast F, Deutsch EW, Panse C, de Lichtenberg U, Rinner O, Lee H, Pedrioli PG, Malmstrom J, Koehler K, Schrimpf S, Krijgsveld J, Kregenow F, Heck AJ, Hafen E, Schlapbach R, and Aebersold R. in Nat Biotechnol. 2007, 25:576-83 (PubMed).

The work was one of the best of the once popular attempts to create a full-body proteome atlas of an organism. In this case a model organism of historical interest, the fruit fly, was used and a large number of Thermo LTQ and LCQ Classic runs were recorded. While an achievement at the time (only 5 years ago), the relatively small number of identifications obtained per run and the very small amount of quantitative information available make this study seem a little dated. However, it still provides quite a bit of insight about the most abundant proteins present in D. melanogaster and a general overview of those proteins' relative concentration in a variety of organs and developmental stages, such as larvae, pupa membranes, adult heads, adult membranes, and adult brains.

Data set of the week: (2011/07/04)
A cost-benefit analysis of multidimensional fractionation of affinity purification-mass spectrometry samples.
Overall rating: one star - very good data (specialist interest)

This data set was made available as 105 LC/MS/MS runs, organized by the specific experimental techniques used. The data was published by Dunham WH, Larsen B, Tate S, Badillo BG, Goudreault M, Tehami Y, Kislinger T, and Gingras AC in Proteomics. 2011, 11:2603-12 (PubMed).

These experiments were performed to provide a systematic evaluation of the use of several common sample preparation/separation techniques for the analysis of the type of affinity purified samples commonly used to determine protein-protein interaction partners. In this type of experiment the total number of proteins identified has to be carefully balanced against the background level of proteins present due to non-specific protein interactions. The authors do a careful job of applying common methods, and studying the results provides a number of interesting case studies that can be used in both planning experiments and teaching practitioners (even experienced ones) about the intricacies of this important class of samples.

Data set of the week: (2011/06/27)
Accurate quantification of more than 4000 mouse tissue proteins reveals minimal proteome changes during aging.
Overall rating: two stars - very good data (general interest)

This data set was made available as 119 data files, organized by the tissue sampled. The data was published by Walther DM, and Mann M. in Mol Cell Proteomics. 2011 10:M110.004523 (PubMed).

This study is a large, multiple tissue examination of the effects of aging on the proteome of M. musculus. The results give a very good survey of the distributions of proteins that can be studied by whole mouse SILAC in a set of tissues: heart, kidney, cerebellum, frontal cortex, and hippocampus. The interesting finding of the study was that there was little quantitative change in the proteins found: aging seems to be a more subtle effect than can be accounted for by gross changes in a tissue's proteome composition.

Data set of the week: (2011/06/19)
Large scale phosphoproteome profiles comprehensive features of mouse embryonic stem cells.
Overall rating: three stars - excellent data (worth studying)

This data set was made available as 12 large experiments. The data was published by Li QR, Xing XB, Chen TT, Li RX, Dai J, Sheng QH, Xin SM, Zhu LL, Jin Y, Pei G, Kang JH, Li YX, and Zeng R. in Mol Cell Proteomics. 2011 10:M110.001750 (PubMed).

When the authors referred to their study as "Large scale", they were not kidding. The data made available rather thoroughly captures the proteins and peptides that can be observed using current technology from whole cell lysates of mouse embryonic stem cells. The identifications were very high quality and the chromatography was consistent. The only small flaw was the trypsin used: it cleaved bonds between K-P, R-P and H-X more frequently than one might hope in a study of this sort. It is not uncommon that trypsin will cleave these non-canonical sites, but the frequency of this type of cleavage in this study was unusually high.
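
One way to quantify this kind of observation is to classify the bond cut at each end of every identified peptide. The sketch below is a hypothetical helper, not the method used by GPMDB: it treats cleavage after K or R as canonical unless the next residue is proline, and lumps everything else (including cleavage after histidine) into a single non-tryptic bin. The protein and peptide sequences are invented for illustration.

```python
def classify_site(before, after):
    """Classify the bond cut between residue `before` and residue `after`."""
    if before in ("K", "R"):
        return "before-proline" if after == "P" else "canonical"
    return "non-tryptic"


def cleavage_report(protein, peptides):
    """Count cleavage types at both termini of each peptide found in protein."""
    counts = {"canonical": 0, "before-proline": 0, "non-tryptic": 0}
    for peptide in peptides:
        start = protein.find(peptide)
        if start < 0:
            continue  # peptide not present in this protein; ignore it
        end = start + len(peptide)
        # N-terminal cut, unless the peptide starts the protein.
        if start > 0:
            counts[classify_site(protein[start - 1], protein[start])] += 1
        # C-terminal cut, unless the peptide ends the protein.
        if end < len(protein):
            counts[classify_site(protein[end - 1], protein[end])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical protein and peptide identifications.
    protein = "MAKPLERSTHGVKDE"
    peptides = ["PLER", "STHGVK", "GVK"]
    print(cleavage_report(protein, peptides))
```

Dividing the non-canonical counts by the total number of termini gives a simple per-run frequency that can be compared between data sets.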

Data set of the week: (2011/06/13)
A comprehensive map of the human urinary proteome.
Overall rating: three stars - excellent data (worth studying)

This data set was made available as three (3) multidimensional chromatography experiments, resulting in 28 analysis sets, including 3 summary runs. The data was published by Marimuthu A, O'Meally RN, Chaerkady R, Subbannayya Y, Nanjappa V, Kumar P, Kelkar DS, Pinto SM, Sharma R, Renuse S, Goel R, Christopher R, Delanghe B, Cole RN, Harsha HC, and Pandey A. in J Proteome Res. 2011 10:2734-43 (PubMed).

If you have any interest in developing a diagnostic test that uses human urine, you should take a good close look at the data in this study. The investigators used the most up-to-date techniques (Orbitrap-Velos using HCD) and one important type of protein fractionation (lectin pull-down). The results give quite a clear picture of the major and minor proteins present in urine and provide a nice map to the peptides and modifications that can be expected from this important class of clinical samples.

Data set of the week: (2011/06/06)
Proteomics analysis of the cardiac myofilament subproteome reveals dynamic alterations in phosphatase subunit distribution.
Overall rating: two stars - very good data (general interest)

This data set was made available as 156 individual LC/MS/MS runs, each representing an SDS-PAGE gel band. The data was published by Yin X, Cuello F, Mayr U, Hao Z, Hornshaw M, Ehler E, Avkiran M, and Mayr M. in Mol Cell Proteomics, 2010, 9:497-509 (PubMed).

This study provides some interesting insights into the protein composition of rat cardiac myocytes, both in control and treated cases. The data clearly supports the conclusions in the paper and it also provides many of the best observations of the cardiac muscle proteins associated with these cells. There has been significantly less attention to rat proteomics than to mouse or human, so quality data sets such as this one significantly improve what is known about this important model species.

Data set of the week: (2011/05/30)
Novel In Situ Collection of Tumor Interstitial Fluid from a Head and Neck Squamous Carcinoma Reveals a Unique Proteome with Diagnostic Potential.
Overall rating: one star - very good data (specialist interest)

This data set was composed from multiple LC/MS/MS runs using multidimensional chromatography, combined into a single summary result. The data was published by Stone MD, Odland RM, McGowan T, Onsongo G, Tang C, Rhodus NL, Jagtap P, Bandhakavi S, and Griffin TJ. in Clin Proteomics 2010 6:75-82 (PubMed).

These results give an excellent insight into the proteins that can be expected in interstitial fluid, a clinically important fluid that has not been studied extensively by proteomics methods. The composition of the fluid was most similar to blood plasma and plasma-derived fluids, e.g. saliva, urine or cerebrospinal fluid. Anyone planning to do an experiment involving interstitial fluid should examine these results carefully.

Data set of the week: (2011/05/24)
Proteomic analysis reveals a virtually complete set of proteins for translation and energy generation in elementary bodies of the amoeba symbiont Protochlamydia amoebophila.
Overall rating: one star - very good data (specialist interest)

This data was collected from a combination of multidimensional chromatography and SDS-PAGE bands, resulting in 232 individual data sets. The data was published by Sixt BS, Heinz C, Pichler P, Heinz E, Montanaro J, Op den Camp HJ, Ammerer G, Mechtler K, Wagner M, and Horn M. in Proteomics, 2011, 11:1868-92 (PubMed).

The results presented in this paper constituted the first proteomics information available about an obligate symbiont of Acanthamoeba spp. These common amoebae are only rarely pathogenic; however, studying their symbiont's metabolism may provide insight into the molecular basis of the eukaryote/prokaryote endosymbiotic relationships that seem to be very common in nature. The recent availability of the symbiont's genome made the use of proteomics techniques possible. The combination of methods used in this study was a little unusual, but it resulted in a good survey of the proteins in the organism, adding 1447 P. amoebophila proteins to GPMDB.

Data set of the week: (2011/05/15)
Multi-omics approach to study the growth efficiency and amino acid metabolism in Lactococcus lactis at various specific growth rates.
Overall rating: four stars - exceptional achievement (leading the field)

This data was collected from multidimensional chromatography, resulting in 64 LC MS/MS runs and experiment summaries. The data was published by Lahtvee PJ, Adamberg K, Arike L, Nahku R, Aller K, and Vilu R. in Microbial Cell Factories, 2011, 10:12. (PubMed).

This study was an outstanding example of the careful and methodical application of proteomics methods to a problem in biotechnology. All of the aspects of the investigation (experimental design, sample preparation, chromatography and mass spectrometry) were well thought out and executed with a consistent attention to detail and quality. The experiments reported in the paper go well beyond simply performing proteomics experiments by the use of other 'omics approaches, significantly increasing the value of the proteomics results. The information generated by this study has greatly expanded general knowledge with regard to the proteome of Lactococcus lactis, one of the most important bacteria in the food processing industry. It also provides a good basis for understanding aspects of this organism's metabolism.

Data set of the week: (2011/05/08)
Large-scale label-free quantitative proteomics of the pea aphid-Buchnera symbiosis.
Overall rating: two stars - very good data (general interest)

This data was collected from excised SDS-PAGE gel bands, resulting in 148 LC MS/MS runs. The data was published by Poliakov A, Russell CW, Ponnala L, Hoops HJ, Sun Q, Douglas AE, and van Wijk KJ in Mol Cell Proteomics, 2011 Mar 18 (PubMed).

These experiments explore the proteomics of the relationship between the pea aphid, Acyrthosiphon pisum, and its endosymbiont bacterium Buchnera aphidicola. Buchnera bacteria are obligate endosymbionts in aphids, having lost the metabolic pathways necessary to be free-living organisms. The recent availability of the genomes of both the aphid and the bacterium makes it possible to do a thorough job of examining the proteins present from both genomes in the intact organism. The results clearly demonstrate that any investigation of insect proteomics should be very mindful of selecting an appropriate mixture of proteomes when analyzing raw data. This data set should also be revisited when the genomes of other secondary endosymbionts of the pea aphid become known, such as Hamiltonella defensa, Regiella insecticola, and Serratia symbiotica.

Data set of the week: (2011/05/01)
Large-scale Arabidopsis phosphoproteome profiling reveals novel chloroplast kinase substrates and phosphorylation networks.
Overall rating: three stars - excellent data (worth studying)

This data was collected and deposited as 13 LC MS/MS runs, using a metal oxide column strategy to enrich phosphopeptides. The data was published by Reiland S, Messerli G, Baerenfaller K, Gerrits B, Endler A, Grossmann J, Gruissem W, and Baginsky S. in Plant Physiol. 2009 150:889-903 (PubMed).

This study was a very successful application of the prefractionation techniques that have been developed to enrich phosphopeptides. The detailed examination of plant phosphoproteomics has been well behind fungal (yeast) and animal (human/mouse) studies, but this series of experiments shows conclusively that the same methods can be used to great effect. The data was of sufficient quality to allow the identification of more than 2,000 phosphopeptides per run. The identifications show the enrichment of acidic residues characteristic of metal oxide enrichment schemes.

Data set of the week: (2011/04/24)
Proteome of the Caenorhabditis elegans Oocyte.
Overall rating: two stars - very good data (general interest)

This data was composed of 125 LC MS/MS runs, generated from SDS-PAGE bands. The data was published by Chik JK, Schriemer DC, Childs SJ, and McGhee JD in J Proteome Res. 2011 Apr 15 (PubMed).

The results of this study demonstrated the importance of examining specific tissues in an organism, even one with as few differentiated organ systems as C. elegans. Even though C. elegans is well represented in GPMDB (> 1,000,000 protein ids), this study contains many top ranking identifications for specific proteins, almost certainly because of the relatively high concentration of those proteins in the oocyte. The data itself was taken in a very consistent manner, with each gel band having good correlation between the detected gene product molecular masses. With 6,691 total protein ids, this rather modest study provides a very comprehensive view of the C. elegans oocyte proteome.

Data set of the week: (2011/04/17)
Identification of outer membrane proteins from an Antarctic bacterium Pseudomonas syringae Lz4W.
Overall rating: two stars - very good data (general interest)

The data from this study comprised 14 LC MS/MS runs, generated from SDS-PAGE bands. The data was published by Jagannadham MV, Abou-Eladab EF, and Kulkarni HM in Mol Cell Proteomics. 2011 Mar 29 (PubMed).

This study demonstrates how to gain significant insights into prokaryotic cell organization using proteomics techniques, once you have a good genome sequence for a closely related species (or two). The species under study here was a plant pathogen, Pseudomonas syringae, that has the singular ability to elevate the freezing point of water. This paper focuses on a cryophilic strain of the bacterium in an attempt to understand how it can function effectively in a rather extreme environment. The authors do a good job of using a proteomics strategy to acquire useful information about the organism's biology.

Data set of the week: (2011/04/10)
Improved Peptide Identification by Targeted Fragmentation Using CID, HCD and ETD on an LTQ-Orbitrap Velos.
Overall rating: three stars - excellent data (worth studying)

The experiments in this study generated 73 LC MS/MS runs, using single- and multi-dimensional chromatographic peptide separations. The data was published by Frese CK, Altelaar AF, Hennrich ML, Nolting D, Zeller M, Griep-Raming J, Heck AJ, and Mohammed S in J Proteome Res. 2011 Apr 1 (PubMed).

These results were produced by a well thought-out study to determine the validity of various claims that have been made about the efficacy of the three most popular fragmentation modalities for MS/MS-based proteomics: CID, ETD and HCD. Each of these mechanisms was given a good workout and a fair, side-by-side comparison was made without apparent bias. If you are interested in selecting among these methods for an upcoming experiment, it would be well worth your while to look at this comparative study to assist you in making up your own mind.

Data set of the week: (2011/04/03)
A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry.
Overall rating: four stars - exceptional achievement (leading the field)

The experiments in this study generated 335 LC MS/MS runs, most representing individual SDS-PAGE gel bands. The data was made available in Tranche by Raghothama Chaerkady, Dhanashree S. Kelkar, Babylakshmi Muthusamy, Kumaran Kandasamy, Sutopa B. Dwivedi, Nandini Patankar, Min-Sik Kim, Santosh Renuse, Sneha Pinto, Rakesh Sharma, Harsh Pawar, Ajeet Kumar Mohanty, Yi Yang, A.P. Dash, Robert M. MacCallum, Bernard Delanghe, Ashwani Kumar, Godfree Mlambo, Mobolaji Okulate, Nirbhay Kumar, and Akhilesh Pandey.

These experiments were a tour de force of how to study whole organism proteomics in insects. The organism was dissected and important organ systems were studied in detail. Even though the A. gambiae genome has been available since 2002, this study was the first thorough examination of the distribution of proteins in this important mosquito (it is the insect vector of malaria). Technically, it uses cutting edge mass spectrometry-based identification methods. The measured fragment ion mass accuracy was < 5 ppm for most of the individual runs, allowing for high confidence peptide identifications (≤ 0.05% FPR).
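For readers unfamiliar with the ppm figure quoted above, mass accuracy is normally expressed as the relative difference between an observed and a theoretical m/z, scaled by one million. The short Python sketch below illustrates the arithmetic. The function names and the default 5 ppm tolerance are assumptions made for the example, and the m/z values are invented rather than taken from this data set.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True if the absolute mass error is no larger than tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Invented values: a fragment ion expected at m/z 1000.000.
print(round(ppm_error(1000.003, 1000.0), 2))   # about 3 ppm
print(within_tolerance(1000.003, 1000.0))      # inside a 5 ppm window
print(within_tolerance(1000.010, 1000.0))      # about 10 ppm, outside
```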

Data set of the week: (2011/03/27)
Plasma from normal vs. tumour bearing Her2/Neu mice.

The experiments in this study generated 306 LC MS/MS runs. This data set was made available as 32 separate TRANCHE entries, credited to Regine M Schoenherr from Mandy Paulovich's laboratory at the Fred Hutchinson Cancer Research Center in Seattle. The data was published by Schoenherr, R. M., Kelly-Spratt, K. S., Lin, C., Whiteaker, J. R., et al. in Proteomics Clin. Appl. 2011, 5, 179-188 (Abstract).

Each of the individual experiments was derived from a set of 10 control and 10 tumour-bearing Her2/Neu mice. These mice have been a popular model system for cancer research because of their tendency to generate metastatic breast tumours. The results give a good profile of the proteins detectable in M. musculus plasma under normal control conditions using standard methods and the Thermo-Finnigan LTQ as the main detection platform. Each TRANCHE entry was entitled using a mnemonic, for example:
"MARS_Sample_Pool_2_Normal_mzXML".
This abbreviated form describes the protein handling (MARS depletion), the specific replicate (Sample_Pool_2) and the animal pool (Normal). The use of this type of mnemonic has become a widespread (but regrettable) practice in the proteomics community for describing information deposited in repositories.
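One reason the practice is regrettable is that the metadata has to be recovered by pattern matching. As a rough illustration, the Python sketch below splits the single example title given above into its parts. The regular expression and field names are hypothetical, the reading of the trailing "mzXML" as a file format is an assumption, and the other 31 entries may not follow the same layout.

```python
import re

# Hypothetical pattern, inferred from the one example title quoted above.
TITLE_PATTERN = re.compile(
    r"^(?P<handling>[^_]+)"            # protein handling, e.g. MARS depletion
    r"_(?P<replicate>Sample_Pool_\d+)"  # specific replicate
    r"_(?P<pool>[^_]+)"                # animal pool, e.g. Normal
    r"_(?P<format>[^_]+)$"             # assumed to be the file format
)


def parse_entry_title(title):
    """Return a dict of fields, or None if the title does not fit the pattern."""
    match = TITLE_PATTERN.match(title)
    return match.groupdict() if match else None


print(parse_entry_title("MARS_Sample_Pool_2_Normal_mzXML"))
print(parse_entry_title("some other title"))
```

Any title that deviates from the assumed layout simply returns None, which is exactly the fragility that explicit, structured annotation would avoid.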

Data set of the week: (2011/03/20)
An integrated workflow for charting the human interaction proteome: insights into the PP2A system.

The experiments in this study generated 62 LC MS/MS runs. Each run was the result of an affinity purification experiment, with either baits or controls. The data was published by Glatter T, Wepf A, Aebersold R, and Gstaiger M. Mol Syst Biol. 2009;5:237. (PubMed).

These results clearly demonstrated the merits of using highly specific affinity purification experiments when trying to thoroughly study the proteins associated with a specific pathway or particle. The data was of good quality, although the ion source did not perform uniformly in the low-organic phase portion of the liquid chromatography runs. For example, contrast the retention time vs pI plot for GPM33000032760 (good) with GPM33000032731 (not as good). This commonly seen experimental artifact probably had little effect on the biological conclusions drawn from the results. However, if the same data was used to draw inferences about which peptides were appropriate candidates for quantitation methods, this ion source inconsistency would lead to a bias against early eluting peptides.

Data set of the week: (2011/03/13)
Primary tumor xenografts of human lung adeno and squamous cell carcinoma express distinct proteomic signatures.

The experiments in this study resulted in 30 MudPIT experiments. Each experiment was composed of four MudPIT fractions, along with a summary of the set of fractions, a total of five GPM files per sample. The data was published by Wei Y, Tong J, Taylor P, Strumpf D, Ignatchenko V, Pham NA, Yanagawa N, Liu G, Jurisica I, Shepherd FA, Tsao MS, Kislinger T, and Moran MF in J Proteome Res. 2011 10:161-74. (PubMed).

The results give the proteins present in each of 10 human tumours grafted into SCID mice, with three replicates per tumour. The analysis required the simultaneous use of both the mouse and human proteomes, resulting in protein lists composed of a mixture of the two types of proteins. The human proteins show the proteins that would normally be expected in human tumour tissue, as well as a normal complement of mouse blood proteins. In addition to the blood proteins, there was also clear evidence for a set of murine extracellular matrix proteins. The presence of these proteins strongly suggests that the host was able to begin infiltrating the tumour with ECM, even without a normal immune response to the xenograft material.

Data set of the week: (2011/03/06)
The ubiquitin-proteasome system is a key component of the SUMO-2/3 cycle.

The experiments in this study resulted in 5 LC/MS/MS runs. The data was published by Schimmel J, Larsen KM, Matic I, van Hagen M, Cox J, Mann M, Andersen JS, and Vertegaal AC in Mol Cell Proteomics. 2008 7:2107-22 (PubMed).

The data in this study resulted from a series of pull-down experiments with SILAC quantitation using HeLa cells. The results contained an unusually large number of identifications for rare proteins, as well as an over-representation of identifications that rated in the top percentile of all id's for particular proteins. Analysis of the protein sequence motifs present showed that the RNA recognition motif, RNP-1, had been highly enriched by this particular pull-down strategy. The underlying peptide id's were top quality with a very low number of false positives in the reported sequence assignments.

Data set of the week: (2011/02/27)
Initial characterization of the human central proteome.

The experiments in this study resulted in 99 LC/MS/MS runs. The data was published by Burkard TR, Planyavsky M, Kaupe I, Breitwieser FP, Bürckstümmer T, Bennett KL, Superti-Furga G, and Colinge J. in BMC Syst Biol. 2011 5:17 (PubMed).

The purpose of this research was to compare the proteomes of six human cell lines and determine which candidate proteins were present in all six. This set of proteins they postulated to be a "central" proteome: those proteins required by all human cells. While this concept will be debated for some time, this study provides excellent insight into the proteins present in these 6 cell lines under controlled conditions. The data divided up by cell lines are as follows:

Data set of the week: (2011/02/20)
Nitrite-driven anaerobic methane oxidation by oxygenic bacteria.

This study contains two experiments. The data was imported from Peptidome and was published by Ettwig KF, Butler MK, Le Paslier D, Pelletier E, Mangenot S, Kuypers MM, Schreiber F, Dutilh BE, Zedelius J, de Beer D, Gloerich J, Wessels HJ, van Alen T, Luesken F, Wu ML, van de Pas-Schoonen KT, Op den Camp HJ, Janssen-Megens EM, Francoijs KJ, Stunnenberg H, Weissenbach J, Jetten MS, and Strous M in Nature. 2010 464:543-8 (PubMed).

The data was generated using lysed cells from an unusual anaerobic bacterium, referred to by NCBI as "NC10 bacterium 'Dutch sediment'". The sample itself was obtained from mud dug out of a ditch in Holland. Compared to the well-controlled studies done with lab strains of bacteria or cell lines, the researchers in this case dealt with generating identifiable proteins from real field samples. The genome of the dominant species (Candidatus Methylomirabilis oxyfera) was available and the data could be interpreted in light of an unusual feature of the organism's methane oxidation metabolism.

Data set of the week: (2011/02/13)
Identification of cell wall and cytoplasmic proteins of Aspergillus fumigatus.

This study contains one summary of LC/MS/MS runs. The data sets were obtained from a whole organism extract using a Thermo-Finnigan LTQ mass spectrometer. The results have not been published, but were made available through Peptidome, sample PSM1346.

Aspergillus fumigatus is a commonly occurring environmental saprophytic fungus. It can become clinically important in individuals with suppressed immune systems. The MS/MS data was typical of LTQ-based analysis, but the results obtained from the data were a bit of a puzzle. The original analysis (in Peptidome) only reported identifications for 2,223 spectra, whereas a fairly straightforward analysis in our hands yielded approximately 20,000 identifications. While the parameters used in the Peptidome analysis were not optimized (particularly the parent ion mass tolerance and the list of variable modifications), repeated examination and re-analysis in our hands was unable to resolve this significant difference. The data annotation stored in GPMDB was performed twice: once with the CADRE protein sequences alone and again with CADRE + RefSeq sequences for the same fungus strain. Because the original MASCOT analysis was made available on Peptidome's FTP site, it was possible to annotate each spectrum in the GPMDB analysis with those results for comparison (these appear as comments on each of the spectrum display pages).

Data set of the week: (2011/02/06)
Leishmania Proteomics.

This study contains 118 LC/MS/MS runs. The data sets were a combination of gel band and multidimensional chromatography separations. The mass spectrometry appears to have been performed using HCD fragmentation with an Orbitrap-LTQ hybrid instrument. This data has not yet been published.

The results obtained from this data serve as a primer on what can be obtained from the proteomics analysis of Leishmania major, a trypanosomatid protozoan that causes leishmaniasis. The data was generated from the two dominant life stages of the organism: the amastigote stage that is adopted in the mammalian host; and the promastigote stage, adopted in the insect vector. The combination of protein-level and peptide-level separation as well as the very high accuracy fragment ion mass measurements make for a very broad coverage of proteins and peptides. Anyone interested in the proteomics of L. major should study these results thoroughly before planning their own experiments.

Data set of the week: (2011/01/30)
The steady-state repertoire of human SCF Ubiquitin ligase complexes does not require ongoing Nedd8 conjugation.

Principal investigator, Dr. Ray Deshaies

This study contains 41 LC/MS/MS runs. This data was published in Lee JE, Sweredoski MJ, Graham RL, Kolawa NJ, Smith GT, Hess S, and Deshaies RJ., Mol Cell Proteomics. 2010 Dec 17 (PubMed).

These interesting experiments were performed to explore the details of the current model of how intracellular protein degradation is organized and regulated. The experiments used SILAC and non-SILAC quantitation methods and experimental techniques that did a good job of pulling out the relevant cellular machinery. The results contained the most detailed observations yet of some of the important proteins in the ubiquitin-mediated protein degradation pathway, such as CAND1, CUL1, and the COPS subunits.

Data set of the week: (2011/01/16)
Proteome analysis of S. mediterranea

This study contains 19 LC/MS/MS runs. This data has not been published, but was made available by Mastrobuoni, G, et al., through Tranche, along with a few experimental details.

This data was very high quality, using isoelectric focussing to separate peptides in a similar manner to the use of SCX in MudPIT. The organism studied was Schmidtea mediterranea, which is a free-living planarian (flatworm) with an exceptional ability to regenerate when injured. While there is a genome project underway for this organism, the proteome sequence has not been made available. As an alternative, RNA sequence information was used, based on the current version of Unigene. The results show how well data can be analyzed with assembled transcriptional sequences only, which may remain the best alternative for many species of zoological or botanical interest for some years to come.

Data set of the week: (2011/01/09)
Pseudomonas aeruginosa - Orbitrap - strain PA14, cell lysate.

Principal investigator, Dr. Edward Marcotte

This study contains 6 LC/MS/MS runs, generated from HPLC experiments. This data has not been published, but was made available by Taejoon Kwon, et al on the Marcotte Lab web site's data section, under the heading Data_12 (see the experimental description link for details).

This study provides a good view of the proteome of an important pathogen, Pseudomonas aeruginosa. P. aeruginosa is a common free-living bacterium that can rapidly colonize human tissue if it has been damaged or if there is a defect in the immune system. The results represent two biological replicates of cultured cells and provide a good starting point for any study of proteins produced by this organism.

Data set of the week: (2011/01/02)
The leukocyte nuclear envelope proteome varies with cell activation and contains novel transmembrane proteins that affect genome architecture.

Principal investigator, Dr. Eric Schirmer

This study contains 8 summary results, generated from multidimensional chromatography experiments. The manuscript describing this work was published by Korfali N, Wilkie GS, Swanson SK, Srsen V, Batrakou DG, Fairley EA, Malik P, Zuleger N, Goncharevich A, de Las Heras J, Kelly DA, Kerr AR, Florens L, and Schirmer EC, Mol Cell Proteomics 2010 Dec;9:2571-85 (PubMed).

The results of this study provide a good survey contrasting the proteins present in R. norvegicus and H. sapiens microsomes. The GO displays for the individual experiments demonstrate the quality of the preparation methods used, showing very significant enrichment of endoplasmic reticulum, Golgi apparatus, integral membrane, mitochondrion and other membrane-associated subcellular structures.