The Global Proteome Machine Organization
   GPM Blog
Data set of the week: (2014/4/14)
Comparison of two phenotypically distinct lattice corneal dystrophies caused by mutations in the transforming growth factor beta induced (TGFBI) gene.
Overall rating: very good data (specialist interest)
This data set consisted of 10 results, obtained from LC/MS/MS analyses of tissue samples. The data files were made available through ProteomeXchange, PXD000307. It has been published by Poulsen ET, Runager K, Risør MW, Dyrlund TF, Scavenius C, Karring H, Praetorius J, Vorum H, Otzen DE, Klintworth GK and Enghild JJ, Proteomics Clin Appl. 2013 Dec 2 (PubMed).
This data provides some of the best insight to date on the major components of human corneal tissue. The tissue sampling and experimental workflow produced very good reproducibility in the lists of detected peptides and proteins. Any group interested in the major proteins present in the cornea or their post-translational modifications should study this data in depth prior to performing their own experiments.
April 2014 Editions of the Mouse and Human Proteome Guides (2014/4/2)
The latest editions of the Guide to the Human Proteome and the Guide to the Mouse Proteome have been released and are available for download and use. Both are available in HTML, CSV (comma-separated values) and XLS (Excel spreadsheet) formats.
The following chart shows the status of Homo sapiens protein-coding splice variants in the current Guide to the Human Proteome:
The histogram bars are stacked plots of the fraction of protein-coding splice variants observed on each chromosome and the colors represent the splice variant sequences classified by evidence code. Black (EC 1) indicates the fraction of splice variants for which there have been no peptides observed for a splice variant sequence with an E-value ≤ 0.01. Red (EC 2) is the fraction of variants with at least one peptide observed with an E-value ≤ 0.01. Yellow (EC 3) is the fraction with at least one peptide that has been observed multiple times and those observations pass one of two tests for deterministic behavior. Green (EC 4) is the fraction where at least one peptide has been observed multiple times and passes both tests for deterministic behavior.
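The legend above can be read as a small decision rule. The following is a minimal sketch of that rule, not the GPMDB implementation: the `PeptideEvidence` record and its field names are hypothetical, and it assumes that the peptide qualifying a variant for EC 3 or EC 4 must itself meet the E-value ≤ 0.01 threshold, which the text does not state explicitly.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.01  # threshold quoted in the chart legend


@dataclass
class PeptideEvidence:
    """Hypothetical summary of one peptide assigned to a splice variant."""
    best_e_value: float               # lowest E-value recorded for the peptide
    observations: int                 # number of times the peptide was observed
    deterministic_tests_passed: int   # 0, 1 or 2 of the two tests


def evidence_code(peptides):
    """Return the evidence code (1-4) for one splice variant."""
    # Assumption: only peptides at or below the cutoff count toward any code.
    confident = [p for p in peptides if p.best_e_value <= E_VALUE_CUTOFF]
    if not confident:
        return 1  # black: no peptide observed at the cutoff
    code = 2      # red: at least one confident peptide
    for p in confident:
        if p.observations > 1:
            if p.deterministic_tests_passed == 2:
                return 4  # green: multiple observations, both tests passed
            if p.deterministic_tests_passed == 1:
                code = 3  # yellow: multiple observations, one test passed
    return code


def stacked_fractions(variant_codes):
    """Fraction of splice variants at each evidence code for one chromosome."""
    total = len(variant_codes)
    return {ec: (variant_codes.count(ec) / total if total else 0.0)
            for ec in (1, 2, 3, 4)}
```

Applying `evidence_code` to every splice variant on a chromosome and passing the resulting list to `stacked_fractions` gives the four values that make up one stacked bar.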
The same plot from the Guide to the Mouse Proteome shows that while the overall number of splice variant assignments for mouse is lower, the same general trends are present:
Data set of the week: (2014/4/2)
In vivo SILAC-based proteomics reveals phosphoproteome changes during mouse skin carcinogenesis.
Overall rating: very good data (specialist interest)
This data set consisted of 315 results, obtained from SDS-PAGE gel bands, metal-oxide affinity fractionation and multi-dimensional LC/MS/MS analyses using SILAC quantitation. The data files were made available through ProteomeXchange, PXD000821. It has been published by Zanivan S, Meves A, Behrendt K, Schoof EM, Neilson LJ, Cox J, Tang HR, Kalna G, van Ree JH, van Deursen JM, Trempus CS, Machesky LM, Linding R, Wickström SA, Fässler R, and Mann M, Cell Rep. 2013 Feb 21;3(2):552-66 (PubMed).
The data associated with this study provides some of the best evidence about the proteins present in Mus musculus skin tissue. Skin is an under-studied tissue in proteomics, even though it is abundant, relatively easy to sample and clinically important. The data from this study showed good reproducibility and attention to detail in both the sample preparation and chromatography. The analysis in the manuscript was significantly flawed because of a failure to consider the modifications present in collagen (the most abundant protein in skin), but that does not take away from the value of the data itself as a good example of what can be observed from skin tissue.
Data set of the week: (2014/3/23)
Coordinated activation of PTA-ACS and TCA cycles strongly reduces overflow metabolism of acetate in Escherichia coli.
Overall rating: excellent data (leading the field)
This data set consisted of 10 results, obtained from LC/MS/MS analysis. The data files were made available through ProteomeXchange, PXD000556. It has been published by Peebo K, Valgepea K, Nahku R, Riis G, Oun M, Adamberg K and Vilu R, Appl Microbiol Biotechnol. 2014 Mar 15 (PubMed).
The proteomics group at the Competence Center of Food and Fermentation Technologies at the Tallinn University of Technology has been one of the top performers in terms of data quality for several years and they do not disappoint with this data set. This group has developed into one of the few labs that can genuinely produce results demonstrating high run-to-run reproducibility in the analysis of complex samples. This set of MS/MS analyses would be an excellent choice for any group interested in the practical limits associated with replicate analysis in proteomics.
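One simple way to put a number on run-to-run reproducibility is the overlap between the sets of peptides identified in each pair of replicate runs. The sketch below uses the Jaccard index for this purpose. It is an illustrative choice rather than the measure used to rate these data sets, and the run names and peptide sequences in the usage example are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two peptide collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty runs are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(runs):
    """Map each pair of run names to the Jaccard index of their peptide sets."""
    return {(i, j): jaccard(runs[i], runs[j])
            for i, j in combinations(sorted(runs), 2)}


if __name__ == "__main__":
    # Invented peptide lists, for illustration only.
    runs = {
        "rep1": {"AAAK", "LLVR", "GGEK"},
        "rep2": {"LLVR", "GGEK", "TTPR"},
    }
    for pair, score in pairwise_overlap(runs).items():
        print(pair, round(score, 2))
```

Values close to 1.0 across all pairs indicate that the replicates identify nearly the same peptides. Lower values point to run-to-run variability in sampling or chromatography.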
Data set of the week: (2014/3/13)
Functional analysis of novel Rab GTPases identified in the proteome of purified Legionella-containing vacuoles from macrophages.
Overall rating: excellent data (leading the field)
This data set consisted of 120 results, obtained from LC/MS/MS analysis of excised SDS-PAGE gel bands. The data files were made available through ProteomeXchange, PXD000647. It has been published by Hoffmann C, Finsel I, Otto A, Pfaffinger G, Rothmeier E, Hecker M, Becher D and Hilbi H, Cell Microbiol. 2013 Dec 26 (PubMed).
This well-planned and well-executed study examines the host effects associated with the opportunistic pathogen Legionella pneumophila, which causes the life-threatening pneumonia commonly referred to as Legionnaires' disease. These experiments focus on understanding the "Legionella-containing vacuole", a structure formed by the organism in the host cell that is used to facilitate replication. By isolating these vacuoles in two very different eukaryotic systems (Mus musculus and the amoeboid form of Dictyostelium discoideum), the study was able to demonstrate which systems the pathogen uses to form and maintain this structure. The proteomics data is of excellent quality and would be ideal to use as a case study in multi-species data analysis and biological interpretation.
Data set of the week: (2014/3/5)
Thirty-thousand-year-old distant relative of giant icosahedral DNA viruses with a pandoravirus morphology.
Overall rating: very good data (specialist interest)
This data set consisted of 1 result, obtained from a single LC/MS/MS analysis. The data files were made available through ProteomeXchange, PXD000460. It has been published by Legendre M, Bartoli J, Shmakova L, Jeudy S, Labadie K, Adrait A, Lescot M, Poirot O, Bertaux L, Bruley C, Couté Y, Rivkina E, Abergel C and Claverie JM, Proc Natl Acad Sci U S A. 2014 Mar 3 (PubMed).
This study involves the characterization of the giant virus Pithovirus sibericum, a 1.5-micron-long, amphora-shaped virion. The virus was isolated from a 30,000-year-old sediment and grown in Acanthamoeba castellanii. The virus has a 600 kilobase genome, with open reading frames for approximately 2,500 proteins. The proteomics study made available was obtained from isolated virion particles, with 70 identified proteins from the host A. castellanii and 193 proteins from P. sibericum. While the peptides showed a considerable amount of non-tryptic cleavage, the mass spectrometry and chromatography were both very well done.
Changing the naming convention for amino acid polymorphisms (2014/2/4)
GPM and GPMDB have been acquiring information about amino acid polymorphisms since the system began operating in 2004. The process accelerated significantly with the introduction of dbSNP annotation information in 2006. Nucleotide polymorphism research has advanced tremendously during this period, to the point that the original term "polymorphism" no longer accurately describes the phenomena being studied. The term SNP has been largely replaced by SNV (Single Nucleotide Variant) to reflect the changes in the field. To keep up with these changes, GPM and GPMDB will be altering all references to SNPs and SNAPs (Single Nucleotide-induced Amino acid Polymorphisms) to SNVs and SAVs (Single Amino acid Variants). The name of the server used to provide the interface for GPMDB's collected SAV information will remain stable at, but the alias will be added for forward compatibility.
System updates for GPMDB's 10th anniversary (2014/2/4)
GPMDB had its tenth anniversary of operation on Jan. 1, 2014: the public interface was first made available on Jan. 1, 2004. The overall success of the project has made it necessary to invest in updating the hardware and software resources that run GPMDB on a daily basis. Today marks the end of this upgrade cycle, with the successful addition of a new, faster server for processing incoming data files into database entries. The following items have been added or upgraded during the process:
  1. a new server has been added, dedicated to processing REST information requests (;
  2. 30 TB of disk storage has been added to the system, allowing for significantly greater data volume and backup capabilities;
  3. new solid-state drives have been added to the publicly available system, to increase capacity, speed up queries and reduce cost;
  4. the data file processing server has been replaced, with a tested capacity of > 6 billion new identifications per year;
  5. 30 GB of memory has been added to the pool available for user queries; and
  6. all software platforms (e.g., PERL, MySQL) have been updated to the latest stable versions available.
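To put the tested capacity in item 4 into more familiar units, the short calculation below converts 6 billion identifications per year into sustained daily and per-second rates. It assumes a 365-day year and perfectly even load, neither of which is stated in the text, so the result (roughly 190 identifications per second) is only an order-of-magnitude guide.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: non-leap year, continuous operation

# Lower bound on tested capacity quoted in the list above.
IDS_PER_YEAR = 6_000_000_000


def sustained_rates(ids_per_year):
    """Return (identifications per day, identifications per second)."""
    per_day = ids_per_year / DAYS_PER_YEAR
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


if __name__ == "__main__":
    per_day, per_second = sustained_rates(IDS_PER_YEAR)
    print(f"{per_day:,.0f} identifications per day")
    print(f"{per_second:.0f} identifications per second")
```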
Copyright © 2013, The Global Proteome Machine Organization.