The Global Proteome Machine Organization
   News Archive

Data set of the week: (2009/12/28)
Community proteogenomics reveals insights into the physiology of phyllosphere bacteria

This dataset was transferred to GPMDB via ProteomExchange from PRIDE (see data). It is credited to Delmotte N, et al. and is described in Proc Natl Acad Sci U S A. 2009 Sep 22;106(38):16428-33.

Data-set-of-the-week is a new GPMDB feature intended to highlight high-quality data sets that have been made available via GPMDB and ProteomExchange. Data sets will be selected by a panel, but any suggestions of suitable data (email to dsotw@thegpm.org) will be considered.

The 1,000 most observed human proteins (2009/11/06)

This spreadsheet (human_top_1000.xls) is a list of protein sequences that have been observed most often by GPM users who used the "human" GPM search server. The columns in the spreadsheet are as follows:

  1. Column A: ENSEMBL protein accession number for the sequences;
  2. Column B: HUGO Gene Naming Committee symbol for the associated gene;
  3. Column C: NCBI gene number for the associated gene;
  4. Column D: International Protein Index accession number for the sequence;
  5. Column E: SwissProt/Uniprot accession for the sequence;
  6. Column F: the probability that a protein will be found in a dataset (as a percentage);
  7. Column G: the base-10 log of the minimum expectation value found for that protein; and
  8. Column H: a text description of the protein.

The value in Column F was calculated by taking the number of times (n_i) that the protein was observed in the approximately 24,000 (N) datasets examined and doing the simple calculation:

p_i = 100 × (n_i / N)

A "dataset" corresponds to a submitted set of MS/MS spectra, which results in a GPM result file, so it is roughly equivalent to the set of data from an LC/MS/MS run. A protein can only be observed once in a dataset.
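
As a minimal sketch of the Column F calculation, the Python below assumes each dataset is available as a collection of protein accessions. The function name, the input representation and the example accessions are hypothetical and are not part of GPMDB.

```python
from collections import Counter

def observation_percentages(datasets):
    """Return 100 * (n_i / N) for every protein seen in `datasets`.

    `datasets` is a list of iterables of protein accessions (a hypothetical
    representation). Each dataset contributes at most one observation per
    protein, matching the rule that a protein is observed once per dataset.
    """
    n_total = len(datasets)
    counts = Counter()
    for proteins in datasets:
        counts.update(set(proteins))  # set(): count a protein once per dataset
    return {acc: 100.0 * n / n_total for acc, n in counts.items()}

# Illustrative accessions only, not real GPMDB data.
example = [
    ["PROT_A", "PROT_B", "PROT_A"],
    ["PROT_A"],
    ["PROT_B", "PROT_C"],
    ["PROT_A"],
]
percentages = observation_percentages(example)
print(percentages["PROT_A"])  # 75.0
```

Here PROT_A appears in three of the four example datasets, so its value is 75%, even though it is listed twice in the first dataset.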

110,000,000th Peptide Id Recorded (2009/11/02)

Over the weekend, GPMDB passed the 110 million mark for peptide identifications. We would like to thank all of the data contributors who have made this project a success. Special thanks go to our ProteomExchange partners TRANCHE and PRIDE for making their data available.

Service interruptions for maintenance (2009/08/14)

We are performing some long-delayed maintenance on the computers in the GPM system. There may be some service interruptions throughout the system starting Friday, August 14, lasting until Monday, August 17.

New data views: Protein-Protein Interactions and Groups (2009/07/20)

Two new views of GPM data sets have been added. The "ppi" view (available for ENSEMBL human and yeast accession numbers) is similar to the existing pathways display. The new display categorizes all of the proteins found in a dataset with the proteins in corresponding protein-protein interaction sets listed in BioGrid (human and yeast) and HPRD (human only).

The "group" view is simply a list of all proteins found as groups, with the primary member of a group being the protein displayed in the main model view, along with all of the proteins that could be obtained by using the individual homology lists. Any protein that has at least one spectrum assigned to a peptide sequence that is unique for that protein is listed as a primary protein.
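
The rule for primary proteins can be sketched as follows. This is an illustration only, assuming that peptide-to-protein assignments are available as a mapping; the data structure, function name and accessions are hypothetical and do not describe the actual GPM implementation.

```python
def primary_proteins(peptide_to_proteins):
    """Return proteins that have at least one peptide unique to them.

    `peptide_to_proteins` maps an identified peptide sequence to the set of
    protein accessions that contain it (hypothetical input format).
    """
    primary = set()
    for proteins in peptide_to_proteins.values():
        if len(proteins) == 1:  # peptide matches exactly one protein
            primary.update(proteins)
    return primary

# Made-up peptides and accessions, for illustration only.
assignments = {
    "PEPTIDEA": {"P1"},
    "PEPTIDEB": {"P1", "P2"},
    "PEPTIDEC": {"P2", "P3"},
    "PEPTIDED": {"P3"},
}
result = primary_proteins(assignments)
print(sorted(result))  # ['P1', 'P3']
```

In this example P2 is only supported by peptides shared with other proteins, so it would appear as a group member rather than as a primary protein.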

Additionally, a link has been added that generates an MGF formatted annotated peptide spectrum library from the results of a single data set. This MGF file is formatted so that it can be used as a library for standalone instances of X! Hunter.

Please pick up your data (2009/05/12)

Last week, a user was very diligently attempting to get search results for two data sets, from the data files RAT592_C-all_20090430_LT.mgf and RAT592_B-all_20090430_LT.mgf. Unfortunately, these data files required more memory than was available on the search servers, so the searches failed to execute. We have re-run the data and the results are now available as GPM77700007010 and GPM77700007011 respectively. These data files both contain over 200,000 original spectra and appear to be composed of spectra merged together from LC/MS runs on individual gel band slices, using an ion trap instrument, from a sample that appears to be rat inner ear tissue.

New release of X! Hunter ASLs (2009/05/04)

The May 1st, 2009 release of the Annotated Spectrum Libraries — used by X! Hunter for high speed, high accuracy protein identification — is now available here. This new release contains libraries for commonly used eukaryote species as well as three SILAC libraries and libraries for five strains of E. coli.

Addition of SILAC Annotated Spectrum Libraries for X! Hunter (2009/04/02)

A new curation of the X! Hunter libraries now has a separate library file for annotated spectra that are assigned to the heavy isotope labels in SILAC experiments. The new libraries have been made for human, mouse and yeast peptides and they are available for download from the GPM ftp site eukaryote libraries collection. The SILAC libraries are named human_silac_20.hlf, mouse_silac_20.hlf and yeast_silac_20.hlf.

These libraries are also mounted on the public X! Hunter search site. To search a SILAC data set to extract both heavy and light peptides, select both the normal and SILAC libraries. To extract only the SILAC (or normal) data, use only the appropriate one of these selections. In addition to the SILAC libraries, a major new release of the yeast library is also available.

Changes to Gene Ontology (GO) display (2009/04/22)

The display that indicates Gene Ontology classifications for the proteins in a data set has been updated to include more GO categories. The original display used 25 GO categories, made up of a selection of cellular components and cellular processes. This display has been updated to use 105 categories, with individual displays for each of cellular components, cellular processes and molecular functions (35 categories each). Once the GO page has been displayed (cellular components is the default), the other displays can be accessed using a new set of links, just below the histogram at the top of the page.

The new categories were selected based on the current population of GPMDB. Some of the GO descriptions have been altered slightly, to improve legibility in the alphabetical order used in the displays. The current list of categories that can be accessed with the new system is as follows:

Cellular components:

  1. cell surface
  2. centrosome
  3. chromatin
  4. chromosome
  5. cytoplasm
  6. cytoskeleton
  7. cytoskeleton, actin
  8. endoplasmic reticulum
  9. endosome
  10. extracellular region
  11. extracellular matrix
  12. focal adhesion
  13. Golgi apparatus
  14. intermediate filament
  15. lysosome
  16. membrane
  17. membrane, anchored
  18. membrane, integral
  19. membrane, plasma
  20. membrane, nuclear
  21. microsome
  22. microtubule
  23. mitochondrion
  24. myosin complex
  25. nuclear pore
  26. nucleolus
  27. nucleus
  28. peroxisome
  29. proteasome
  30. ribonucleoprotein complex
  31. ribosome
  32. spliceosome
  33. tight junction
  34. transcription factor complex
  35. ubiquitin ligase complex

Cellular processes:

  1. apoptosis
  2. carbohydrate metabolism
  3. cell adhesion
  4. cell cycle
  5. cell differentiation
  6. cell-cell signaling
  7. cell proliferation, +ve regulation
  8. cell proliferation, -ve regulation
  9. chromatin modification
  10. dephosphorylation
  11. DNA repair
  12. DNA replication
  13. immune response
  14. inflammatory response
  15. integrin-mediated signaling
  16. lipid metabolic process
  17. meiosis
  18. metabolic process
  19. microtubule-based movement
  20. mitosis
  21. multicellular development
  22. protein dephosphorylation
  23. protein glycosylation
  24. protein phosphorylation
  25. protein folding
  26. proteolysis
  27. RNA splicing
  28. signal transduction
  29. signaling, G-protein
  30. transcription
  31. transcription, regulation
  32. translation
  33. transport
  34. transport, ion
  35. transport, protein

Molecular function:

  1. acyltransferase activity
  2. binding, ATP
  3. binding, calcium ion
  4. binding, DNA
  5. binding, GTP
  6. binding, iron ion
  7. binding, magnesium ion
  8. binding, manganese ion
  9. binding, potassium ion
  10. binding, protein
  11. binding, RNA
  12. binding, sugar
  13. binding, zinc ion
  14. catalytic activity
  15. cytokine activity
  16. electron carrier activity
  17. G-protein coupled receptor activity
  18. hormone activity
  19. hydrolase activity
  20. ion channel activity
  21. kinase activity
  22. ligand-dependent nuclear receptor activity
  23. ligase activity
  24. lyase activity
  25. methyltransferase activity
  26. monooxygenase activity
  27. oxidoreductase activity
  28. peptidase activity
  29. phosphatase activity
  30. protein S/T kinase activity
  31. protein Y kinase activity
  32. receptor activity
  33. signal transducer activity
  34. transporter activity
  35. ubiquitin-protein ligase activity

Version 4 of NCTA information released (2009/04/08)

The Version 4 curation of the Normal Clinical Tissue Alliance data has been released. For the first time, this data set contains information on the proteins found in human embryonic stem cells.

New Gene Ontology pages added to GPMDB (2009/04/02)

GPMDB has had a limited set of Gene Ontology (GO) pages available that contain lists of observed proteins in the human, mouse or yeast proteomes that belong to particular GO classifications. The original index has been maintained, but a large selection of new categories has been added. These new pages can be reached by clicking on the all human, all mouse, or all yeast links. These indexes display all of the available GO classifications, broken up into biological process, cellular component and molecular function sub-categories. The human and mouse pages use the full set of GO categories (from ENSEMBL), while the yeast page uses GO-slim (from SGD).

Amino acid analysis (AAA) display added (2009/03/24)

A new display that calculates eight amino acid analyses for a particular data set has been made available in both GPMDB and all of the public GPM search servers. The results of the analysis are displayed in a table, giving the amino acid composition of the following sets of residues found in a search model:

  1. Pre: AAA of the residue in the protein sequence immediately prior to the N-terminus of each unique peptide;
  2. N-terminal: AAA of the N-terminal residue of each unique peptide;
  3. C-terminal: AAA of the C-terminal residue of each unique peptide;
  4. Post: AAA of the residue in the protein sequence immediately following the C-terminus of each unique peptide;
  5. All: AAA of all peptides identified (including multiple identifications of the same peptide sequence);
  6. Protein: AAA of all proteins identified;
  7. Unique: AAA of the unique peptides identified; and
  8. Delta: difference between the unique peptide AAA and the protein AAA.

The display can be accessed through the "aaa" link on the peptide display tool bar (click here for an example).
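
The first four analyses in the list above (Pre, N-terminal, C-terminal and Post) can be sketched in Python as follows. This is a simplified illustration for a single protein, not the GPM code: the function name is hypothetical, each peptide is located by a plain string search (first occurrence only), and "-" is a convention chosen here to mark a missing residue at a protein terminus.

```python
from collections import Counter

def terminal_compositions(protein, peptides):
    """Count residues at the Pre, N-terminal, C-terminal and Post positions.

    `protein` is a protein sequence and `peptides` an iterable of peptide
    sequences identified from it. Duplicates are removed so that each unique
    peptide is counted once.
    """
    counts = {name: Counter() for name in ("Pre", "N-terminal", "C-terminal", "Post")}
    for peptide in set(peptides):
        start = protein.find(peptide) if peptide else -1
        if start < 0:
            continue  # empty peptide, or peptide not present in this protein
        end = start + len(peptide)
        counts["Pre"][protein[start - 1] if start > 0 else "-"] += 1
        counts["N-terminal"][peptide[0]] += 1
        counts["C-terminal"][peptide[-1]] += 1
        counts["Post"][protein[end] if end < len(protein) else "-"] += 1
    return counts

# Made-up sequence and peptides, for illustration only.
protein = "MKTAYIAKQRQISFVK"
peptides = ["TAYIAK", "QISFVK", "TAYIAK"]
aaa = terminal_compositions(protein, peptides)
print(dict(aaa["C-terminal"]))  # {'K': 2}
```

Both example peptides end in K and follow a K or R residue, which is the pattern one would expect from a tryptic digest; a table of such counts makes unexpected cleavage behavior easy to spot.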

New documentation resource for GPMDB (2009/03/06)

In order to improve the documentation for GPM, we have started a project on our wiki called Technical Overview. Dan Evans will be adding new information and updating the writeups for the GPM utilities and GPMDB table structure.

ENSEMBL sequences updated (2009/01/30)

Protein sequences on the GPM search sites that use ENSEMBL accession numbers have been updated to ENSEMBL version 52. The associated sequence annotations have been updated to UNIPROT version 14.7.

70,000,000th Peptide Id Recorded (2009/01/14)

Today, GPMDB passed the 70 million mark for peptide identifications. We would like to thank all of the data contributors who have made this project a success. Special thanks go to our ProteomExchange partners TRANCHE and PRIDE for making their data available.

GPMDB Phosphopeptide Collection (pSYT) (2009/01/14)

GPMDB has a large number of phosphopeptide observations available for use. We have added a new user interface, called pSYT, to allow users direct access to this information on a protein-by-protein basis. To access pSYT for human, mouse, yeast and zebrafish proteins, use the corresponding link on the protein toolbar at the top of any protein display page.

The current statistics for phosphopeptides in GPMDB are as follows:

Species          Observations   Unique peptides   Observations/peptide
H. sapiens            382,884            17,876                 21.4 ×
M. musculus            61,003             9,125                  6.7 ×
S. cerevisiae          20,043             4,536                  4.4 ×
D. rerio               10,588               959                 11.0 ×
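
The observations/peptide figures appear to be the number of observations divided by the number of unique peptides, rounded to one decimal place. A quick check in Python, using only the counts listed above (the variable names are illustrative):

```python
# (observations, unique peptides) for each species, copied from the statistics above
stats = {
    "H. sapiens": (382884, 17876),
    "M. musculus": (61003, 9125),
    "S. cerevisiae": (20043, 4536),
    "D. rerio": (10588, 959),
}

# Average number of observations per unique phosphopeptide.
ratios = {species: round(obs / unique, 1) for species, (obs, unique) in stats.items()}

for species, ratio in ratios.items():
    print(f"{species}: {ratio}")
```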

Copyright © 2009, The Global Proteome Machine Organization