The Global Proteome Machine Organization
    www.thegpm.org

News Archive

The GPM wiki site opening (2007/11/07)

As an experiment in how to most effectively annotate proteomics data, the GPM now has a dedicated wiki system integrated into its user interface. This wiki can also be accessed through wiki.thegpm.org. Currently, the GPM interface is linked to the wiki at the level of GPM accession numbers, protein accession numbers and individual peptide sequences.

Sequence updates (2007/11/03)

The sequences used for human and mouse have been updated to ENSEMBL v.47. Mouse now uses NCBI m37, the most recent version of the mouse genome. The listings of single nucleotide induced amino acid polymorphisms have also been updated to reflect the changes in these new sequence collections. The sequences used for rice have also been updated, to use OSA1r5 from the J. Craig Venter Institute (JCVI, the new name for TIGR). Changes in the way that JCVI refers to sequences have led to a change in the style of accession numbers being used for rice: rather than the feature index, the locus accession is now being used.

Temporary service interruption (2007/10/26)

Between approximately 11:00 and 13:00 PDT on Oct. 26th, a number of the GPM search servers will be unavailable. This interruption is necessary to perform some much-needed systems maintenance.

The "Global" in Global Proteome Machine (2007/10/17)

In order to better understand how the GPM system is being used, we have begun to use Google Analytics to generate statistics on the use of GPM generated searches and GPMDB database information retrieval. We will make this information available on a monthly basis. The first month's data is available as a PDF file. The current report shows what locations in the world are using GPM and approximately how many pages are being downloaded per user visit.

New libraries for X! Hunter (2007/10/11)

The annotated spectrum libraries for X! Hunter have been updated, with a significant expansion of sequence coverage for most species (see the new statistics here). Libraries for P. troglodytes (chimp) and Felis catus (house cat) have been added to the eukaryote species collections.

New species added to X! Hunter (2007/09/02)

The X! Hunter Annotated Spectrum Libraries have been updated to include a number of prokaryote species, based on new data submitted to GPMDB. The following species are now available for high-speed searching:

  1. Deinococcus radiodurans
  2. Escherichia coli
  3. Halobacterium sp.
  4. Mycobacterium smegmatis
  5. Mycobacterium tuberculosis
  6. Salmonella enterica
  7. Salmonella typhi
  8. Salmonella typhimurium
  9. Shewanella oneidensis
  10. Streptococcus pyogenes

Opening of the GPMDB MS/MS repository (2007/09/01)

The GPM Database has become the largest source of publicly accessible data through the donation of data from laboratories around the world. In an effort to make that service more comprehensive, we have added a new feature to the public GPM sites that can create a highly compressed version of all of the original MS/MS data files submitted for analysis. If the results will be made available in GPMDB, the compressed MS/MS data file will now be archived and made available using the CMN 1.0 data format. The total contents of the archive will be available at ftp://ftp.thegpm.org/data/msms.

These files will be named in the same manner as GPM data models. For example, the data model accession number "GPM00300001111" will have a model file named "GPM00300001111.xml" and an archived data file named "GPM00300001111.cmn". This archive is organized into separate folders, corresponding to the first three numbers in the GPM accession number.
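The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name `archive_paths` is hypothetical, and it assumes that "the first three numbers" means the three digits immediately after the "GPM" prefix and that the folder sits directly under the archive root. Neither detail is confirmed by the announcement.

```python
ARCHIVE_ROOT = "ftp://ftp.thegpm.org/data/msms"  # archive location given in the announcement


def archive_paths(accession: str) -> dict:
    """Derive the model and archive file names for a GPM accession number."""
    if not (accession.startswith("GPM") and accession[3:].isdigit()):
        raise ValueError(f"not a GPM accession number: {accession!r}")
    # Assumption: the folder is named after the first three digits of the accession.
    folder = accession[3:6]
    return {
        "model": f"{accession}.xml",
        "archive": f"{accession}.cmn",
        "folder": folder,
        # Assumption: folders sit directly under the archive root.
        "url": f"{ARCHIVE_ROOT}/{folder}/{accession}.cmn",
    }


print(archive_paths("GPM00300001111"))
```

For the example accession in the text, this yields the file names "GPM00300001111.xml" and "GPM00300001111.cmn", with "003" as the presumed folder name.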

New release of X! series search engines (2007/07/01)

The X! series search engines (Tandem, P3 and Hunter) have been updated to include compatibility with some variants of mzXML and mzData spectrum input files, which use 64-bit floating point numbers for fragment ion mass and intensity information. X! Hunter has also been updated to include a new format (see the definition) for the input annotated spectrum libraries that is a suggested standard format for the exchange of this type of information.
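To illustrate why the floating point width matters, here is a minimal Python sketch of decoding a peak list. It assumes the mzXML-style layout, in which peaks are base64-encoded, big-endian (network byte order) values stored as alternating m/z and intensity; mzData files arrange their arrays differently. The helper names are hypothetical and this is not the X! series parsing code.

```python
import base64
import struct

_CODES = {32: "f", 64: "d"}  # struct format codes for 32-bit and 64-bit floats


def decode_peaks(encoded, precision=64):
    """Decode base64 text into a list of (m/z, intensity) pairs."""
    raw = base64.b64decode(encoded)
    width = precision // 8
    if len(raw) % (2 * width):
        raise ValueError("peak data is not a whole number of (m/z, intensity) pairs")
    count = len(raw) // width
    # ">" selects big-endian (network) byte order.
    values = struct.unpack(f">{count}{_CODES[precision]}", raw)
    return list(zip(values[0::2], values[1::2]))


def encode_peaks(peaks, precision=64):
    """Inverse of decode_peaks, used here only to build example input."""
    flat = [value for pair in peaks for value in pair]
    packed = struct.pack(f">{len(flat)}{_CODES[precision]}", *flat)
    return base64.b64encode(packed).decode("ascii")


peaks = [(175.119, 1200.0), (1045.5642, 87.5)]
print(decode_peaks(encode_peaks(peaks, 64), 64))  # values survive unchanged
print(decode_peaks(encode_peaks(peaks, 32), 32))  # values are rounded to 32-bit precision
```

Reading a 64-bit file with the 32-bit format code (or the reverse) produces meaningless numbers, which is why a parser has to honour the precision declared in the file.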

GPM Adopts Cell and Tissue Ontologies (2007/06/28)

In an effort to increase the utility of the GPM and GPMDB, the public sites have been updated to include an interface allowing researchers to include more information about their experiments. This information is organized around current "ontology" projects, which supply standard lists of relevant biological terms linked to accession numbers. The ontologies were chosen to provide as much consistency as possible between GPMDB and PRIDE.

  1. Gene Ontology (GO): the GO Slim list of terms associated with cellular localization;
  2. Cell Type Ontology (CELL): a fairly comprehensive collection of eukaryote cell types; and
  3. BRENDA Tissue Ontology (BRENDA): the BRENDA tissue list has been broken down into cell lines and tissues normally found in an organism.

New X! Hunter ASLs released (2007/05/20)

The 2007.05.15 version of the GPM Annotated Spectrum Libraries for X! Hunter is now available for download from the GPM FTP site. The new library was compiled using a new curation process that was designed to reduce the number of potential false positive entries in the library. The list of allowed sequence modifications was expanded to include:

  1. ICAT (both classic and cleavable);
  2. iTRAQ;
  3. S/T/Y phosphorylation; and
  4. Q/N deamidation

The new libraries also include HLF X! Hunter files, MGF spectrum files and FASTA peptide files for use in bioinformatics research.

Milestone reached (2007/05/09)

GPMDB added its 25,000,000th peptide identification over the weekend. We would like to thank all of the individual data contributors, as well as the team at the PeptideAtlas repository, for making this possible.

System outage (2007/04/30)

GPMDB will be unavailable for several hours on the afternoon of April 30, 2007 for system maintenance.

System updates (2007/04/15)

A number of updates/upgrades have been performed on the overall GPM system.

  1. The human, mouse and rat proteomes have been updated to the latest version from ENSEMBL (v. 43)
  2. The 2007.04.01 versions of X! Tandem and P3 have been deployed. This release adds the capability of checking for known single amino acid polymorphisms (SAPs). The known annotations are based on the dbSNP and ENSEMBL SNP databases for coding, non-synonymous SNPs. The annotation files are available from the GPM FTP site. This capability has been made the default behavior for searching human, mouse and rat ENSEMBL proteomes.
  3. The frog and fish boutique sites have been moved to 8 core computer platforms.

New equipment for boutique proteomes (2007/03/10)

The servers being used for the cow, mouse, rat, plant, and prokaryote boutique sequence sites have been upgraded to the same type of dual quad-core processor based computers as the new human site. The new servers are a generous gift of the Biomedical Research Centre at the University of British Columbia. We'd like to thank John Wilkins' group at the University of Manitoba, who donated the equipment to host these sites for the last two years.

Human Invitational proteome updated (2007/03/08)

The Human Invitational Database is a collection of highly curated RNA sequences meant to track the existence of splice variants and unanticipated translations of human genes. We have always made this sequence collection available through the human boutique search server and have updated this set of sequences to version H-InvDB_3.8.

Cat and Guinea pig proteomes added (2007/03/08)

The predicted sequences of the cat (Felis catus) and Guinea pig (Cavia porcellus) proteomes have been added to the main servers of the GPM. These sequences were obtained from the ENSEMBL CAT build 43.1 and ENSEMBL cavPor2 build 43.1. These are low coverage 2X assemblies, so the underlying gene models are expected to change with time. For comparison, these proteomes contain approximately 13,000 more protein sequences for each species than are available in NCBI's nr.

Rabbit proteome added (2007/03/07)

The predicted sequences of the rabbit (Oryctolagus cuniculus) proteome have been added to the main servers of the GPM. These sequences were obtained from the ENSEMBL RABBIT build 43.1b. This is a low coverage 2X assembly, so the underlying gene models are expected to change with time. For comparison, this proteome contains approximately 10,000 more rabbit protein sequences than are available in NCBI's nr.

Equipment upgrade (2007/02/28)

On Saturday (2007/02/24) we upgraded the search servers human, h066, and h112 to dual-processor, quad-core computers (Intel XEON E5345, 2.2 GHz), making searches about three times faster than on the fastest of the other computers in the system. Several more of these relatively high-speed computers have been ordered and should be installed within a few weeks.

An error in one of the configuration files on the new "human" server has caused any searches performed on that server using the IPI, SWISS-PROT, UNIGENE or HIT sequences sets to be incomplete: X! Tandem was unable to access the appropriate sequence files. The problem has been corrected, but any searches performed on these sequence sets since Saturday should be repeated. The same problem affected all searches performed using X! Hunter (the "Feeling lucky" button).

New versions of X! series search engines released (2007/01/31)

New versions of X! Tandem, P3 and Hunter are now available at the GPM ftp site. This release fixes a few small issues associated with operating system compatibility, incorporates some new information generated from the data in GPMDB, and adds some new information to the output data files that can be used for quality control purposes. It also includes compiled versions for Mac OS X 10.4 on Intel-based Macs.

Prokaryote sequences updated (2006/12/08)

The prokaryote search site has been updated from the NCBI site, including updated sequences for many common prokaryotes, as well as some new species, such as Mycobacterium smegmatis. These sequences have been added to those available at the GPM prokaryote sequence ftp site.

New site for ABRF PRG search (2006/11/30)

At the request of Brett Phinney, we have set up a site specifically for use with ABRF Proteomics Research Group (PRG). This site uses the protein sequences involved in the PRG 2007 study (prg.fasta.gz). You can find the site at http://prg.thegpm.org.

Weather problems (2006/11/27)

Because of a severe winter storm, the GPM servers located in Vancouver BC were unavailable for most of the day, November 27, 2006. This was caused by a general power failure at the University of British Columbia that shut down the University's computer network.

Addition of the Universal Protein Standard to cRAP (2006/11/19)

The common Repository of Adventitious Proteins (cRAP) has been updated to include the UniProt sequences corresponding to the Sigma-Aldrich Universal Protein Standard UPS1 set of proteins. The accession numbers and identity of the proteins are listed on the cRAP project page. All of the GPM public identification servers have been updated to reflect these changes. The main cRAP FASTA file as well as a separate file containing only the UPS1 sequences can be obtained from the GPM FTP site.

Updates for X! Series search engines (2006/10/25)

Minor updates to the 2006.09.15 release of the X! Series search engines are now available. These changes improve X! Tandem and P3 compatibility with ECD/ETD spectra. We'd like to thank Brett Phinney for supplying the experimental data that allowed us to improve the results for these spectra. New versions of peptide.pl and peptide_studio.pl are available that properly mark up the c and z ions generated by these fragmentation methods.

System maintenance service interruptions (2006/10/16)

Because of planned maintenance in one of our data centres, some servers may be difficult to reach on the afternoon of Oct. 16 and the morning of Oct. 17. The affected servers are:

  1. human.thegpm.org;
  2. mouse.thegpm.org;
  3. protista.thegpm.org;
  4. h451.thegpm.org;
  5. ppp.thegpm.org;
  6. xhunter.thegpm.org; and
  7. rat.thegpm.org

Updates for X! Series search engines (2006/9/17)

New versions of the X! Series search engines have been released, all with the version number 2006.09.15. These changes have been consolidated into a new release of the GPM-XE system.

  1. X! Tandem and X! P3. No changes have been made to the functionality of the search engines. The changes made to these projects are to improve the cross-platform compatibility of the projects and to conform to the latest security updates from Microsoft for the Windows versions.
  2. X! Hunter. Changes have been made to the scoring model, incorporating information about the original assignment confidence of a particular library spectrum to its associated peptide. Several other changes have been made to improve memory usage and overall execution speed.
  3. X! Hunter ASL creation/curation system. We have released the full system that we use internally to create the Annotated Spectrum Libraries used by X! Hunter. This system can be used to generate a custom ASL library from any GPMDB installation. Please refer to the installation documentation for any site-specific requirements for this release.
  4. X! Hunter ASL file format. The format for the ASL library files has been updated to add the information necessary for the change in the scoring model. The new format is defined here.

A new role for AP2-gamma (2006/9/11)

Harry and Mary Lynn Duckworth and their collaborators at the University of Manitoba used the GPM to provide the first direct evidence for the involvement of a placental-specific transcription factor in the regulation of a member of this gene family. They reported the work in Endocrinology 2006, 147, 4319. (Full Text).

System outage (2006/9/11)

Because of a hacker break-in, the servers h451.thegpm.org, mouse.thegpm.org, rat.thegpm.org and xhunter.thegpm.org will be out of service for most of today as we reinstall software and make the necessary security adjustments.

PepSeeker links added (2006/7/19)

A multidisciplinary group at the University of Manchester, led by Simon J. Gaskell and Simon J. Hubbard, has developed a new proteomics database called PepSeeker. This database was designed mainly to aid in the development of new theoretical understanding of gas phase peptide chemistry (see Abstract). GPM spectrum display pages now have links that allow the user to see the evidence in PepSeeker for the peptide being displayed.

X! Hunter annotated spectrum library paper published (2006/7/18)

A manuscript describing X! Hunter Annotated Spectrum Library (ASL) searches has been published in the ASAP section of the Journal of Proteome Research (Abstract). This paper describes the details of how the ASLs are compiled from the data in the GPMDB and explains the architecture of the underlying informatics. An example contrasting ASL and conventional protein identification demonstrates some of the unique features of this new type of proteomics technique.

Addition of Neurospora crassa (2006/7/18)

The model organism GPM search sites have been updated to include the bread mould Neurospora crassa OR74A. The sequences correspond to the Entrez May 2006 version of the genome. The sequence files (n_crassa.fasta and n_crassa.fasta.pro) are available from the GPM ftp site.

Change in honey bee sequences (2006/7/10)

ENSEMBL has dropped honey bee from its list of supported species. Since the GPM search sites used honey bee ENSEMBL accessions, we have switched to the NCBI version of the honey bee genome. The new honey bee sequence files (bee_e.fasta.pro.gz and bee_e.fasta.gz) are available from our FTP site. All of our search sites have been updated to use the new sequence set.

Peroxisomes' role in yeast lipid metabolism (2006/7/5)

Joel M. Goodman and his collaborators at the University of Texas Southwestern Medical School and University of North Texas used X! Tandem to demonstrate the previously unappreciated coupling of yeast peroxisomes and lipid bodies. They were able to demonstrate that yeast utilizes both the lipolysis capabilities of the lipid bodies and the oxidative apparatus of the peroxisomes in its normal metabolism of lipids. They have reported this work in J. Cell Biol. 2006, 173, 719 (Abstract).

Mannose-6-phosphate modification in lysosomal proteins (2006/6/27)

David E Sleat, Haiyan Zheng, Meiqian Qian, and Peter Lobel at the Center for Advanced Biotechnology (UMDNJ) have used the GPM to analyze the distribution of mannose-6-phosphate modifications on lysosomal proteins. This unusual modification is used to target proteins made in the cytosol to be transported into lysosomes. They reported this work in Molecular & Cellular Proteomics 2006, 5, 686 (Abstract).

HUPO announces the end of mzData (2006/6/27)

In a recent press release, HUPO-PSI announced its intention to discontinue its mzData format for representing mass spectrometry data. In its place, a new format will be developed to merge mzXML and mzData into a common representation. Therefore, all GPM development on mzData will be frozen at its current implementation.

Chemotaxis receptor consensus methylation sites (2006/6/22)

Eduardo Perez, Haiyan Zheng, and Ann M. Stock from the UMDNJ-Robert Wood Johnson Medical School have used the GPM to study post-translational modifications of the chemotaxis receptors in Thermotoga maritima. They discovered that methylation of these important proteins occurs at different sites in T. maritima than in enterobacteria. They have reported their results in the Journal of Bacteriology, 2006, 188, 4093 (Abstract).

GPM used to analyze wheat organelle (2006/6/21)

In a collaboration between groups at UC Berkeley and the USDA's Western Regional Research Center, the GPM was used to characterize the proteins present in wheat amyloplasts. These organelles are used to synthesize starch in most plants. The results were published in the Journal of Experimental Botany, 2006, 57, 1591 (Abstract).

Mouse and zebrafish sequence updates (2006/6/20)

The files for mouse and zebrafish ENSEMBL protein sequences have been updated to the most recent version of NCBI m36 (mouse) and Zv6 (zebrafish) available from ENSEMBL build 39.

Service interruption (2006/6/15)

Maintenance work in one of our data centres will result in some servers being unavailable during the hours of 16:00 to 20:00 CDT on June 15.

OSA1, release 4 now available (2006/6/13)

All of the GPM servers that provide access to the O. sativa proteome have been updated to TIGR release 4.0. For more information on this release, please check the TIGR rice genome web site.

TAIR 6 now available (2006/6/9)

All of the GPM servers that provide access to the A. thaliana proteome have been updated to TAIR version 6.0. This replaces the TIGR version 5.0, which has been available since the inception of the GPM. For more information on this release, please check the TAIR web site.

New versions of X! series search engines available (2006/5/26)

The 2006.06.01 versions of the X! series protein identification search engines are now available at our ftp site. All three search engines (X! Tandem, X! P3 and X! Hunter) have been updated to fix a problem with data obtained from a variant of mzXML spectrum files that do not contain information about a spectrum's parent ion charge. In previous versions, the search was performed correctly, but there were circumstances in which some spectra would not be displayed properly using the GPM interface software. Thanks to Paul Taylor for pointing out this problem.

X! Tandem used for novel gene detection (2006/5/15)

David States and his colleagues at the University of Michigan have developed a method using X! Tandem to discover novel genes using proteomics data. They have published their results in a study entitled "Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics" in the open access journal Genome Biology.

GPM used to understand plant resistance to insects (2006/5/15)

Brett Phinney and collaborators recently used some of the unique features of the GPM to discover a previously unknown mechanism by which plants defend themselves against insect herbivores. The resulting paper, "Jasmonate-inducible plant enzymes degrade essential amino acids in the herbivore midgut", was published as a featured article in PNAS.

Launch of new FTP site (2006/5/1)

In response to a number of suggestions made by users and contributors, we have updated and rationalized our FTP site and software distribution system. The new FTP site is organized into the following main folders:

  • data - contains mass spectra and collections of identifications;
  • fasta - contains the current versions of FASTA and .pro sequence files used by the public version of the GPM;
  • projects - contains source code release distributions for GPM-related projects;
  • proteotypic_peptide_profiles - contains FASTA files with lists of the peptides normally observed in proteomics; and
  • repos - contains the current contents of the GPM Subversion Source code repository.

We have also updated our Subversion source code repository to a new version and a new server. If you already have the Subversion client installed, you will have to "check out" the code again: updating the existing copy will not work properly. Change to the directory where you wish to install the new repository and type the following line:

    svn co http://source.thegpm.org/repos

    This should create a new copy of the repository on your computer.

    Source code repository maintenance (2006/4/29)

    In an effort to improve service, we will be doing some maintenance work on the GPM Subversion code repository, April 29 - May 1, 2006. The repository will be unavailable during this period. The contents of the repository have been made available on our ftp site, ftp://ftp.thegpm.org/repos.

    X! Hunter now available (2006/4/18)

    A version of the X! Hunter spectrum matching algorithm is now available, written in the same style and using the same interface as X! Tandem. The source code for Windows, Linux and OS X is available, as well as the annotated spectrum libraries, from ftp://ftp.thegpm.org/projects/xhunter.

    This version of X! Hunter compares experimentally observed spectra to annotated libraries of averaged peptide spectra, obtained from GPMDB. Libraries are available for human, brewer's yeast, mouse and thale cress.

    If you would like to try this updated version, an experimental server has been set up at h201.thegpm.org.

    New version of X! Tandem available (2006/4/18)

    A new version of X! Tandem (2006.04.01.2) is now completely tested and available. Most of the changes are associated with extending the options available through the application's user interface. This version also brings together the code to create X! Tandem and the proteotypic peptide profiling accelerated engine X! P3.

    Denial of Service Attack (2006/2/27)

    GPMDB experienced a malicious Denial of Service attack (DOS explained) over the weekend, which made contacting the server difficult. We are in the process of ensuring that it doesn't happen again, but there may be some short periods of service interruption for the next day or two. No damage was caused by the attack: it only affects the availability of a web server for external requests.

    New version of X! Tandem available (2006/2/13)

    The latest version of X! Tandem (2006.2.01) is now available for download from the GPM FTP site. The new version is a maintenance release: the changes from the previous release are minor and meant to improve performance and consistency, rather than to add new features.

    Final recommendations of the Paris Committee (2006/2/12)

    Last year, a committee composed of members of the editorial boards of the major proteomics journals met in Paris to discuss what types of information should be required for the publication of proteomics results. The meeting and its goals were described in a recent JPR editorial. The final version of these recommendations is available here. This report is part of an ongoing process of collaboration between the journals, with the intent of keeping these recommendations up-to-date as the technology and practice of proteomics evolve.

    New GPMDB site launch (2005/12/13)

    Thanks to David Fenyö and the NIH National Research Resource Center at Rockefeller University, we have been able to upgrade the capabilities of the GPMDB server system. The new system features some improved navigation and search pages as well as an improved system architecture to make adding additional servers easier (NIH Research Resource grant RR00862).

    Sequence updates (2005/12/13)

    We have updated some of the proteome sequence files, to reflect new data from our primary sequence sources. These updates are as follows:

    1. ENSEMBL Bos taurus has been updated to the BTAU 2.0 version of the genome (this is a significantly better translation than the previous BTAU 1.0);
    2. ENSEMBL Gallus gallus has been updated to a better build of WASHUC1; and
    3. SGD S. cerevisiae has been updated to the Dec. 2005 build, which has changes to several genes.

    New release of X! Tandem (2005/12/01)

    A new maintenance release of X! Tandem (2005.12.01) is available from the FTP site. This revision was made to maintain compatibility with the evolving XML standards for representing mass spectra, as well as to add one new protein cleavage type. This new version supports the "msRun" variant of the mzXML, as well as three variants of mzData's specification for parent ion charges.

    1. An improved handling of hex encoded binary information in mzXML and mzData files, for 64-bit processors, and an improved system for detecting XML file types added by Steven Wiley (VLST Corp.).
    2. Addition of testing for N-terminal glutamic acid cyclization, suggested by Oleg Krokhin (Manitoba Centre for Proteomics and Systems Biology).
    3. Addition of "semi" enzymatic cleavage (specific enzyme cleavage at one end of a peptide and non-specific cleavage at the other), suggested by Matt Monroe (PNNL).
    4. Support for variant methods of expressing parent ion charge in mzData v. 1.05, added by Fredrik Levander (University of Lund).
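The "semi" enzymatic cleavage described in item 3 can be illustrated with a short Python sketch. It assumes trypsin as the enzyme, using the common rule of cutting after K or R unless the next residue is P. The function names and the `max_missed` parameter are hypothetical, and this is not the X! Tandem implementation.

```python
def tryptic_sites(sequence):
    """Positions between residues where trypsin may cut, plus both termini."""
    sites = {0, len(sequence)}
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.add(i + 1)
    return sites


def semi_tryptic_peptides(sequence, min_len=3, max_missed=0):
    """Peptides with an enzyme-specific cut at one end or both ends."""
    sites = tryptic_sites(sequence)
    peptides = set()
    for start in range(len(sequence)):
        for end in range(start + min_len, len(sequence) + 1):
            # Count enzyme sites left uncut inside the candidate peptide.
            missed = sum(1 for s in sites if start < s < end)
            if missed > max_missed:
                continue
            # "Semi": only one of the two ends has to be a specific site.
            if start in sites or end in sites:
                peptides.add(sequence[start:end])
    return peptides


print(sorted(semi_tryptic_peptides("ACDKEFGHR")))
```

For the toy sequence "ACDKEFGHR", the result contains the fully tryptic peptides "ACDK" and "EFGHR" together with truncated forms such as "CDK" and "FGHR", but not "DKE", which has no specific cut at either end. The larger candidate list is the reason semi-enzymatic searches take longer than fully specific ones.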

    New tool from proteomecommons.org (2005/11/28)

    The busy folks at the University of Michigan have created an interesting tool that uses information gathered in GPMDB to improve the confidence of their protein assignments. In their words:

    A new tool has been added to the ProteomeCommons.org collection. This tool will take a protein id and look up the peptides you'd expect to identify for that protein using GPMDB, i.e. ask what have others found. You can then restrict the list of known peptides by a given mass range. Optionally you can add in peptides from the protein's tryptic digest or you can modify peptides with known amino-acid modifications or you can add any arbitrary mass shift. When you are all done the tool will create a plain-text file of the peptide's masses for inclusion in a MSMS analysis.

    You can retrieve the tools and get more information from the project homepage at proteomecommons.org

    Overall system updates (2005/11/3)

    We've had a busy month, updating our servers and adding new features to the GPM. As GPMDB gets closer and closer to the 10,000,000 peptides-assigned mark, we have been trying to keep up with new information sources that have become available. Two of the new services available for Homo sapiens proteins are the Human Protein Atlas and the Haplotype Mapping Project.

    The Human Protein Atlas contains annotated photomicrographs showing immunologically stained tissue sections from a large set of healthy and diseased human tissues. The goal of the project is to produce protein expression information for all of the genes in the human genome. Currently, they have a full set of data for approximately 1000 genes.

    The International HapMap Project is a survey of the differences in haplotype for a cross-section of the human population (click here for their explanation of the project). It has amassed a large amount of useful information about variations in the human genome.

    We have also just added a new server for Mus musculus searches, similar to those already in place for other species. It can be accessed at mouse.thegpm.org. This computer is also the first 64-bit server in the GPM system. We plan to have upgraded all of our search engine systems to 64-bit processors by the end of February, 2006.

    X! Tandem update available (2005/10/19)

    Thanks to the tireless efforts of our testers, several problems with the 2005.10.01.3 release version of X! Tandem have been corrected. The chief problem was that under some rare circumstances, incorrect assignments of modified peptides could be made, if a particular peptide had a very large number of residues that could be modified. We'd particularly like to thank Achim Treumann, at the Royal College of Surgeons in Ireland, who first noticed this issue.

    The release versions of the GPM and X! Tandem for all platforms have been updated to the 2005.10.01.5 version of X! Tandem. Our apologies for any inconvenience this may have caused. This problem did not affect P3, or any of our other projects.

    X! Tandem available on Biowulf (2005/10/6)

    The Biowulf MPI cluster at the NIH has added X! Tandem as an application for NIH users. This large cluster (2400 Opteron, Xeon, and XP/Athlon processors with an aggregate floating-point performance of 10 TFLOPS) is used for bioinformatics calculations.

    New releases of X! Tandem, the GPM and GPMDB available (2005/10/5)

    New releases of X! Tandem, the GPM and GPMDB are now available from ftp.thegpm.org. These new releases contain all of the new features and fixes that have been added since the 2005/06/15 release, including:

    1. GO annotation diagrams;
    2. improved potential modification searching;
    3. PRIDE 2.0 XML compatibility;
    4. protein "intersection" searches; and
    5. multi-window species selection.

    In addition, a new service pack for existing GPM-USB devices is available. Once the service pack is installed, it is now possible to configure these devices as full web servers. A CD-installable version of the GPM is also available, for educational and laboratory use.

    GPMDB-US comes on-line (2005/09/25)

    GPMDB, our proteomics data repository and experiment validation database, has broadened its connectivity with the addition of a sister site, GPMDB-US. This site contains all of the information in GPMDB and it is located at Rockefeller University, in the Mass Spectrometry and Gaseous Ion Chemistry Laboratory headed by Brian Chait. David Fenyö has taken on the task of setting up and maintaining the servers. This site will receive daily updates of information gathered by GPM. We would like to thank the National Institutes of Health National Centers for Research Resources program for providing the funding that made this new site possible.

    New look for the GPM (2005/09/14)

    We are in the final stages of putting together the October release of the GPM. As a preview, the public GPM sites will be converted over to the new interface style over the next few days. These changes include:

    1. Two taxon entry panes, one with eukaryote proteomes and the other with prokaryotes. The normal eukaryote sites will have a selection of prokaryotes, while the dedicated prokaryote site will have all of the prokaryotes that NCBI provides. Remember that you can select as many entries as you like from either pane.
    2. The ability to select which set of fragment ion series (a, b, c, x, y, or z, on the Advanced search page) you would like to use for your search. Previously, this had been fixed to only b & y ions.
    3. You may select to use either monoisotopic or average fragment ion masses for a search (Advanced search page).
    4. Addition of Apis mellifera (domestic honey bee), Bos taurus (domestic cow) and Silurana tropicalis (African clawed frog) to the normal eukaryote sites. Silurana tropicalis, previously known as Xenopus tropicalis, is a close relative of Xenopus laevis.

    A more detailed description of the changes to X! Tandem that allow some of these new features will be made available, once the code is ready for release.

    GPMDB Maintenance (2005/09/13)

    GPMDB will be taken off line for maintenance at 6:00 PM on Sept. 13, 2005 and brought back up by 9:00 AM on Sept. 14, 2005. We are performing the maintenance and testing necessary to bring a new mirror site at Rockefeller University on line.

    Peptide spectrum library searches (2005/09/10)

    A new GPM application, X! Hunter, has reached the point where it is ready for public testing. X! Hunter is a different style of peptide identification search engine. Rather than predicting spectra from a peptide sequence, it directly compares an input spectrum to a library of spectra that have been confidently assigned to a particular peptide sequence. This type of pattern matching tool is ideal for applications such as biomarker discovery, molecular scanners and instrument control, where obtaining a confident match for a single spectrum quickly is important.

    Using spectrum libraries is not at all new: this type of pattern matching strategy has been used in all forms of analytical spectroscopy (including mass spectrometry) since the 1950's. The only reason it hasn't been applied to peptide mass spectra is the obvious difficulty of obtaining exemplar spectra for all of the possible peptides in a proteome.

    Fortunately, we happen to have a database of nine million examples, GPMDB. To create the libraries for X! Hunter, all of the confident assignments for human and yeast peptides were extracted from GPMDB. Then spectra that were replicate observations of the same peptide were averaged together and a final list of about 110,000 averaged peptide spectra was produced.
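
    The library-building step described above (grouping replicate observations of the same peptide and averaging them) can be sketched roughly as follows. This is only an illustration and not the X! Hunter code: the 1.0 m/z bin width, the use of (peptide, charge) as the grouping key, the scaling of each spectrum to a maximum of 1, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

BIN_WIDTH = 1.0  # assumed m/z bin width; not taken from X! Hunter


def bin_spectrum(peaks, bin_width=BIN_WIDTH):
    """Collapse (m/z, intensity) peaks into integer bins, scaled to a max of 1."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    top = max(binned.values(), default=0.0)
    return {b: v / top for b, v in binned.items()} if top > 0 else {}


def build_library(assignments):
    """Average replicate spectra that share a (peptide, charge) key.

    `assignments` is an iterable of (peptide, charge, peaks) tuples, where
    `peaks` is a list of (m/z, intensity) pairs.
    """
    groups = defaultdict(list)
    for peptide, charge, peaks in assignments:
        groups[(peptide, charge)].append(bin_spectrum(peaks))

    library = {}
    for key, spectra in groups.items():
        summed = defaultdict(float)
        for spectrum in spectra:
            for b, v in spectrum.items():
                summed[b] += v
        # A bin missing from a replicate contributes zero to the average.
        library[key] = {b: v / len(spectra) for b, v in summed.items()}
    return library
```

    Each library entry is then a single averaged spectrum per peptide and charge state, which is what an input spectrum would be compared against.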

    Please give X! Hunter a try (there are several examples). Let us know what you think.

    Experiments with Gene Ontology (2005/08/22)

    Selected Gene Ontology (GO) terms have been added as a permanent part of the GPM display structure. At the top of the model listing pages for ENSEMBL human and SGD yeast sequences, a new link to the "GO" page is now available. You can view histograms or pie charts of your data, classified according to the ENSEMBL GO annotations. For example:

    1. GPM10100001010, human sample, histogram
    2. GPM06600002542, yeast sample, pie chart

    Communication/cross-posting with PRIDE (2005/08/22)

    The European Bioinformatics Institute's entry into the proteomics repository field, the PRoteomics IDEntification database (PRIDE), has recently been upgraded. It is now possible to interchange data between GPMDB and PRIDE, using their newly defined PRIDE 2.0 XML, which can be easily generated from GPMDB's BIOML data files. We are beginning to transfer selected information into PRIDE, which can be accessed through the PRIDE experiment number query interface. The initial entries from GPMDB can be accessed by PRIDE_EXP:0000108 to PRIDE_EXP:0001620.

    New version of X! Tandem available (2005/08/16)

    A new version of X! Tandem (v. 2005.08.15.3) has been released that adds some new features and improves on some older ones. We would like to thank the following contributors:

    1. Brendan MacLean (Fred Hutchinson Cancer Research Center) for improving the internal consistency of high accuracy mass calculations;
    2. Patrick Lacasse (Laval University) for suggesting a mechanism to force the selection of a given file format, even if it does not meet the requirements for automatic detection;
    3. Rob Craig (Beavis Informatics) for completing the conversion of the older, custom XML handlers into ExPat-compatible handlers; and
    4. Torsten Schwede and Michael Podvinec (Biozentrum, University of Basel) for tracking down a memory access issue that resulted in stability problems when X! Tandem was deployed across a PC Grid system.

    Further Indexing by Google (2005/08/16)

    In addition to the earlier indexing, Google has begun indexing individual results in the GPMDB. Google queries such as:

    • "gpmdb clathrin" (protein keyword);
    • "gpmdb SNEEGSEEKGPEVR" (tryptic peptide sequence);
    • "gpmdb GPM87400000110" (GPM ID number); or
    • "gpmdb apolipoprotein haptoglobin" (multiple keywords)

    all return results now. This facility should make it easy for users to quickly enter into the GPMDB to find their own data, as well as to cross-reference their results with those obtained by other researchers.

    Bos taurus ENSEMBL genome available (2005/08/02)

    ENSEMBL has recently added the annotation of Btau 1.0 to its site. We have updated the B. taurus GPM site to include this new information.

    New Human Plasma Data Available (2005/08/02)

    Dick Smith's group at Pacific Northwest National Laboratory has kindly made a large set of measurements on human plasma available to GPMDB. These measurements are a strong supplement to the Human Plasma Proteome data deposited by Gil Omenn's HUPO team earlier this year.

    The results can be accessed individually (they are numbered sequentially) from

    GPM10100000612 - GPM10100001201

    GPM Disruption (2005/07/17)

    After recovering gracefully from the power disruption last week, some parts of the GPM were knocked out by a large thunderstorm in Winnipeg on Sunday morning. Thanks to Shawn Walbridge of SynAck Hosting for his repairs to the system.

    GPM Maintenance Service Disruption (2005/07/08)

    Scheduled maintenance of the power system at one of the two main GPM data centres will occur between 18:00 and 19:00 (CDT) on Sunday July 10, 2005. It is possible that some service disruption will occur. We will try and get everything back up and running smoothly as quickly as possible.

    S. pombe and T. annulata added to GPM (2005/07/08)

    The proteome of the fission yeast S. pombe has been added to the species list for the eukaryote dedicated mirrors of GPM. These sequences link through to GeneDB as the primary source of sequence information. Also from GeneDB, the tick-borne cattle parasite Theileria annulata has been added to the protista site.

    Two new cluster versions of X! Tandem (2005/06/28)

    We are very happy to announce the release of two new clustering interfaces for X! Tandem, designed and implemented by Andy Link's group at Vanderbilt University. These interfaces use the popular Message Passing Interface (MPI) and Parallel Virtual Machine (PVM) standards to tie together multiple computers, allowing a single X! Tandem job to execute on multiple computers. Initial documentation about the project can be found here and the code can be found at our ftp site. The details of the project have been accepted for publication in the Journal of Proteome Research.

    A new service pack release of GPM-USB (2005/06/28)

    For those people who have purchased a GPM-USB device from Beavis Informatics, a new service pack (2005.07.01) has been released. To update your system, click here and follow the instructions. The service pack includes a number of updates, including:

    1. integrated P3 support;
    2. support for custom amino acid residue mass definitions;
    3. numerous upgrades to display scripts; and
    4. the most recent version of GPM Manager.

    Bos taurus (domestic cow) now has its own site (2005/06/28)

    Due to popular demand, a site dedicated to B. taurus has been constructed. The bovine genome has not yet been entered into the ENSEMBL system, so the proteome sequences are derived from the latest version of the genome held at NCBI. When the ENSEMBL system is available, the site will be updated to include the more informative genome links.

    Aurum data added to GPMDB (2005/06/16)

    The Aurum data collection has been analyzed and imported into GPMDB. This data set was produced from recombinant human proteins and can be used as a set of high-quality examples of peptide spectra from the ABI 4700 TOF-TOF instrument. The results, by plate number, are as follows: T10467; T10475; T10622; T10445; T10707; T10739; and T10761.

    A new release of X! Tandem and P3 (2005/06/03)

    The 2005.06.01.2 release of X! Tandem and P3 is now available. This new release brings the code base for the two projects much closer together, adding the ability to read MSDATA and MSXML files to P3. It also corrects an issue pointed out by Phillip Wilmarth at OHSU, that could result in some incorrect protein expectation values in very large MudPIT datasets with large numbers of redundant identifications.

    GPMDB has been googled (2005/06/02)

    The popular web server indexing service Google has indexed a large portion of the GPMDB data collection. Querying Google with a protein id number (such as an ENSEMBL id number) will now produce links into GPMDB results for that protein. Thanks to Google for providing this additional indexing for us.

    GPMDB peptide count jumps to over 6.5 million (2005/05/03)

    As of today, the number of annotated peptides in GPMDB has reached 6,613,809. Detailed statistics can be found here. The addition of a statistics archive link enables users to browse previous summaries and watch the GPMDB progress.

    GPMDB adds HUPO PPP results (2005/4/14)

    The GPMDB has added a special range of model accession numbers for the results generated by the Human Proteome Organization Plasma Proteome Project. The first set of 611 results, obtained by analyzing publicly available data from the PPP web site, has been made available. The results can be accessed by GPM number, in the range GPM10100000001 to GPM10100000611. We would like to thank David States and Gil Omenn for their cooperation and for allowing us to add this data to the GPMDB.

    Xenopus sp. site added (2004/11/22)

    In response to a request, we have added a new site xenopus.thegpm.org with a set of sequence resources dedicated to the genus Xenopus. It includes the most recent builds of UNIGENE for two Xenopus species (laevis and tropicalis) as well as the nr sequences for the subfamily Xenopodinae.

    New features added to GPM (2004/11/12)

    The public GPM interface has been updated to allow users to customize their results and to use some of the data clustering ideas that we have developed. The new features will become available in a release of the open source installation versions on Nov. 22. These changes include:

    1. Addition of spectrum prefiltering to remove repeated spectra from the initial set of mass spectra. This feature compares spectra using a dot product calculation and removes spectra that have vector representations that point in the same direction. The most intense spectrum out of a set of repeated spectra is kept and used for analysis. This type of filtering can remove up to 90% of spectra from a MudPIT-style run, making data analysis and interpretation easier.
    2. The protein listing and display pages can be customized to limit the proteins displayed to those with expectation values better than a value set by the user. This feature can be used to simplify reports.
    3. A pseudo-HPLC display has been added, which graphically illustrates the intensity vs. retention time plot expected given the peptide sequences discovered and the relative intensity of the MS/MS spectra. The retention times are calculated using the algorithm described in Reference 9.
    4. Dot product calculations have been added to the spectrum validation routine used by GPMDB to show the best match to a given spectrum-to-sequence assignment. This new routine orders the exemplar spectra drawn from GPMDB on the basis of similarity to the spectrum that is to be validated. Previous versions of this routine simply listed the best spectra (based on expectation value).
    5. A clustering feature has been added to the protein detailed display page, which allows the user to hide repeated peptide sequences, if desired.
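
    The spectrum prefiltering described in item 1 can be sketched as below. This is a minimal illustration, not the GPM code: the 1.0 m/z binning, the 0.9 similarity threshold, the reading of "most intense" as largest summed intensity, and the function names are assumptions. A real filter would presumably also compare parent ion masses, which this sketch ignores.

```python
import math


def to_vector(peaks, bin_width=1.0):
    """Bin (m/z, intensity) pairs into a sparse vector keyed by bin index."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine(u, v):
    """Normalized dot product of two sparse vectors (1.0 = same direction)."""
    dot = sum(x * v.get(b, 0.0) for b, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm > 0 else 0.0


def remove_repeats(spectra, threshold=0.9):
    """Keep one spectrum per group of near-identical spectra.

    Spectra are visited from most to least intense, so the spectrum kept
    for each group is the most intense one seen for that group.
    """
    ranked = sorted(spectra, key=lambda s: sum(i for _, i in s), reverse=True)
    kept = []  # list of (peaks, vector) pairs retained so far
    for peaks in ranked:
        vec = to_vector(peaks)
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((peaks, vec))
    return [peaks for peaks, _ in kept]
```

    Because the normalized dot product ignores overall scale, a weak and a strong acquisition of the same peptide score close to 1 and only the stronger one survives.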

    GPMDB peptide count breaks 2 million mark (2004/09/28)

    As of today, the number of indexed peptides in GPMDB has reached 2,010,819. Detailed statistics can be found here. The addition of a statistics archive link enables users to browse previous summaries and watch the GPMDB progress.

    GPMO launches message board (2004/09/28)

    Visit the message board and post questions, comments, experiences or developments with GPMO software. Feel free to help others in the community by sharing your knowledge of GPMO applications. It's quick and easy to join so sign up today!

    New members added to GPM Scientific Advisory Board (2004/09/03)

    We are pleased to announce that Brian T. Chait (Rockefeller University), David Fenyö (GE Healthcare) and Stephen B.H. Kent (University of Chicago) have been named to the Scientific Advisory Board of the GPM.

    Updates to GPMDB (2004/09/03)

    GPMDB is the publicly available index to all of the data that has come in through the GPM's various interfaces. As of today, it has 1.6 x 10^6 annotated MS/MS spectra, although that number is increasing all the time.

    It now has some improved database browsing capabilities, such as a dedicated keyword searching interface and a multiple accession number search interface. It has evidence for more than 3,300 yeast ORFs and 10,700 ENSEMBL human protein ids. We have a manuscript describing the technical features of the database, as well as some use cases for answering questions with the system. If you'd like a copy, please contact us, and we will send you a manuscript preprint.

    A new release of X! Tandem available (2004/09/01)

    This new release of X! Tandem corrects a number of minor problems that have been reported by users. It also adds new functionality:

    • The ability to use multiple "taxon" names in a single session. This change allows the use of multiple species selection on the GPM sites. It is particularly important for users of the plant and prokaryote sites, where mixing and matching the sequence sources to be used can be quite handy.
    • Extension of the scoring model to improve scoring for parent ions with z > 2.

    We'd like to thank Jimmy Eng and Mike Knierman for pointing out specific problem spectra that helped a lot in improving the code.

    This release is available on the ftp site, but it is the first release that is also available through our new code repository. We are now using the Subversion system for code revision control, to co-ordinate our various code projects. A new release of the GPM site installation is in preparation: it should be available on Sept. 10, 2004.

    Major new release of X! Tandem available (2004/7/15)

    X! TANDEM marks its first year with the release of X! TANDEM 2. Version 2 features improved memory management, fast execution, and better use of multiprocessor machines. It also has a built-in reversed-sequence validation method, as well as its own stochastic histogramming validation method. See the release notes for more details. Version 2 has been deployed on all of the GPM sites.

    Important Note

    When updating from previous versions of GPMO software, be sure to back up your current files. This includes result files and any of the web interface or perl script files that may have been customized for your particular installation.

    Updated versions of GPM and X! Tandem available (2004/6/1)

    New versions of GPM and X! Tandem were made available on June 1, 2004. Thanks to everyone who tested the new versions and suggested new views and features.

    The new version of GPM includes a 1D/2D PAGE gel simulation view, an improved tabular view for writing reports and a protein chip view.

    The new version of X! Tandem includes the ability to specify PROSITE-style motifs for potential modifications as well as the possibility of specifying potential modifications as having prompt neutral losses (e.g., the loss of 98 from phosphoserine or phosphothreonine).

    Two new Projects available: LiveCD and Quartz (2004/4/15)

    The GPMO has added two new projects, LiveCD and Quartz to the site. LiveCD, a project from the University of Michigan NCRR Center for Proteomics, provides a simple method to install a Linux-based version of X! TANDEM and the GPM on a large number of computers for instructional and demonstration purposes. It also includes some software allowing the use of X! TANDEM on clusters of computers running LiveCD.

    Quartz is a GPMO staff project. It is a set of annotated spectrum collections, meant to be used for bioinformatics research. The current collections contain > 2000 MS/MS spectra, along with XML-formatted annotation files.

    X! TANDEM and the GPM release updates (2004/4/10)

    New releases of both X! TANDEM and the GPM were made available today. These are maintenance releases, including fixes for small problems observed with previous versions. The collections of sequences for the GPM have been updated to include the latest sequence releases from ENSEMBL (1/4/2004).

    Probity model published (2004/3/1)

    The GPM takes advantage of the "Probity" statistical model to combine the results of multiple peptide identifications into an expectation value for a protein. This model, formulated by Jan Eriksson and David Fenyö, has just been published in the Journal of Proteome Research (Abstract).

    GPM and Tandem updated (2004/3/1)

    The GPM Perl scripts and Tandem code have been updated. The new scripts allow for more complete viewing of data supporting identifications, particularly the histograms that are used to perform the statistical analysis for distinguishing stochastic results from true ones. Tandem has been altered to correct a few unexpected behaviors and to improve its support for N- and C-terminal post-translational modifications.

    GPM sequences updated (2004/2/16)

    The sequences available to search have been updated to reflect the Feb. 9, 2004 release of most of the proteomes. The new sequences were downloaded from the ENSEMBL site and tested on the public installations of the GPM. The new databases are available for download from the GPM ftp site, in the "gpm_current_version" folder.

    New versions of Tandem and GPM released (2004/2/1)

    As of February 1, the 2004.02.01 versions of both the GPM and Tandem have been released. They include the updates necessary to use point mutation analysis in local installations. The GPM has been updated to include a new data view mode: "details". This new mode allows the user to examine the results at a spectrum by spectrum level, viewing all of the raw data involved, including all of the scoring histograms and spectrum peak lists.

    Over 400,000 served! (2004/1/30)

    At the end of January, the total number of spectra modelled using the public version of the GPM reached 400,000.

    The GPM identifies its 4000th gene (2004/1/27)

    After only 27 days of operation, the GPM has discovered more than 4000 individual genes, using mass spectrum sets sent in by the proteomics community. The GPM only imports information from genomic gene collections as necessary, so this high rate of discovery has meant that the Machine's cached records are improving at a rapid rate. We'd like to thank the proteomics community for using the Machine, helping it learn about this large collection of observed proteins.

    Point mutation analysis with GPM (2004/1/18)

    The GPM has been updated to include a new modeling feature in the Tandem engine. It now allows modeling of all possible point mutations in a sequence during the sequence refinement process. This new capability is still experimental: see the Tandem project's explanation of this new capability.
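
    To give a feel for what "all possible point mutations" means in practice, the sketch below enumerates every single-residue substitution of a sequence. It is only an illustration of the search space, not the Tandem implementation; the function name and the restriction to the 20 standard residues are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def point_mutants(sequence):
    """Yield (position, original, replacement, mutated_sequence) for every
    single-residue substitution of `sequence` (positions are 1-based)."""
    for i, original in enumerate(sequence):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                yield (i + 1, original, replacement,
                       sequence[:i] + replacement + sequence[i + 1:])
```

    A peptide of n standard residues yields 19n candidate sequences, each of which would then be scored against the spectrum like any other sequence.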

    Modifications have been made to some of the other report pages, in an effort to increase the amount of genomic and proteomic information made available when a valid model sequence has been found.

    Updating the GPM (2004/1/10)

    After 10 days of operation, the released version of the GPM has been updated to include a set of patches to answer questions that cropped up. Thanks to the many users who used the GPM and sent in helpful suggestions, as well as those enthusiasts who actually installed their own local versions of the GPM.

    Opening the Global Proteome Machine (2004/1/1)

    As of January 1, 2004, the Global Proteome Machine has become active. It is a simple, open source interface for analyzing tandem mass spectra against eukaryote genomes. Using the GPM is free and available to anyone interested in proteomics. The initial GPM configuration has the capacity to search approximately 10^10 MS/MS spectra per year.

    A new release of X! TANDEM (2005/3/21)

    This is the first release of X! TANDEM to fully support the mzXML and mzData spectrum input formats. The design and initial implementation for both formats were done by Patrick Lacasse (Université Laval, Dept. of Medicine, supported by Genome Québec), with eXpat support and refinement of the code by Brendan MacLean at Fred Hutchinson Cancer Research Center. We would also like to thank Patrick Pedrioli from the Institute for Systems Biology, who wrote the mzXML parser that Patrick Lacasse used as a model for his implementation and who has allowed us to make this available under the Artistic License. It should be noted that our support for these standards, much like the standards themselves, is preliminary and there may be some "flavours" of either format that do not work as expected.

    In addition, this release has a new optional parameter that allows the user to specify a parameter file that contains masses for one or all of the amino acid residues. This feature makes it possible to use non-standard amino acids, or isotopically labelled amino acids. An example of using this feature to find proteins that were made using all 15N amino acids is available at the human boutique site.
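
    The effect of supplying alternative residue masses can be illustrated with a short calculation. The sketch below is not X! Tandem code and does not reproduce the format of its parameter file: the function names and the dictionary-based override are assumptions. The masses are standard monoisotopic residue masses, and the 15N table simply adds the 15N/14N mass difference once per nitrogen atom in each residue.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Number of nitrogen atoms in each residue.
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1,
    "I": 1, "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3,
    "F": 1, "R": 4, "Y": 1, "W": 2,
}

WATER = 18.01056     # mass of H2O added for the peptide termini (Da)
N15_SHIFT = 0.99703  # mass difference between 15N and 14N (Da)


def peptide_mass(sequence, overrides=None):
    """Neutral monoisotopic peptide mass, with optional per-residue overrides."""
    masses = {**RESIDUE_MASS, **(overrides or {})}
    return sum(masses[aa] for aa in sequence) + WATER


def n15_table():
    """Residue masses for peptides built entirely from 15N amino acids."""
    return {aa: m + NITROGENS[aa] * N15_SHIFT for aa, m in RESIDUE_MASS.items()}
```

    Passing n15_table() as the overrides gives the masses expected for fully 15N-labelled peptides, while a table with a single changed entry models one non-standard residue.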

    First release of P3 (2005/2/16)

    The X! TANDEM P3 project is the first protein identification system capable of using proteotypic peptides to accelerate searching and improve the confidence of results. The system is built out of the X! TANDEM framework and utilizes the GPM interface for its displays. The necessary proteotypic peptide libraries are continuously updated from the GPMDB for human and yeast proteomes. Proteotypic peptide libraries are much smaller than full proteomes, so this type of searching runs quite a bit quicker than standard searches.

    First release of the Jasper spectrum collection (2005/2/16)

    The Jasper spectrum collection is a new type of bioinformatics resource, made available as part of the Quartz spectrum library. Jasper collections contain the best spectrum-to-peptide assignments from the GPMDB, broken down into categories based on the reliability of the assignment (based on the measured expectation value for a peptide). These libraries of spectra are in XML files, containing the peptide sequences (with PTMs) associated with individual spectra that were assigned to those peptides. The first library contains about 64,000 high quality spectrum-to-sequence assignments.

    New release of X! Tandem available (2005/2/16)

    The 2005.02.01 release of X! Tandem is now available. The new features of this release are mainly for programmers, particularly an improved mechanism for adding in new scoring systems, elegantly added by Brendan MacLean. Some changes have also been made to take further advantage of high accuracy parent ion mass measurements.

    S. tropicalis genome sequence available (2005/2/10)

    In addition to the other sequence resources available on the Xenopus site, the newly released protein predictions from the S. tropicalis genome are now available. They are annotated using information from ENSEMBL. This genome represents the first full sequence of an amphibian genome.

    X! Tandem to use mzData and mzXML input standards (2005/1/21)

    We are happy to announce that as the result of the most recent Standards in Proteomics meeting held by the NIDDK in Washington earlier this month, X! Tandem will support both MS/MS data representations, as proposed by HUPO-PSI and the Institute for Systems Biology. The development work to incorporate the two standards has begun and the finished software should be available by the end of February. Many thanks to Patrick Lacasse (Université Laval, Dept. of Medicine, supported by Genome Québec) for generating the mzXML and mzData parser classes. We would also like to thank Randy Julian (Eli Lilly) for his co-operation and help with adding mzData.

    GPMDB begins collaboration with NIST (2005/1/21)

    We are happy to announce that Dr. Steve Stein from the US National Institute of Standards and Technology is now collaborating with us to produce a standardized library of peptide MS/MS spectra to be used for the improvement of protein identification algorithms. The donated entries in GPMDB will be statistically evaluated and an "average" spectrum for a particular peptide, based on its modifications and charge state, will be developed. Dr. Stein has worked with the development of similar spectrum libraries for use with small molecule identification for many years and we are very happy to be of assistance in developing similar approaches for proteomics. Dr. Stein expects to announce the preliminary results of his work at the US-HUPO meeting this spring.

    GPM source code now mirrored on Proteome Commons (2005/1/18)

    As a result of our collaboration with the Michigan Proteomics Consortium, we are happy to announce the inclusion of all GPM software in the new proteomecommons.org open source software archive. Many thanks to Jayson Falkner, Pete Ulintz and Phil Andrews for creating this new site, which we hope will be of general value to the proteomics community.

    GPMDB peptide count breaks 4 million mark (2004/11/22)

    As of today, the number of annotated peptides in GPMDB has reached 4,121,723. Detailed statistics can be found here. The addition of a statistics archive link enables users to browse previous summaries and watch the GPMDB progress.


    Copyright © 2004, The Global Proteome Machine Organization Privacy Statement