|
|
|
The GPM wiki site opening (2007/11/07)
As an experiment in how to most effectively annotate proteomics data,
the GPM now has a dedicated wiki system integrated into its user interface. This wiki can also be accessed through
wiki.thegpm.org. Currently, the GPM interface
is linked to the wiki on the level of GPM accession number, protein accession number
and individual peptide sequences.
|
|
Sequence updates (2007/11/03)
The sequences used for human and mouse have been updated to ENSEMBL v.47. Mouse
now uses NCBI m37, the most recent version of the mouse genome. The single nucleotide
induced amino acid polymorphisms listings have also been updated to reflect the
changes in these new sequence collections. The sequences used for rice have also
been updated, to use OSA1r5 from the J. Craig Venter Institute (JCVI, the new name for TIGR).
Changes in the way that JCVI refers to sequences have led to a change in the style of
accession numbers being used for rice: rather than the feature index, the locus accession
is now being used.
|
|
Temporary service interruption (2007/10/26)
Between approximately 11:00 and 13:00 PDT on Oct. 26th, a number of the
GPM search servers will be unavailable. This interruption is necessary to perform
some much needed systems maintenance.
|
|
The "Global" in Global Proteome Machine (2007/10/17)
In order to better understand how the GPM system is being used, we have begun to use
Google Analytics to generate statistics on
the use of GPM generated searches and GPMDB database information retrieval. We will make this
information available on a monthly basis. The first month's data is available as a
PDF file. The current report shows what locations
in the world are using GPM and approximately how many pages are being downloaded per user visit.
|
|
New libraries for X! Hunter (2007/10/11)
The annotated spectrum libraries for X! Hunter have been updated, with
a significant expansion of sequence coverage for most species (see the new
statistics here). Libraries for P. troglodytes (chimp)
and Felis catus (house cat) have been added to the eukaryote species collections.
|
|
New species added to X! Hunter (2007/09/02)
The X! Hunter Annotated Spectrum Libraries have been updated to include
a number of prokaryote species, based on new data submitted to GPMDB. The following
species are now available for high-speed searching:
- Deinococcus radiodurans
- Escherichia coli
- Halobacterium sp.
- Mycobacterium smegmatis
- Mycobacterium tuberculosis
- Salmonella enterica
- Salmonella typhi
- Salmonella typhimurium
- Shewanella oneidensis
- Streptococcus pyogenes
|
|
Opening of the GPMDB MS/MS repository (2007/09/01)
The GPM Database has become the largest source of publicly accessible data through
the donation of data from laboratories from around the world. In an effort to make that
service more comprehensive, we have added a new feature to the public GPM sites that
can create a highly compressed version of all of the original MS/MS data files submitted for analysis.
If the results will be made available in GPMDB, the compressed MS/MS data file will
now be archived and made available using the CMN 1.0 data format.
The total contents of the archive will be available at ftp://ftp.thegpm.org/data/msms.
These files will be named in the same manner as GPM data models, for example the data
model accession number "GPM00300001111" will have a model file named "GPM00300001111.xml"
and an archived data file named "GPM00300001111.cmn".
This archive is organized into separate folders, corresponding to the first three numbers in the GPM accession number.
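The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that "the first three numbers" means the three digits immediately following the "GPM" prefix and that the folder is named with exactly those digits.

```python
import re

ARCHIVE_ROOT = "ftp://ftp.thegpm.org/data/msms"  # archive location given above


def archive_locations(accession: str) -> dict:
    """Return the assumed archive folder and file names for a GPM accession."""
    match = re.fullmatch(r"GPM(\d{3})\d*", accession)
    if match is None:
        raise ValueError(f"not a GPM model accession: {accession!r}")
    # Assumption: the folder is named after the first three digits after "GPM".
    folder = match.group(1)
    return {
        "folder": f"{ARCHIVE_ROOT}/{folder}",
        "model": f"{accession}.xml",  # GPM data model file
        "data": f"{accession}.cmn",   # archived, compressed MS/MS data
    }


if __name__ == "__main__":
    for name, value in archive_locations("GPM00300001111").items():
        print(name, value)
```

Under these assumptions, the example accession would be filed in a folder named "003".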
|
|
New release of X! series search engines (2007/07/01)
The X! series search engines (Tandem, P3 and Hunter) have been updated to include
compatibility with some variants of mzXML and mzData spectrum input files, which use
64-bit floating point numbers for fragment ion mass and intensity information. X! Hunter has
also been updated to include a new format (see the definition)
for the input annotated spectrum libraries that
is a suggested standard format for the exchange of this type of information.
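For readers writing their own parsers, the difference between 32-bit and 64-bit peak lists can be sketched as follows. This is not the X! series code: it is a minimal Python illustration that assumes the mzXML-style layout, in which a base64 string holds interleaved (m/z, intensity) values in network (big-endian) byte order. The function name is hypothetical, and mzData files, which store the two arrays separately, would need different handling.

```python
import base64
import struct


def decode_peaks(encoded: str, precision: int = 64):
    """Decode a base64 peak list into (m/z, intensity) pairs.

    Assumes interleaved big-endian floats, as in mzXML-style peak lists.
    """
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    raw = base64.b64decode(encoded)
    width = precision // 8  # bytes per stored number
    if len(raw) % (2 * width) != 0:
        raise ValueError("peak data is not a whole number of (m/z, intensity) pairs")
    count = len(raw) // width
    code = "d" if precision == 64 else "f"
    values = struct.unpack(f">{count}{code}", raw)
    # Even positions hold m/z values, odd positions hold intensities.
    return list(zip(values[0::2], values[1::2]))


if __name__ == "__main__":
    payload = base64.b64encode(struct.pack(">4d", 100.5, 10.0, 200.25, 20.0))
    print(decode_peaks(payload.decode("ascii"), precision=64))
```

Decoding a 64-bit list with the 32-bit setting (or the reverse) produces nonsense values rather than an error, which is why the precision declared in the file has to be honoured.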
|
|
GPM Adopts Cell and Tissue Ontologies (2007/06/28)
In an effort to increase the utility of the GPM and GPMDB, the public sites have
been updated to include an interface allowing researchers to include more information
about their experiments. This information is organized around current "ontology"
projects, which supply standard lists of relevant biological terms linked to accession numbers.
The ontologies were chosen to provide as much consistency as possible between GPMDB and PRIDE.
- Gene Ontology (GO):
the GO Slim list of terms associated with cellular localization;
- Cell Type Ontology (CELL):
a fairly comprehensive collection of eukaryote cell types; and
- BRENDA Tissue Ontology (BRENDA):
the BRENDA tissue list has been broken down into cell lines and tissues normally found in an organism.
|
|
New X! Hunter ASLs released (2007/05/20)
The 2007.05.15 version of the GPM Annotated Spectrum Libraries for X! Hunter
is now available for download from the GPM FTP site.
The new library was compiled using a new curation process that was designed to reduce
the number of potential false positive entries in the library. The list of allowed
sequence modifications was expanded to include
- ICAT (both classic and cleavable);
- iTRAQ;
- S/T/Y phosphorylation; and
- Q/N deamidation
The new libraries also include HLF X! Hunter files, MGF spectrum files and FASTA peptide files
for use in bioinformatics research.
|
|
Milestone reached (2007/05/09)
GPMDB added its 25,000,000th peptide identification over
the weekend. We would like to thank all of the individual data contributors, as well as
the team at the PeptideAtlas repository, for making this possible.
|
|
System outage (2007/04/30)
GPMDB will be unavailable for several hours on the afternoon of April 30, 2007 for
system maintenance.
|
|
System updates (2007/04/15)
A number of updates/upgrades have been performed on the overall GPM system.
- The human, mouse and rat proteomes have been updated to the latest version from ENSEMBL (v. 43)
- The 2007.04.01 versions of X! Tandem and P3 have been deployed. This release adds the capability
of checking for known single amino acid polymorphisms (SAPs). The known annotations are
based on the dbSNP and ENSEMBL SNP databases for coding, non-synonymous SNPs. The annotation
files are available from the GPM FTP site. This capability
has been made the default behavior for searching human, mouse and rat ENSEMBL proteomes.
- The frog and fish
boutique sites have been moved to 8 core computer platforms.
|
|
New equipment for boutique proteomes (2007/03/10)
The servers being used for the cow,
mouse,
rat,
plant, and
prokaryote
boutique sequence sites have been upgraded to the same type
of dual quad-core processor based computers as the new human site. The new servers
are a generous gift of the Biomedical Research Centre at the University of British
Columbia. We'd like to thank John Wilkins' group at the
University of Manitoba, who donated the equipment to host these sites for
the last two years.
|
|
Human Invitational proteome updated (2007/03/08)
The Human Invitational Database is a collection of highly curated RNA sequences
meant to track the existence of splice variants and unanticipated translations of human
genes. We have always made this sequence collection available through the human
boutique search server and have updated this set of sequences to version
H-InvDB_3.8.
|
|
Cat and Guinea pig proteomes added (2007/03/08)
The predicted sequences of the cat (Felis catus) and Guinea pig (Cavia porcellus) proteomes
have been added to the main servers of the GPM. These sequences were
obtained from the ENSEMBL CAT build 43.1 and
ENSEMBL cavPor2 build 43.1.
These are low coverage 2X assemblies, so the underlying gene models are
expected to change with time. For comparison, these proteomes contain approximately 13,000 more protein
sequences for each species than are available in NCBI's nr.
|
|
Rabbit proteome added (2007/03/07)
The predicted sequences of the rabbit (Oryctolagus cuniculus) proteome
have been added to the main servers of the GPM. These sequences were
obtained from the ENSEMBL RABBIT build 43.1b.
This is a low coverage 2X assembly, so the underlying gene models are
expected to change with time. For comparison, this proteome contains approximately 10,000 more rabbit protein
sequences than are available in NCBI's nr.
|
|
Equipment upgrade (2007/02/28)
On Saturday (2007/02/24) we upgraded the search servers human,
h066, and h112 to dual processor (Intel XEON E5345, 2.2 GHz),
quad-core computers, making searches about three times faster than on the fastest other
computers we have in the system. Several more of these relatively high speed computers have
been ordered and they should be installed within a few weeks.
An error in one of the configuration files on the new "human" server has caused any searches
performed on that server using the IPI, SWISS-PROT, UNIGENE or HIT sequence sets to be incomplete: X! Tandem
was unable to access the appropriate sequence files. The problem has been corrected, but any searches performed
on these sequence sets since Saturday should be repeated. The same problem affected all searches performed using
X! Hunter (the "Feeling lucky" button).
|
|
New versions of X! series search engines released (2007/01/31)
New versions of X! Tandem, P3 and Hunter are now available at the GPM ftp site.
This release fixes a few small issues associated with operating system compatibility,
incorporates some new information generated from the data in GPMDB and adds some new information to the output data files
that can be used for quality control purposes. It also includes compiled versions for Mac OS X 10.4
on Intel-based Macs.
|
|
Prokaryote sequences updated (2006/12/08)
The prokaryote search site has
been updated from the NCBI site, including updated sequences for many common
prokaryotes, as well as some new species, such as Mycobacterium smegmatis.
These sequences have been added to those available at the GPM prokaryote sequence ftp site.
|
|
New site for ABRF PRG search (2006/11/30)
At the request of Brett Phinney, we have set up a site specifically for
use with the ABRF Proteomics Research Group (PRG).
This site uses the protein sequences involved in the PRG 2007 study (prg.fasta.gz).
You can find the site at http://prg.thegpm.org.
|
|
Weather problems (2006/11/27)
Because of a severe winter storm, the GPM servers located in Vancouver BC were unavailable for most of
the day, November 27, 2006. This was caused by a general power failure at the University of British Columbia
that shut down the University's computer network.
|
|
Addition of the Universal Protein Standard to cRAP (2006/11/19)
The common Repository of Adventitious Proteins (cRAP) has been updated to include the UniProt
sequences corresponding to the Sigma-Aldrich Universal Protein Standard UPS1 set of proteins.
The accession numbers and identity of the proteins are listed on the cRAP
project page. All of the GPM public identification servers have been updated to reflect these changes. The
main cRAP FASTA file as well as a separate file containing only the UPS1 sequences can be
obtained from the GPM FTP site.
|
|
Updates for X! Series search engines (2006/10/25)
Minor updates to the 2006.09.15 release of the X! Series search engines are now
available. These changes improve X! Tandem and P3 compatibility with ECD/ETD spectra. We'd
like to thank Brett Phinney for supplying the necessary experimental data that has
allowed us to improve the results for these spectra. New versions of peptide.pl and
peptide_studio.pl are available that properly mark up the c and z ions generated by
these ion sources.
|
|
System maintenance service interruptions (2006/10/16)
Because of planned maintenance in one of our data centres, some servers may be difficult
to reach on the afternoon of Oct. 16 and the morning of Oct. 17. The affected servers are:
- human.thegpm.org;
- mouse.thegpm.org;
- protista.thegpm.org;
- h451.thegpm.org;
- ppp.thegpm.org;
- xhunter.thegpm.org; and
- rat.thegpm.org
|
|
Updates for X! Series search engines (2006/9/17)
New versions of the X! Series search engines have been released, all with
the version number 2006.09.15. These changes have been consolidated into a
new release of the GPM-XE system.
- X! Tandem and X! P3. No changes have been made to the functionality
of the search engines. The changes made to these projects are to improve the
cross-platform compatibility of the projects and to conform to the latest
security updates from Microsoft for the Windows versions.
- X! Hunter. Changes have been made to the scoring model, incorporating
information about the original assignment confidence of a particular library
spectrum to its associated peptide. Several other changes have been
made to improve memory usage and overall execution speed.
- X! Hunter ASL creation/curation system. We have released the full system
that we use internally to create the Annotated Spectrum Libraries used by
X! Hunter. This system can be used to generate a custom ASL library from
any GPMDB installation. Please refer to the installation documentation
for any site-specific requirements for this release.
- X! Hunter ASL file format. The format for the ASL
library files has been updated to add the information necessary for the change
in the scoring model. The new format is defined here.
|
|
A new role for AP2-gamma (2006/9/11)
Harry and Mary Lynn Duckworth and their collaborators at the University of Manitoba
used the GPM to provide the first direct evidence for the involvement of a placental-specific
transcription factor in the regulation of a member of this gene family.
They reported the work in Endocrinology 2006, 147, 4319.
(Full Text).
|
|
System outage (2006/9/11)
Because of a hacker break-in, the servers h451.thegpm.org, mouse.thegpm.org,
rat.thegpm.org and xhunter.thegpm.org will be out of service for most of today as
we reinstall software and make the necessary security adjustments.
|
|
PepSeeker links added (2006/7/19)
A multidisciplinary group at the University of Manchester, led by
Simon J. Gaskell and Simon J. Hubbard, have developed a new proteomics database
called PepSeeker. This database was designed mainly to aid in the
development of new theoretical understanding of gas phase peptide chemistry
(see Abstract).
GPM spectrum display pages now have links that allow the user to see the evidence
in PepSeeker for the peptide being displayed.
|
|
X! Hunter annotated spectrum library paper published (2006/7/18)
A manuscript describing X! Hunter Annotated Spectrum Library (ASL) searches has been published
in the ASAP section of the Journal of Proteome Research (Abstract).
This paper describes the details of how the ASLs are compiled from the data in the GPMDB and
explains the architecture of the underlying informatics. An example contrasting ASL and
conventional protein identification demonstrates some of the unique features of this new type of
proteomics technique.
|
|
Addition of Neurospora crassa (2006/7/18)
The model organism GPM search sites have
been updated to include the bread mould Neurospora crassa OR74A.
The sequences correspond to the Entrez May 2006 version of the genome. The sequence files
(n_crassa.fasta and n_crassa.fasta.pro) are available from the GPM
ftp site.
|
|
Change in honey bee sequences (2006/7/10)
ENSEMBL has dropped honey bee from its list of supported species. Since
the GPM search sites use honey bee ENSEMBL accessions, we have switched to
the NCBI version of the honey bee genome. The new honey bee sequence files
(bee_e.fasta.pro.gz and
bee_e.fasta.gz)
are available from our FTP site. All of our search sites have
been updated to use the new sequence set.
|
|
Peroxisomes' role in yeast lipid metabolism (2006/7/5)
Joel M. Goodman and his collaborators at the University of Texas Southwestern Medical School
and University of North Texas used X! Tandem to demonstrate the previously unappreciated
coupling of yeast peroxisomes and lipid bodies. They were able to demonstrate that yeast utilizes
both the lipolysis capabilities of the lipid bodies and the oxidative apparatus of the peroxisomes
in its normal metabolism of lipids. They have reported this work in J. Cell Biol. 2006, 173, 719
(Abstract).
|
|
Mannose-6-phosphate modification in lysosomal proteins (2006/6/27)
David E Sleat, Haiyan Zheng, Meiqian Qian, and Peter Lobel at the
Center for Advanced Biotechnology (UMDNJ) have used the GPM
to analyze the distribution of mannose-6-phosphate modifications
on lysosomal proteins. This unusual modification is used to
target proteins made in the cytosol to be transported into
lysosomes. They reported this work in Molecular & Cellular Proteomics 2006, 5, 686 (Abstract).
|
|
HUPO announces the end of mzData (2006/6/27)
In a recent press release, HUPO-PSI announced its intention to
discontinue its mzData format for representing mass spectrometry data. In its place, a new format will be developed to merge mzXML and mzData into
a common representation. Therefore, all GPM development on mzData will be frozen at its current implementation.
|
|
Chemotaxis receptor consensus methylation sites (2006/6/22)
Eduardo Perez, Haiyan Zheng, and Ann M. Stock from the UMDNJ-Robert Wood Johnson
Medical School have used the GPM to study post-translational modifications of
the chemotaxis receptors in Thermotoga maritima. They discovered that
methylation of these important proteins occurs at different sites in T. maritima
than in enterobacteria. They have reported their results in the Journal of
Bacteriology, 2006, 188, 4093 (Abstract).
|
|
GPM used to analyze wheat organelle (2006/6/21)
In a collaboration between groups at UC Berkeley and the USDA's Western Regional
Research Center, the GPM was used to characterize the proteins present in wheat
amyloplasts. These organelles are used to synthesize starch in most plants. The
results were published in the Journal of Experimental Botany, 2006, 57, 1591
(Abstract).
|
|
Mouse and zebrafish sequence updates (2006/6/20)
The files for mouse and zebrafish ENSEMBL protein sequences have been updated to
the most recent version of NCBI m36 (mouse) and Zv6 (zebrafish) available from
ENSEMBL build 39.
|
|
Service interruption (2006/6/15)
Maintenance work in one of our data centres will result in some servers being
unavailable during the hours of 16:00 to 20:00 CDT on June 15.
|
|
OSA1, release 4 now available (2006/6/13)
All of the GPM servers that provide access to the O. sativa proteome have been
updated to TIGR release 4.0. For more information on this
release, please check the TIGR rice
genome web site.
|
|
TAIR 6 now available (2006/6/9)
All of the GPM servers that provide access to the A. thaliana proteome have been
updated to TAIR version 6.0. This replaces the TIGR version
5.0, which has been available since the inception of the GPM. For more
information on this release, please check the
TAIR web site.
|
|
New versions of X! series search engines available (2006/5/26)
The 2006.06.01 versions of the X! series protein identification search engines
are now available at our ftp site.
All three search engines (X! Tandem, X! P3 and X! Hunter) have been updated to
fix a problem with data obtained from a variant of mzXML spectrum files that does
not contain information about a spectrum's parent ion charge. In previous
versions, the search was performed correctly, but there were circumstances in
which some spectra would not be displayed properly using the GPM interface
software. Thanks to Paul Taylor for pointing out this problem.
|
|
X! Tandem used for novel gene detection (2006/5/15)
David States and his colleagues at the University of Michigan have developed a
method using X! Tandem to discover novel genes using proteomics data. They have
published their results in a study entitled "Novel gene and gene model
detection using a whole genome open reading frame analysis in proteomics"
in the open access journal Genome
Biology.
|
|
GPM used to understand plant resistance to insects (2006/5/15)
Brett Phinney and collaborators recently used some of the unique features of the
GPM to discover a previously unknown mechanism by which plants defend
themselves against insect herbivores. The resulting paper
"Jasmonate-inducible plant enzymes degrade essential amino acids in the
herbivore midgut" was published as a featured article in
PNAS.
|
|
Launch of new FTP site (2006/5/1)
In response to a number of suggestions made by users and contributors, we have
updated and rationalized our FTP site and software distribution system. The new
FTP site is organized into the following main folders:
- data: contains mass spectra and collections of identifications;
- fasta: contains the current versions of FASTA and .pro sequence files
used by the public version of the GPM;
- projects: contains source code release distributions for GPM-related
projects;
- proteotypic_peptide_profiles: contains FASTA files with lists of the
peptides normally observed in proteomics; and
- repos: contains the current contents of the GPM Subversion source code
repository.
We have also updated our Subversion source code repository to a new version and
a new server. If you already have the Subversion client installed, you will
have to "check out" the code again: simply updating the existing copy
will not work properly. Simply change directories into where you wish to
install the new repository and type the following line:
svn co http://source.thegpm.org/repos
This should create a new copy of the repository on your computer.
|
|
Source code repository maintenance (2006/4/29)
In an effort to improve service, we will be doing some maintenance work on the
GPM Subversion code repository, April 29 - May 1, 2006. The repository will be
unavailable during this period. The contents of the repository have been made
available on our ftp site, ftp://ftp.thegpm.org/repos.
|
|
X! Hunter now available (2006/4/18)
A version of the X! Hunter spectrum matching algorithm is now available, written
in the same style and using the same interface as X! Tandem. The source code
for Windows, Linux and OS X is available, as well as the annotated spectrum
libraries, from ftp://ftp.thegpm.org/projects/xhunter.
This version of X! Hunter compares experimentally observed spectra to annotated
libraries of averaged peptide spectra, obtained from GPMDB. Libraries are
available for human, brewer's yeast, mouse and thale cress.
If you would like to try this updated version, an experimental server has been
set up at h201.thegpm.org.
|
|
New version of X! Tandem available (2006/4/18)
A new version of X! Tandem (2006.04.01.2) is now completely tested and
available. Most of the changes are
associated with extending options available through the application's user
interface. This version also brings together the code to create X! Tandem and
the proteotypic peptide profiling accelerated engine X! P3.
|
|
Denial of Service Attack (2006/2/27)
GPMDB experienced a malicious Denial of Service attack (DOS
explained) over the weekend, which made contacting the server
difficult. We are in the process of ensuring that it doesn't happen again, but
there may be some short periods of service interruption for the next day or two.
No damage was caused by the attack: it only affects the availability of a web
server for external requests.
|
|
New version of X! Tandem available (2006/2/13)
The latest version of X! Tandem (2006.2.01) is now available for download from
the GPM FTP site. The new
version is a maintenance release: the changes from the previous release are
minor and meant to improve performance and consistency, rather than to add new
features.
|
|
Final recommendations of the Paris Committee (2006/2/12)
Last year, a committee composed of members of the editorial boards of the major
proteomics journals met in Paris to discuss what types of information should be
required for the publication of proteomics results. The meeting and its goals
were described in a recent
JPR editorial. The final version of these recommendations is available
here. This report is part of an ongoing process of collaboration
between the journals, with the intent of keeping these recommendations
up-to-date as the technology and practice of proteomics evolve.
|
|
New GPMDB site launch (2005/12/13)
Thanks to David Fenyö and the NIH National Research Resource Center at
Rockefeller University, we have been able to upgrade the capabilities of the
GPMDB server system. The new system features some improved navigation and
search pages as well as an improved system architecture to make adding
additional servers easier (NIH Research Resource grant RR00862).
|
|
Sequence updates (2005/12/13)
We have updated some of the proteome sequence files, to reflect new data from
our primary sequence sources. These updates are as follows:
- ENSEMBL Bos taurus has been updated to the BTAU 2.0 version of the
genome (this is a significantly better translation than the previous BTAU 1.0);
- ENSEMBL Gallus gallus has been updated to a better build of WASHUC1; and
- SGD S. cerevisiae has been updated to the Dec. 2005 build, which has
changes to several genes.
|
|
New release of X! Tandem (2005/12/01)
A new maintenance release of X! Tandem (2005.12.01) is available from the
FTP site. This revision was made to maintain compatibility with the
evolving XML standards for representing mass spectra, as well as to add one new
protein cleavage type. This new version supports the "msRun" variant
of the mzXML, as well as three variants of mzData's specification for parent
ion charges.
- Improved handling of hex encoded binary information in mzXML and mzData
files, for 64-bit processors, and an improved system for detecting XML file
types, added by Steven Wiley (VLST Corp.).
- Addition of testing for N-terminal glutamic acid cyclization, suggested by Oleg
Krokhin (Manitoba Centre for Proteomics and Systems Biology).
- Addition of "semi" enzymatic cleavage (specific enzyme cleavage at
one end of a peptide and non-specific cleavage at the other), suggested by Matt
Monroe (PNNL).
- Support for variant methods of expressing parent ion charge in mzData v. 1.05,
added by Fredrik Levander (University of Lund).
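To make the "semi" enzymatic cleavage option more concrete, here is a small Python sketch of the idea. It is not the X! Tandem implementation: the function names and length limits are hypothetical, and it assumes a trypsin-like rule (cleavage after K or R unless the next residue is P). A peptide is kept when at least one of its ends falls on an enzyme cleavage site or a protein terminus, so fully specific peptides are included as well.

```python
def cleavage_sites(protein: str) -> set:
    """Positions where the enzyme may cut, plus both protein termini.

    Assumed trypsin-like rule: after K or R, unless followed by P.
    """
    sites = {0, len(protein)}
    for i in range(1, len(protein)):
        if protein[i - 1] in "KR" and protein[i] != "P":
            sites.add(i)
    return sites


def semi_specific_peptides(protein: str, min_len: int = 4, max_len: int = 30) -> set:
    """Peptides with a specific cleavage at one end or at both ends."""
    sites = cleavage_sites(protein)
    peptides = set()
    for start in range(len(protein)):
        last_end = min(len(protein), start + max_len)
        for end in range(start + min_len, last_end + 1):
            # Keep the peptide if either end sits on a cleavage site.
            if start in sites or end in sites:
                peptides.add(protein[start:end])
    return peptides


if __name__ == "__main__":
    print(sorted(semi_specific_peptides("AAKCCCRDD", min_len=3)))
```

The candidate list grows much faster than for a fully specific digest, which is why this type of search normally takes longer.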
|
|
New tool from proteomecommons.org (2005/11/28)
The busy folks at the University of Michigan have created an interesting tool
that uses information gathered in GPMDB to improve the confidence of their
protein assignments. In their words:
A new tool has been added to the ProteomeCommons.org collection. This tool will
take a protein id and look up the peptides you'd expect to identify for that
protein using GPMDB, i.e. ask what have others found. You can then restrict the
list of known peptides by a given mass range. Optionally you can add in
peptides from the protein's tryptic digest or you can modify peptides with
known amino-acid modifications or you can add any arbitrary mass shift. When
you are all done the tool will create a plain-text file of the peptide's masses
for inclusion in a MSMS analysis.
You can retrieve the tools and get more information from the
project homepage at proteomecommons.org
|
|
Overall system updates (2005/11/3)
We've had a busy month, updating our servers and adding new features to the GPM.
As GPMDB gets closer and closer to the 10,000,000 peptides-assigned mark, we
have been trying to keep up with new information sources that have become
available. Two of the new services available for Homo sapiens proteins
are the Human Protein Atlas and
the Haplotype Mapping Project.
The Human Protein Atlas contains annotated photomicrographs showing
immunologically stained tissue sections from a large set of healthy and
diseased human tissues. The goal of the project is to produce protein
expression information for all of the genes in the human genome. Currently,
they have a full set of data for approximately 1000 genes.
The International HapMap Project is a survey of the differences in haplotype for
a cross-section of the human population (click
here for their explanation of the project). It has amassed a large
amount of useful information about variations in the human genome.
We have also just added a new server for Mus musculus searches, similar
to those already in place for other species. It can be accessed at
mouse.thegpm.org. This computer is also the first 64-bit server in the
GPM system. We plan to have upgraded all of our search engine systems to 64-bit
processors by the end of February, 2006.
|
|
X! Tandem update available (2005/10/19)
Thanks to the tireless efforts of our testers, several problems with the
2005.10.01.3 release version of X! Tandem have been corrected. The chief
problem was that under some rare circumstances, incorrect assignments of
modified peptides could be made, if a particular peptide had a very large
number of residues that could be modified. We'd particularly like to thank
Achim Treumann, at the Royal College of Surgeons in Ireland, who first noticed
this issue.
The release versions of the GPM and X! Tandem for all platforms have been
updated to the 2005.10.01.5 version of X! Tandem. Our apologies for any
inconvenience this may have caused. This problem did not affect P3, or any of
our other projects.
|
|
X! Tandem available on Biowulf (2005/10/6)
The Biowulf MPI cluster at the NIH has
added X! Tandem as an
application for NIH users. This large cluster (2400 Opteron, Xeon, and
XP/Athlon processors with an aggregate floating-point performance of 10 TFLOPS)
is used for bioinformatics calculations.
|
|
New releases of X! Tandem, the GPM and GPMDB available (2005/10/5)
New releases of X! Tandem, the GPM and GPMDB are now available
from ftp.thegpm.org. These
new releases contain all of the new features and fixes that have been added
since the 2005/06/15 release, including:
- GO annotation diagrams;
- improved potential modification searching;
- PRIDE 2.0 XML compatibility;
- protein "intersection" searches; and
- multi-window species selection.
In addition, a new service pack
for existing GPM-USB devices
is available. Once the service pack is installed, it is now possible to configure these devices
as full web servers. A CD-installable version of the GPM is also available, for educational and
laboratory use.
|
|
GPMDB-US comes on-line (2005/09/25)
GPMDB, our proteomics data repository and
experiment validation database, has broadened its connectivity with the
addition of a sister site, GPMDB-US.
This site contains all of the information in GPMDB and it is located at
Rockefeller University, in the Mass Spectrometry and Gaseous Ion Chemistry Laboratory
headed by Brian Chait.
David Fenyö has taken on the task of setting up and maintaining the
servers. This site will receive daily updates of information gathered by GPM. We would
like to thank the National Institutes of Health National Centers for Research
Resources program for providing the funding that made this new site
possible.
|
|
New look for the GPM (2005/09/14)
We are in the final stages of putting together the October release
of the GPM. As a preview, the public GPM sites will be converted over to
the new interface style over the next few days. These changes include:
- Two taxon entry panes, one with eukaryote proteomes and the other
with prokaryotes. The normal eukaryote sites will have a selection
of prokaryotes, while the dedicated prokaryote site will
have all of the prokaryotes that NCBI provides. Remember that you can select as many entries as you like from
either pane.
- The ability to select which set of fragment ion series (a, b, c, x, y, or z, on the Advanced search page) you would like to
use for your search. Previously, this had been fixed to only b & y ions.
- You may select to use either monoisotopic or average fragment ion masses for a search (Advanced search page).
- Addition of Apis mellifera (domestic honey bee), Bos taurus (domestic cow) and Silurana tropicalis (African clawed frog)
to the normal eukaryote sites. Silurana tropicalis, previously known as
Xenopus tropicalis, is a close relative of Xenopus laevis.
A more detailed description of the changes to X! Tandem that allow some of these
new features will be made available, once the code is ready for release.
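As a rough illustration of the ion series and mass type options listed above, the following Python sketch computes singly charged b and y fragment ion m/z values for a short peptide. It is not taken from X! Tandem: the function name, the `MASSES` table and its three-residue coverage are assumptions made for this example, and the proton mass is used for both mass types to keep it short.

```python
# Illustrative sketch only; not X! Tandem code.
PROTON = 1.007276  # mass of a proton, in Da

# Residue masses (Da) for three amino acids, plus water, in both mass types.
MASSES = {
    "monoisotopic": {"water": 18.010565, "G": 57.02146, "A": 71.03711, "S": 87.03203},
    "average": {"water": 18.01528, "G": 57.0519, "A": 71.0788, "S": 87.0782},
}


def fragment_ions(peptide, series=("b", "y"), mass_type="monoisotopic"):
    """Return {ion label: m/z} for singly charged ions of the chosen series."""
    table = MASSES[mass_type]
    residues = [table[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        if "b" in series:
            # b ions: N-terminal residues plus one proton
            ions[f"b{i}"] = sum(residues[:i]) + PROTON
        if "y" in series:
            # y ions: C-terminal residues plus water plus one proton
            ions[f"y{i}"] = sum(residues[-i:]) + table["water"] + PROTON
    return ions


mono = fragment_ions("GAS")
avg = fragment_ions("GAS", series=("b",), mass_type="average")
```

The a, c, x and z series differ from b and y by fixed chemical offsets, so they could be added to this sketch in the same way.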
|
|
GPMDB Maintenance (2005/09/13)
GPMDB will be taken off line for maintenance at 6:00 PM on Sept 13, 2005 and
brought back up by 9:00 AM Sept. 14, 2005. We are performing some maintenance
and testing necessary to bring a new mirror site at Rockefeller University
on line.
|
|
Peptide spectrum library searches (2005/09/10)
A new GPM application, X! Hunter,
has reached the point where it is
ready for public testing. X! Hunter is a different style of peptide
identification search engine. Rather than predicting spectra from
a peptide sequence, it directly compares an input spectrum to a library
of spectra that have been confidently assigned to a particular peptide
sequence. This type of pattern matching tool is ideal for applications such
as biomarker discovery, molecular scanners and instrument control, where
obtaining a confident match for a single spectrum quickly is important.
Using spectrum libraries is not at all new:
this type of pattern matching strategy has been
used in all forms of analytical spectroscopy (including mass spectrometry) since
the 1950s. The only reason it hasn't been applied to peptide mass spectra is
the obvious difficulty of obtaining exemplar spectra for all of the possible
peptides in a proteome.
Fortunately, we happen to have a database of nine million
examples, GPMDB. To create the libraries for X! Hunter, all of the confident assignments
for human and yeast peptides were extracted from GPMDB. Then spectra that were
replicate observations of the same peptide were averaged together and a final list of about
110,000 averaged peptide spectra was produced.
Please give X! Hunter a try (there are
several examples). Let us know what you think.
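The library approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not X! Hunter's implementation: spectra are assumed to be lists of (m/z, intensity) pairs, peaks are grouped into 1 Da bins, replicates are averaged after scaling each to unit length, and the function names `binned`, `build_library` and `best_match` are invented for the example.

```python
import math
from collections import defaultdict


def binned(peaks, width=1.0):
    """Turn (m/z, intensity) pairs into a unit-length sparse vector."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(mz / width)] += intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def build_library(assignments):
    """Average replicate spectra of each peptide into one library entry."""
    groups = defaultdict(list)
    for peptide, peaks in assignments:
        groups[peptide].append(binned(peaks))
    library = {}
    for peptide, vectors in groups.items():
        total = defaultdict(float)
        for vec in vectors:
            for k, v in vec.items():
                total[k] += v / len(vectors)
        norm = math.sqrt(sum(v * v for v in total.values()))
        library[peptide] = {k: v / norm for k, v in total.items()}
    return library


def best_match(peaks, library):
    """Return (score, peptide) for the library entry closest to the input."""
    query = binned(peaks)
    return max(((dot(query, vec), pep) for pep, vec in library.items()),
               default=(0.0, None))


library = build_library([
    ("PEPTIDEK", [(100.2, 10), (200.1, 5)]),
    ("PEPTIDEK", [(100.4, 8), (200.3, 6)]),
    ("SAMPLER", [(150.0, 7), (300.0, 7)]),
])
score, peptide = best_match([(100.3, 9), (200.2, 5)], library)
```

Because no theoretical spectra have to be generated, a lookup of this kind only costs one dot product per library entry.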
|
|
Experiments with Gene Ontology (2005/08/22)
Selected Gene Ontology (GO)
terms have been added as a permanent part of the GPM display structure.
On the top of model listing pages for ENSEMBL human and SGD yeast sequences,
a new link to the "GO" page is now available. You can view histograms
or pie charts of your data, classified according to the ENSEMBL GO annotations.
For example:
- GPM10100001010, human sample, histogram
- GPM06600002542, yeast sample, pie chart
|
|
Communication/cross-posting with PRIDE (2005/08/22)
The European Bioinformatics Institute's entry into the proteomics
repository field, the PRoteomics IDEntification database
(PRIDE), has recently been upgraded. It is now possible to interchange data between
GPMDB and PRIDE, using their newly defined PRIDE 2.0 XML, which can be easily generated
from GPMDB's BIOML data files. We are beginning to transfer selected information
into PRIDE, which can be accessed through the PRIDE experiment number query interface.
The initial entries from GPMDB can be accessed by PRIDE_EXP:0000108 to
PRIDE_EXP:0001620.
|
|
New version of X! Tandem available (2005/08/16)
A new version of X! Tandem (v. 2005.08.15.3) has been released
that adds some new features and improves on some older ones. We would
like to thank the following contributors:
- Brendan MacLean (Fred Hutchinson Cancer Research Center) for improving
the internal consistency of high accuracy mass calculations;
- Patrick Lacasse (Laval University) for suggesting a mechanism to
force the selection of a given file format, even if it does not
meet the requirements for automatic detection;
- Rob Craig (Beavis Informatics) for completing the conversion of the
older, custom XML handlers into ExPat-compatible handlers; and
- Torsten Schwede and Michael Podvinec (Biozentrum, University of Basel) for
tracking down a memory access issue that resulted in stability problems when
X! Tandem was deployed across a PC Grid system.
|
|
Further Indexing by Google (2005/08/16)
In addition to the earlier indexing, Google has begun indexing individual
results in the GPMDB. Google queries such as:
- "gpmdb clathrin" (protein keyword);
- "gpmdb SNEEGSEEKGPEVR" (tryptic peptide sequence);
- "gpmdb GPM87400000110" (GPM ID number); or
- "gpmdb apolipoprotein haptoglobin" (multiple keywords)
all return results now. This facility should make it easy for users
to quickly enter into the GPMDB to find their own data, as well as
to cross-reference their results with those obtained by other researchers.
|
|
Bos taurus ENSEMBL genome available (2005/08/02)
ENSEMBL has recently added the annotation of
Btau 1.0 to its site. We have updated the B.
taurus GPM site to include this new information.
|
|
New Human Plasma Data Available (2005/08/02)
Dick Smith's group at Pacific Northwest National Laboratory has kindly
made a large set of measurements on human plasma available to GPMDB. These
measurements are a strong supplement to the Human Plasma Proteome data
deposited by Gil Omenn's HUPO team earlier this year.
The results can be accessed individually (they are numbered sequentially) from
GPM10100000612
-
GPM10100001201
|
|
GPM Disruption (2005/07/17)
After recovering gracefully from the power disruption last week, some parts of
the GPM were knocked out by a large thunderstorm in Winnipeg on Sunday morning.
Thanks to Shawn Walbridge of SynAck Hosting for his repairs to the system.
|
|
GPM Maintenance Service Disruption (2005/07/08)
Scheduled maintenance of the power system at one of the two main GPM data
centres will occur between 18:00 and 19:00 (CDT) on Sunday July 10, 2005. It is
possible that some service disruption will occur. We will try to get
everything back up and running smoothly as quickly as possible.
|
|
S. pombe and T. annulata added to GPM (2005/07/08)
The proteome of the fission yeast S. pombe has been added to the species
list for the eukaryote dedicated mirrors of
GPM. These sequences link through to GeneDB
as the primary source of sequence information. Also from GeneDB, the tick-borne
cattle parasite Theileria annulata has been added to the
protista site.
|
|
Two new cluster versions of X! Tandem (2005/06/28)
We are very happy to announce the release of two new clustering interfaces for
X! Tandem, designed and implemented by Andy Link's group at Vanderbilt
University. These interfaces use the popular Message Passing Interface (MPI)
and the Parallel Virtual Machine (PVM) standards to tie together multiple
computers to allow a single X! Tandem job to execute on multiple computers.
Initial documentation about the project can be found here
and the code can be found at our ftp site.
The details of the project have been accepted for publication in the Journal of
Proteome Research.
|
|
A new service pack release of GPM-USB (2005/06/28)
For those people who have purchased a
GPM-USB device from Beavis Informatics, a new service pack (2005.07.01)
has been released. To update your system, click
here and follow the instructions. The service pack includes a number of
updates, including:
- integrated P3 support;
- support for custom amino acid residue mass definitions;
- numerous upgrades to display scripts; and
- the most recent version of GPM Manager.
|
|
Bos taurus (domestic cow) now has its own site (2005/06/28)
Due to popular demand, a site dedicated to B. taurus
has been constructed. The bovine genome has not yet been entered into the
ENSEMBL system, so the proteome sequences are derived from the latest version
of the genome held at NCBI. When the ENSEMBL system is available, the site will
be updated to include the more informative genome links.
|
|
Aurum data added to GPMDB (2005/06/16)
The Aurum
data collection has been analyzed and imported into GPMDB. This data
set was produced from recombinant human proteins and can be used as a set of
high-quality examples of peptide spectra from the ABI 4700 TOF-TOF instrument.
The results, by plate number, are as follows:
T10467;
T10475;
T10622;
T10445;
T10707;
T10739; and
T10761.
|
|
A new release of X! Tandem and P3 (2005/06/03)
The 2005.06.01.2 release of X! Tandem and P3 is now available. This
new release brings the code base for the two projects much closer together,
adding the ability to read MSDATA and MSXML files to P3. It also
corrects an issue pointed out by Phillip Wilmarth at OHSU, that could result in
some incorrect protein expectation values in very large MudPIT datasets with
large numbers of redundant identifications.
|
|
GPMDB has been googled (2005/06/02)
The popular web server indexing service Google
has indexed a large portion of the GPMDB data collection. Querying Google with
a protein id number (such as an ENSEMBL id number) will now produce links into
GPMDB results for that protein. Thanks to Google for providing this additional
indexing for us.
|
|
GPMDB peptide count jumps to over 6.5 million (2005/05/03)
As of today, the number of annotated peptides in GPMDB has reached 6,613,809.
Detailed statistics can be found here.
The addition of a statistics archive link enables users to browse previous
summaries and watch the GPMDB progress.
|
|
GPMDB adds HUPO PPP results (2005/4/14)
The GPMDB has added a special range of model accession numbers for the results
generated by the Human Proteome Organization Plasma Proteome Project. The first
set of 611 results, obtained by analyzing publicly available data from the
PPP web site, has been made available. The results can be accessed by
GPM number, in the range
GPM10100000001 to
GPM10100000611. We would like to thank David States and Gil Omenn for
their cooperation and for allowing us to add this data to the GPMDB.
|
|
Xenopus sp. site added (2004/11/22)
In response to a request, we have added a new site
xenopus.thegpm.org with a set of sequence resources dedicated to the
genus Xenopus. It includes the most recent builds of UNIGENE for two Xenopus
species (laevis and tropicalis) as well as the nr sequences for the
subfamily Xenopodinae.
|
|
New features added to GPM (2004/11/12)
The public GPM interface has been updated to allow users to customize their
results and to use some of the data clustering ideas that we have developed.
The new features will become available in a release of the open source
installation versions on Nov. 22. These changes include:
- Addition of spectrum prefiltering to remove repeated spectra from the initial
set of mass spectra. This feature compares spectra using a dot product
calculation and removes spectra that have vector representations that point in
the same direction. The most intense spectrum out of a set of repeated spectra
is kept and used for analysis. This type of filtering can remove up to 90% of
spectra from a MudPIT-style run, making data analysis and interpretation
easier.
- The protein listing and display pages can be customized to limit the proteins
displayed to those with expectation values better than a value set by the user.
This feature can be used to simplify reports.
- A pseudo-HPLC display has been added, which graphically illustrates the
intensity vs. retention time plot expected given the peptide sequences
discovered and the relative intensity of the MS/MS spectra. The retention times
are calculated using the algorithm described in Reference 9.
- Dot product calculations have been added to the spectrum validation routine
used by GPMDB to show the best match to a given spectrum-to-sequence
assignment. This new routine orders the exemplar spectra drawn from GPMDB on
the basis of similarity to the spectrum that is to be validated. Previous
versions of this routine simply listed the best spectra (based on expectation
value).
- A clustering feature has been added to the protein detailed display page, which
allows the user to hide repeated peptide sequences, if desired.
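The spectrum prefiltering idea in the first item above can be sketched as follows. This is a hedged illustration rather than the GPM's actual routine: the 1 Da binning, the 0.9 similarity cutoff and the names `unit_vector`, `cosine` and `prefilter` are all assumptions chosen for the example.

```python
import math


def unit_vector(peaks, width=1.0):
    """Bin (m/z, intensity) pairs and scale the result to unit length."""
    vec = {}
    for mz, intensity in peaks:
        key = int(mz / width)
        vec[key] = vec.get(key, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit vectors: 1.0 means the same direction."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def prefilter(spectra, threshold=0.9):
    """Keep the most intense spectrum from each set of near-identical spectra."""
    # Visit the most intense spectra first so that they are the ones retained.
    ranked = sorted(spectra, key=lambda peaks: sum(i for _, i in peaks),
                    reverse=True)
    kept = []  # (peaks, unit vector) pairs that survived so far
    for peaks in ranked:
        vec = unit_vector(peaks)
        if all(cosine(vec, other) < threshold for _, other in kept):
            kept.append((peaks, vec))
    return [peaks for peaks, _ in kept]


strong = [(100.0, 50), (200.0, 25)]
weak_repeat = [(100.1, 10), (200.2, 5)]   # same pattern, lower intensity
different = [(150.0, 5), (250.0, 5)]
filtered = prefilter([weak_repeat, strong, different])
```

In this example `weak_repeat` is dropped because it points in the same direction as `strong`, while `different` is kept.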
|
|
GPMDB peptide count breaks 2 million mark (2004/09/28)
As of today, the number of indexed peptides in GPMDB has reached 2,010,819.
Detailed statistics can be found here.
The addition of a statistics archive link enables users to browse previous
summaries and watch the GPMDB progress.
|
|
GPMO launches message board (2004/09/28)
Visit the message
board and post questions, comments, experiences or developments with
GPMO software. Feel free to help others in the community by sharing your
knowledge of GPMO applications. It's quick and easy to join so sign up today!
|
|
New members added to GPM Scientific Advisory Board (2004/09/03)
We are pleased to announce that Brian T. Chait (Rockefeller University), David
Fenyö (GE Healthcare) and Stephen B.H. Kent (University of Chicago) have been
named to the Scientific Advisory Board of the GPM.
|
|
Updates to GPMDB (2004/09/03)
GPMDB is the publicly available index to
all of the data that has come in through the GPM's various interfaces. As of
today, it has 1.6 × 10^6 annotated MS/MS spectra, although that is
increasing all the time.
It now has some improved database browsing capabilities, such as a dedicated
keyword searching interface and a multiple accession number search interface.
It has evidence for more than 3,300 yeast ORFs and 10,700 ENSEMBL human protein
ids. We have a manuscript describing the technical features of the database, as
well as some use cases for answering questions with the system. If you'd like a
copy, please contact us, and we will
send you a manuscript preprint.
|
|
A new release of X! Tandem available (2004/09/01)
This new release of X! Tandem corrects a number of minor problems that have been
reported by users. It also adds new functionality:
- The ability to use multiple "taxon" names in a single session. This
change allows the use of multiple species selection on the GPM sites. This
change is particularly important for users of the
plant and prokaryote sites,
where mixing and matching the sequence sources to be used can be quite handy.
- Extension of the scoring model to improve scoring for parent ions with z > 2.
We'd like to thank Jimmy Eng and Mike Knierman for pointing out specific
problem spectra that helped a lot in improving the code.
This release is available on the ftp site,
but it is the first release that is also available through our new
code repository. We are now using the Subversion system for code
revision control, to co-ordinate our various code projects. A new release of
the GPM site installation is in preparation: it should be available on Sept.
10, 2004.
|
|
Major new release of X! Tandem available (2004/7/15)
X! TANDEM marks its first year with the release of X! TANDEM 2. Version 2 features
improved memory management, fast execution, and better use of multiprocessor
machines. It also has a built-in reversed-sequence validation method, as
well as its own validation method based on stochastic histogramming. See the
release notes for more details. Version 2 has been deployed on all of
the GPM sites.
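The general idea behind reversed-sequence validation can be sketched as below. This is a generic illustration, not X! Tandem 2's code: the ":reversed" label, the function names and the simple decoy-to-target ratio are assumptions made for the example.

```python
def reversed_decoys(proteins, suffix=":reversed"):
    """Build a decoy entry for each protein by reversing its sequence."""
    return {name + suffix: seq[::-1] for name, seq in proteins.items()}


def estimated_false_rate(hit_names, suffix=":reversed"):
    """Ratio of decoy hits to target hits, or None if there are no target hits."""
    decoys = sum(1 for name in hit_names if name.endswith(suffix))
    targets = len(hit_names) - decoys
    return decoys / targets if targets else None


decoys = reversed_decoys({"P1": "MKTAY"})
rate = estimated_false_rate(["P1", "P2", "P3:reversed", "P4"])
```

Hits against the reversed entries cannot be real, so their frequency gives a rough estimate of how often a random match passes the same score cutoff.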
|
|
Important Note
When updating from previous versions of GPMO software, be sure to back up your
current files. This includes result files and any of the web interface or perl
script files that may have been customized for your particular installation.
|
|
Updated versions of GPM and X! Tandem available (2004/6/1)
New versions of GPM and X! Tandem were made available on June 1, 2004. Thanks to
everyone who tested the new versions and suggested new views and features.
The new version of GPM includes a 1D/2D PAGE gel simulation view, an improved
tabular view for writing reports and a protein chip view.
The new version of X! Tandem includes the ability to specify PROSITE-style
motifs for potential modifications as well as the possibility of specifying
potential modifications as having prompt neutral losses (e.g., the loss of 98
from phosphoserine or phosphothreonine).
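The two X! Tandem additions above can be illustrated with a small Python sketch. It does not reproduce X! Tandem's parameter syntax: the helper names are invented, only a subset of the PROSITE pattern notation is handled, and the example motif "[ST]-P-x-K" is hypothetical. The masses are the monoisotopic values for HPO3 (phosphorylation) and H3PO4 (the loss of 98).

```python
import re

PHOSPHO = 79.96633        # HPO3 added to S or T (monoisotopic, Da)
NEUTRAL_LOSS = 97.97690   # H3PO4 lost promptly, the "98" mentioned above

ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported element: {element}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def motif_sites(sequence, pattern):
    """Zero-based start positions of every (possibly overlapping) motif match."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence)]


def neutral_loss_mz(mz, charge, loss=NEUTRAL_LOSS):
    """m/z of an ion after it has lost a neutral fragment of mass `loss`."""
    return mz - loss / charge


sites = motif_sites("AASPEKSPAK", "[ST]-P-x-K")
shifted = neutral_loss_mz(500.0, 2)
```

Here each reported site is the position of the S or T that opens the motif, which is the residue that would carry the potential modification.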
|
|
Two new Projects available: LiveCD and Quartz (2004/4/15)
The GPMO has added two new projects, LiveCD and Quartz to the site. LiveCD, a
project from the University of Michigan NCRR Center for Proteomics, provides a
simple method to install a Linux-based version of X! TANDEM and the GPM on a
large number of computers for instructional and demonstration purposes. It also
includes some software allowing the use of X! TANDEM on clusters of computers
running LiveCD.
Quartz is a GPMO staff project. It is a set of annotated spectrum collections,
meant to be used for bioinformatics research. The current collections contain
> 2000 MS/MS spectra, along with XML-formatted annotation files.
|
|
X! TANDEM and the GPM release updates (2004/4/10)
New versions of both X! TANDEM and the GPM were released today. These are
maintenance releases, including fixes for small problems observed with previous
versions. The collections of sequences for the GPM have been updated to include
the latest sequence releases from ENSEMBL (1/4/2004).
|
|
Probity model published (2004/3/1)
The GPM takes advantage of the "Probity" statistical model to combine
the results of multiple peptide identifications into an expectation value for a
protein. This model, formulated by Jan Eriksson and David Fenyö, has just
been published in the Journal of Proteome Research (Abstract).
|
|
GPM and Tandem updated (2004/3/1)
The GPM Perl scripts and Tandem code have been updated. The new scripts allow
for more complete viewing of data supporting identifications, particularly the
histograms that are used to perform the statistical analysis for distinguishing
stochastic results from true ones. Tandem has been altered to correct a few
unexpected behaviors and to improve its support for N- and C-terminal
post-translational modifications.
|
|
GPM sequences updated (2004/2/16)
The sequences available to search have been updated to reflect the Feb. 9, 2004
release of most of the proteomes. The new sequences were downloaded from the
ENSEMBL site and tested on the public installations of the GPM. The new
databases are available for download from the GPM
ftp site, in the "gpm_current_version" folder.
|
|
New versions of Tandem and GPM released (2004/2/1)
As of February 1, the 2004.02.01 versions of both the GPM and Tandem have been
released. They include the updates necessary to use point mutation analysis in
local installations. The GPM has been updated to include a new data view mode:
"details". This new mode allows the user to examine the results at a
spectrum by spectrum level, viewing all of the raw data involved, including all
of the scoring histograms and spectrum peak lists.
|
|
Over 400,000 served! (2004/1/30)
At the end of January, the total number of spectra modelled using the public
version of the GPM reached 400,000.
|
|
The GPM identifies its 4000th gene (2004/1/27)
After only 27 days of operation, the GPM has discovered more than 4000
individual genes, using mass spectrum sets sent in by the proteomics community.
The GPM only imports information from genomic gene collections as necessary, so
this high rate of discovery has meant that the Machine's cached records are
improving at a rapid rate. We'd like to thank the proteomics community for
using the Machine, helping it learn about this large collection of observed
proteins.
|
|
Point mutation analysis with GPM (2004/1/18)
The GPM has been updated to include a new modeling feature in the Tandem engine.
It now allows modeling of all possible point mutations in a sequence during the
sequence refinement process. This new capability is still experimental: see the
Tandem project's explanation of this new
capability.
Modifications have been made to some of the other report pages, in an effort to
increase the amount of genomic and proteomic information made available when a
valid model sequence has been found.
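To give a sense of what modeling all possible point mutations involves, the sketch below enumerates every single-residue substitution of a peptide. It is an illustration only, not the Tandem engine's refinement code; the name `point_mutants` and the restriction to the 20 standard amino acids are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def point_mutants(peptide):
    """Yield (position, new residue, mutated sequence) for each substitution."""
    for i, original in enumerate(peptide):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                yield i, replacement, peptide[:i] + replacement + peptide[i + 1:]


mutants = list(point_mutants("AK"))
```

Each residue has 19 alternatives, so a peptide of length n produces 19 × n candidate sequences, which is one reason to restrict this kind of search to a refinement step.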
|
|
Updating the GPM (2004/1/10)
After 10 days of operation, the released version of the GPM has been updated to
include a set of patches to answer questions that cropped up. Thanks to the
many users who used the GPM and sent in helpful suggestions, as well as those
enthusiasts who actually installed their own local versions of the GPM.
|
|
Opening the Global Proteome Machine (2004/1/1)
As of January 1, 2004, the Global Proteome Machine has become active. It is a
simple, open source interface for analyzing tandem mass spectra against
eukaryote genomes. Using the GPM is free and available to anyone interested in
proteomics. The initial GPM configuration has the capacity to search
approximately 10^10 MS/MS spectra per year.
|
|
A new release of X! TANDEM (2005/3/21)
This is the first release of X! TANDEM to fully support the mzXML and mzData
spectrum input formats. The design and initial implementation for both formats
was done by Patrick Lacasse (Université Laval, Dept. of Medicine,
supported by Genome Québec), with eXpat support and refinement of the code
by Brendan MacLean at Fred Hutchinson Cancer Research Center. We would also
like to thank Patrick Pedrioli from the Institute for Systems Biology, who
wrote the mzXML parser that Patrick Lacasse used as a model for his implementation and
who has allowed us to make this available under the Artistic License. It should
be noted that our support for these standards, much like the standards
themselves, is preliminary and there may be some "flavours" of either
format that do not work as expected.
In addition, this release has a new optional parameter that allows the user to
specify a parameter file that contains masses for one or all of the amino acid
residues. This feature makes it possible to use non-standard amino acids, or
isotopically labelled amino acids. An example of using this feature to find
proteins that were made using all 15N amino acids is available at
the human
boutique site.
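The effect of supplying custom residue masses can be sketched as follows. This does not show X! Tandem's parameter file format: the table covers only six residues, and the names `peptide_mass`, `STANDARD`, `NITROGENS` and `n15_overrides` are assumptions made for the example. Each nitrogen atom that is replaced by 15N adds about 0.997035 Da.

```python
WATER = 18.010565      # monoisotopic mass of H2O, Da
N15_SHIFT = 0.997035   # mass difference between 15N and 14N, Da

# Monoisotopic residue masses and nitrogen atom counts for a few amino acids.
STANDARD = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
            "L": 113.08406, "K": 128.09496, "R": 156.10111}
NITROGENS = {"G": 1, "A": 1, "S": 1, "L": 1, "K": 2, "R": 4}


def peptide_mass(peptide, overrides=None):
    """Neutral peptide mass, with optional user-supplied residue masses."""
    table = dict(STANDARD)
    table.update(overrides or {})
    return sum(table[aa] for aa in peptide) + WATER


# Residue masses for a protein made entirely from 15N-labelled amino acids.
n15_overrides = {aa: STANDARD[aa] + NITROGENS[aa] * N15_SHIFT
                 for aa in STANDARD}

light = peptide_mass("GASK")
heavy = peptide_mass("GASK", n15_overrides)
```

The peptide GASK contains five nitrogen atoms, so the labelled form is heavier by five times the 15N shift.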
|
|
First release of P3 (2005/2/16)
The X! TANDEM P3 project is the first protein identification system
capable of using proteotypic peptides to accelerate searching and improve the
confidence of results. The system is built out of the X! TANDEM framework and
utilizes the GPM interface for its displays. The necessary proteotypic peptide
libraries are continuously updated from the GPMDB for human and yeast
proteomes. Proteotypic peptide libraries are much smaller than full proteomes,
so this type of searching runs quite a bit quicker than standard searches.
|
|
First release of the Jasper spectrum collection (2005/2/16)
The Jasper spectrum collection is a new type of bioinformatics resource, made
available as part of the Quartz spectrum library. Jasper collections contain
the best spectrum-to-peptide assignments from the GPMDB, broken down into
categories based on the reliability of the assignment (based on the measured
expectation value for a peptide). These libraries of spectra are in XML files,
containing the peptide sequences (with PTMs) associated with individual
spectra that were assigned to those peptides. The first library contains about
64,000 high quality spectrum-to-sequence assignments.
|
|
New release of X! Tandem available (2005/2/16)
The 2005.02.01 release of X! Tandem is now available. The new features of this
release are mainly for programmers, particularly an improved mechanism for
adding in new scoring systems, elegantly added by Brendan MacLean. Some changes
have also been made to take further advantage of high accuracy parent ion mass
measurements.
|
|
S. tropicalis genome sequence available (2005/2/10)
In addition to the other sequence resources available on the
Xenopus site, the newly released protein predictions from the S. tropicalis
genome are now available. They are annotated using information from ENSEMBL.
This genome represents the first full sequence of an amphibian genome.
|
|
X! Tandem to use mzData and mzXML input standards (2005/1/21)
We are happy to announce that as the result of the most recent Standards in
Proteomics meeting held by the NIDDK in Washington earlier this month, X!
Tandem will support both MS/MS data representations, as proposed by HUPO-PSI
and the Institute for Systems Biology. The development work to incorporate the
two standards has begun and the finished software should be available by the
end of February. Many thanks to Patrick Lacasse (Université Laval, Dept.
of Medicine, supported by Genome Québec) for generating the mzXML and
mzData parser classes. We would also like to thank Randy Julian (Eli Lilly) for
his co-operation and help with adding mzData.
|
|
GPMDB begins collaboration with NIST (2005/1/21)
We are happy to announce that Dr. Steve Stein from the US National Institute of
Standards and Technology is now collaborating with us to produce a standardized
library of peptide MS/MS spectra to be used for the improvement of protein
identification algorithms. The donated entries in GPMDB will be statistically
evaluated and an "average" spectrum for a particular peptide, based
on its modifications and charge state, will be developed. Dr. Stein has worked
on the development of similar spectrum libraries for use with small molecule
identification for many years and we are very happy to be of assistance in
developing similar approaches for proteomics. Dr. Stein expects to announce the
preliminary results of his work at the US-HUPO meeting this spring.
|
|
GPM source code now mirrored on Proteome Commons (2005/1/18)
As a result of our collaboration with the Michigan Proteomics Consortium, we are
happy to announce the inclusion of all GPM software in the new
proteomecommons.org open source software archive. Many thanks to Jayson
Falkner, Pete Ulintz and Phil Andrews for creating this new site, which we hope
will be of general value to the proteomics community.
|
|
GPMDB peptide count breaks 4 million mark (2004/11/22)
As of today, the number of annotated peptides in GPMDB has reached 4,121,723.
Detailed statistics can be found here.
The addition of a statistics archive link enables users to browse previous
summaries and watch the GPMDB progress.
|
|