The X! search engine project

X! Search Engine Development

  X! TANDEM Spectrum Modeler
Note: For versions of X! Tandem that have been modified to run on cluster computer systems, please see Parallel Tandem or X!!Tandem.

X! Tandem is open source software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification.

This software has a very simple application programming interface (API): it takes an XML file of instructions on its command line and writes the results to an XML file whose name is specified in the input XML file. The output format is described here (PDF). This format is used for all of the X! series search engines, as well as the GPM and GPMDB.
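
As a rough illustration of this workflow, the sketch below builds a small instruction document and assembles the command line. It assumes the BIOML-style layout found in the sample input files distributed with X! Tandem (a "bioml" root holding "note" elements with type="input" and a "label" naming the parameter). The helper names, the file names, the parameter labels shown, and the executable name "tandem.exe" are placeholders for illustration and are not taken from this page.

```python
import xml.etree.ElementTree as ET


def build_input_xml(params):
    """Return an X! Tandem-style instruction document as a string.

    Assumption: each parameter is a <note type="input" label="..."> element
    inside a <bioml> root, as in the sample input files.
    """
    root = ET.Element("bioml")
    for label, value in params.items():
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = str(value)
    return ET.tostring(root, encoding="unicode")


def build_command(executable, input_path):
    """The instruction file is the only command-line argument."""
    return [executable, input_path]


# Hypothetical parameter values; the paths are placeholders.
params = {
    "spectrum, path": "spectra.mgf",
    "output, path": "results.xml",  # the output file is named in the input file
}
xml_text = build_input_xml(params)
command = build_command("tandem.exe", "input.xml")
```

Writing xml_text to "input.xml" and passing command to subprocess.run would then start a search, assuming the executable can be found on the system path.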

Unlike some earlier generation search engines, all of the X! series search engines calculate statistical confidence (expectation values) for all of the individual spectrum-to-sequence assignments. They also reassemble all of the peptide assignments in a data set onto the known protein sequences and assign a statistical confidence that this assembly and alignment is non-random; the formula for this can be found here. Therefore, separate assembly and statistical analysis software, e.g. PeptideProphet and ProteinProphet, does not need to be used.

Latest release:   ALANINE (2017.02.01)
This release has numerous small changes to reduce the amount of memory used by the application when utilizing its expert systems methods for finding PTMs and SAVs. It also adds several mechanisms to reduce the number of false positive assignments, particularly when testing for SAVs.
  1. The handling of expert systems information that is loaded from files but is not altered during a search has been changed so that a single, global data structure is used, rather than individual data structures in each thread. This feature required significant alteration to the code, so any projects that are "forks" of the main X! Tandem project should take some care to ensure that their changes are not affected.
  2. A parent ion mass peak tolerance detection system has been added that can detect the minimum necessary tolerance as part of the report generation phase of the search.
  3. Protein SAVs that can be assigned to non-variant peptides are now excluded from the output.
VENGEANCE (2015.12.15)
This release improves the precision of handling variable modifications. Several new commands and notational add-ons make specifying how to test modifications significantly more nuanced. The methods for handling variable modifications have been extensively re-written.
  1. The value of the command "protein, ptm complexity" (C, a floating point number 0.0–12.0) sets the maximum number of variable modification alternatives that will be tested for a particular peptide. The number of alternatives is 2.0^C. If this number is not specified, the default value C = 6.0 will be used.
  2. The specification of a variable modification can include a value for the maximum number of modification sites to be considered in a single peptide. For example, the modification specification 15.994915@M would normally be used to test for M oxidation. If you wish only to consider one such modification per peptide, you can now write "15.994915@1M". Any number from 1–10 can be used in this notation. If not specified, a default value of 10 is used.
  3. It is possible to specify that a variable modification NOT occur at the C-terminus of a peptide. For example, previously "42.010565@K" would have been used to test for K acetylation. Using the new notation, "42.010565@]K" can be used, which will not test C-terminal lysines for acetylation (acetylated C-terminal lysines are very unlikely in tryptic peptides). This notation is useful for most lysine post-translational modifications, as well as dimethyl-arginine. Note: monomethyl-arginine and -lysine are both susceptible to trypsin cleavage, so this notation is not recommended for monomethyl variable modifications. It is also not recommended for use with carbamylation (a urea artifact that can occur during tryptic digestion), although reducing the number of carbamylations allowed per peptide, e.g., "43.005814@1K", can be quite useful.
  4. The legacy command "spectrum, use noise suppression" has been removed from the project: the original method was created for LCQ spectra and it no longer had any practical utility.
  5. Limits have been introduced to the length of peptide that will be considered to be a solution to a mass spectrum. Previous limits had only been based on the parent ion mass of a fragment ion spectrum. The new limits require a peptide to be 6–50 residues in length, regardless of the parent ion mass.
  6. The Windows version of the code has been updated and adapted for use with Microsoft Visual Studio Community 2015. It has been fully tested for Windows 8, 8.1 & 10.
  7. The Linux version of the code has been updated and adapted for use with Red Hat Enterprise Linux Workstation v.6.7, using gcc v. 4.4.7.
  8. This version was designed and tested to work with the BI GPM Fury version of the generic GPM interface.
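The notation described in items 1 to 3 can be summarized with a small parser. This is a sketch based only on the examples given above, not the parsing code used by X! Tandem. The function and field names are hypothetical, the number of alternatives is read as 2.0 raised to the power C, and only a single specification with either a site count or the "]" marker is handled, because the notes above do not describe how the two might be combined.

```python
import re
from dataclasses import dataclass

DEFAULT_MAX_SITES = 10      # default stated in item 2
DEFAULT_COMPLEXITY = 6.0    # default value of "protein, ptm complexity"

# mass @ (optional site count | optional ']' marker) residue
_SPEC = re.compile(r"^([+-]?\d+(?:\.\d+)?)@(?:(\d+)|(\]))?([A-Z])$")


@dataclass
class VariableMod:
    mass: float
    residue: str
    max_sites: int = DEFAULT_MAX_SITES
    exclude_c_terminus: bool = False


def parse_variable_mod(spec):
    """Parse one specification such as '15.994915@1M' or '42.010565@]K'."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized modification specification: {spec!r}")
    mass, count, no_cterm, residue = match.groups()
    max_sites = int(count) if count is not None else DEFAULT_MAX_SITES
    if not 1 <= max_sites <= 10:
        raise ValueError("site count must be between 1 and 10")
    return VariableMod(float(mass), residue, max_sites, no_cterm is not None)


def max_alternatives(complexity=DEFAULT_COMPLEXITY):
    """Maximum number of variable modification alternatives, 2.0 ** C."""
    if not 0.0 <= complexity <= 12.0:
        raise ValueError("ptm complexity must be between 0.0 and 12.0")
    return 2.0 ** complexity
```

With the default C = 6.0 this gives 64 alternatives per peptide, and the largest allowed value, C = 12.0, gives 4096.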
PILEDRIVER (2015.04.01)
This release adds a new output format (mzIdentML) and several variants of the mzML input format (MSNumPress compression). It also corrects an undesired behavior when searching for protein N-terminal and C-terminal modifications when using a protein modification specification XML file.
  1. New files (MSNumPress.cpp, MSNumPress.hpp) were added to the project (Johan Teleman) to implement the compression modes that have been added to the mzML specification.
  2. New files (mzid_report.cpp, mzid_report.hpp) were added to the project to implement the output of an mzIdentML file, in addition to the existing BIOML output. To generate an mzIdentML output, set the new parameter:
    • "output, mzid": if "yes", the file will be created with the extension .mzid.
  3. The "score_terminus_single" method has been removed from mprocess and replaced by an altered version of "score_terminus", which corrects the bad behavior associated with searching for protein N- and C-terminal modifications when using a protein modification specification file. It also improves the progress reporting displayed for this type of search.
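To illustrate item 2, the following sketch switches on the mzIdentML output in an existing instruction document. It assumes that parameters are stored as "note" elements with type="input" inside a "bioml" root, as in the sample input files distributed with X! Tandem. The function name and the starting document are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_parameter(xml_text, label, value):
    """Set a parameter in an instruction document, adding it if absent.

    Assumption: parameters are <note type="input" label="..."> elements.
    """
    root = ET.fromstring(xml_text)
    for note in root.iter("note"):
        if note.get("type") == "input" and note.get("label") == label:
            note.text = value  # parameter already present: overwrite it
            break
    else:
        # Parameter not found: append a new note to the root element.
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical starting document with a single parameter.
original = '<bioml><note type="input" label="output, path">results.xml</note></bioml>'
updated = set_parameter(original, "output, mzid", "yes")
```

Calling the function again with the same label replaces the existing value rather than adding a second entry.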
Copyright © 2004-2013, The Global Proteome Machine Organization
Use of all documentation for X! Tandem, X! P3 and X! Hunter is governed by the Artistic License.