The X! search engine project

X! Search Engine Development

   X! tandem release notes
X! tandem will be released periodically, with the version numbering system formulated from the date of release. The changes made to the system on each release are detailed in the list below. Releases are listed with the most recent release on top.
Latest release:   ALANINE (2017.02.01)
This release has numerous small changes to reduce the amount of memory used by the application when utilizing its expert systems methods for finding PTMs and SAVs. It also adds several mechanisms to reduce the number of false positive assignments, particularly when testing for SAVs.
  1. The handling of expert systems information that is loaded from files but is not altered during a search has been changed so that a single, global data structure is used, rather than individual data structures in each thread. This feature required significant alteration to the code, so any projects that are "forks" of the main X! Tandem project should take some care to ensure that their changes are not affected.
  2. A parent ion mass peak tolerance detection system has been added that can detect the minimum necessary tolerance as part of the report generation phase of the search.
  3. Protein SAVs that can be assigned to non-variant peptides are now excluded from the output.
VENGEANCE (2015.12.15)
This release improves the precision of handling variable modifications. Several new commands and notational add-ons make specifying how to test modifications significantly more nuanced. The methods for handling variable modifications have been extensively rewritten.
  1. The value of the command "protein, ptm complexity" (C, a floating point number 0.0–12.0) sets the maximum number of variable modification alternatives that will be tested for a particular peptide. The number of alternatives is 2.0^C. If this number is not specified, the default value C = 6.0 will be used.
  2. The specification of a variable modification can include a value for the maximum number of modification sites to be considered in a single peptide. For example, the modification specification 15.994915@M would normally be used to test for M oxidation. If you wish only to consider one such modification per peptide, you can now write "15.994915@1M". Any number from 1–10 can be used in this notation. If not specified, a default value of 10 is used.
  3. It is possible to specify that a variable modification NOT occur at the C-terminus of a peptide. For example, previously "42.010565@K" would have been used to test for K acetylation. Using the new notation, "42.010565@]K" can be used, which will not test C-terminal lysines for acetylation (which are chemically impossible for tryptic peptides). This notation is useful for most lysine post-translational modifications, as well as dimethyl-arginine. Note: monomethyl-arginine and -lysine are both susceptible to trypsin cleavage, so this notation is not recommended for monomethyl variable modifications. It is also not recommended for use with carbamylation — a urea artifact that can occur during tryptic digestion — although reducing the number of carbamylations allowed per peptide, e.g., "43.005814@1K", can be quite useful.
  4. The legacy command "spectrum, use noise suppression" has been removed from the project: the original method was created for LCQ spectra and it no longer had any practical utility.
  5. Limits have been introduced to the length of peptide that will be considered to be a solution to a mass spectrum. Previous limits had only been based on the parent ion mass of a fragment ion spectrum. The new limits require a peptide to be 6–50 residues in length, regardless of the parent ion mass.
  6. The Windows version of the code has been updated and adapted for use with Microsoft Visual Studio Community 2015. It has been fully tested for Windows 8, 8.1 & 10.
  7. The Linux version of the code has been updated and adapted for use with Red Hat Enterprise Linux Workstation v.6.7, using gcc v. 4.4.7.
  8. This version was designed and tested to work with the BI GPM Fury version of the generic GPM interface.
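The variable modification notation described in items 2 and 3 can be illustrated with a small parser. This is a minimal sketch in Python, not X! Tandem's own code: the names `parse_variable_mod` and `max_alternatives` and the returned dictionary layout are hypothetical. It assumes that, when both are present, the site limit comes before the "]" marker, and it reads the alternative count in item 1 as 2.0 raised to the power C.

```python
import re

# Hypothetical pattern: mass, '@', optional site limit, optional ']' marker, residue letter.
_SPEC = re.compile(
    r"^(?P<mass>-?\d+(?:\.\d+)?)@(?P<limit>\d{1,2})?(?P<no_cterm>\])?(?P<residue>[A-Z])$"
)

def parse_variable_mod(spec):
    """Split a specification such as '15.994915@1M' or '42.010565@]K' into its parts."""
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"cannot interpret modification: {spec!r}")
    # The notes state that 10 is used when no site limit is given.
    limit = int(match.group("limit")) if match.group("limit") else 10
    if not 1 <= limit <= 10:
        raise ValueError("site limit must be between 1 and 10")
    return {
        "mass": float(match.group("mass")),
        "residue": match.group("residue"),
        "max_sites": limit,
        "exclude_c_terminus": match.group("no_cterm") is not None,
    }

def max_alternatives(complexity=6.0):
    """Upper bound on tested alternatives, assuming the count is 2.0 ** C."""
    if not 0.0 <= complexity <= 12.0:
        raise ValueError("complexity must be between 0.0 and 12.0")
    return 2.0 ** complexity
```

Under these assumptions, `parse_variable_mod("42.010565@]K")` reports a lysine modification that is not tested at the peptide C-terminus, and the default complexity allows up to 64 alternatives per peptide.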
   PILEDRIVER (2015.04.01)
This release adds a new output format (mzIdentML) and several variants of the mzML input format (MSNumPress compression). It also corrects an undesired behavior when searching for protein N-terminal and C-terminal modifications when using a protein modification specification XML file.
  1. New files (MSNumPress.cpp, MSNumPress.hpp) were added to the project (Johan Teleman) to implement the compression modes that have been added to the mzML specification.
  2. New files (mzid_report.cpp, mzid_report.hpp) were added to the project to implement the output of an mzIdentML file, in addition to the existing BIOML output. To generate an mzIdentML output, set the new parameter:
    • "output, mzid": if "yes" the file will be created with the extension .mzid.
  3. The "score_terminus_single" method has been removed from mprocess and replaced by an altered version of "score_terminus", which corrects the bad behavior associated with searching for protein N- and C-terminal modifications when using a protein modification specification file. It also improves the display progress reporting for this type of search.
   SLEDGEHAMMER (2013.09.01)
This release updates the E-value estimation algorithm and corrects several issues associated with using very high accuracy fragment ion mass tolerances.
  1. The E-value estimation algorithm has been totally rewritten to simplify the code.
  2. The new E-value algorithm deals more effectively with malformed protein sequence lists, particularly sequence lists that deliberately have very large numbers of protein sequences that have very similar sequences.
  3. The method for determining high accuracy fragment ion tolerances has been corrected.
   JACKHAMMER (2013.06.15)
This release contains a new method for specifying protein post-translational modifications specifically by protein coordinate and modification type.
  1. A file format similar to the amino acid polymorphism specification was developed and a reader implemented.
  2. This first version allowing coordinate-based PTM specification only allows one specified PTM per tryptic peptide simultaneously. For example, if two PTMs are specified for the same peptide, each will be tested separately, but not the two together. More than one PTM may be specified on a particular residue: each will be tested sequentially.
  3. A significant number of internal changes have been made to eliminate any variability in output caused by the use of multi-threading. The output files should now be line-by-line identical, independent of the number of threads used.
   CYCLONE (2013.02.01)
This release contains a new method of dealing with redundant protein sequences.
  1. A stacking system is used to track redundant protein sequences to eliminate multiple processing of identical sequences. The redundant information is re-inserted into the results following processing, so that the resulting output is the same as would have been generated by older versions.
  2. The letter "X" in protein sequences is now interpreted the same as an asterisk "*", i.e., it is processed as a stop in translation.
  3. A new input parameter, spectrum, skyline path, was introduced to make the output easier to parse for the Skyline MRM utility suite.
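The redundancy handling in item 1 can be pictured as grouping identical sequences, processing each distinct sequence once, and then copying the outcome back to every entry. The following Python sketch shows only that idea; `process_unique` and the `process` callback are hypothetical names, the labels are assumed to be unique, and the actual stacking system in X! Tandem may differ in detail.

```python
from collections import OrderedDict

def process_unique(proteins, process):
    """proteins: list of (label, sequence) pairs; process: callable applied to a sequence."""
    # Group labels by sequence so that identical sequences share one entry.
    groups = OrderedDict()
    for label, sequence in proteins:
        groups.setdefault(sequence, []).append(label)

    results = {}
    for sequence, labels in groups.items():
        outcome = process(sequence)  # called once per distinct sequence
        for label in labels:
            results[label] = outcome  # re-insert the result for every redundant entry

    # Report in the original order, as if every entry had been processed separately.
    return [(label, results[label]) for label, _ in proteins]
```

The output has one entry per input protein, so downstream code sees the same result list it would have seen without the grouping.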
   CYCLONE (2012.10.01)
This release contains several bug fixes as well as some new features associated with peptide cleavage patterns.
  1. The parameters "protein, cleavage semi" and "refine, cleavage semi" now have four possible values:
    • "amino" - simulates cleavage by an aminopeptidase
    • "carboxy" - simulates cleavage by a carboxypeptidase
    • "yes" - simulates cleavage by both amino- and carboxy-peptidase
    • "no" - semi-type cleavage not used (default)
  2. A bug that could produce negative values for "missed_cleavage" has been corrected.
  3. The RTINSECONDS parameter in MGF files is now handled correctly.
  4. A mechanism for specifying protein-specific modifications that are to be applied in all rounds of analysis has been added.
   CYCLONE (2011.12.01)
This is the second release in the CYCLONE project. There are numerous small fixes and changes from the first CYCLONE release, associated with reducing the memory requirements for large data sets. Specific new features are listed below.
  1. Additional testing for adventitious cleavage at Asp-Pro residues. These tests are made for all enzyme cleavage types, except [X]|[X] (cleavage at all residues). Testing Asp-Pro cleavage does not affect the "missed cleavage" count in an analysis.
  2. The load balancing method for starting multiple threads has been improved to take account of data sets that have been re-arranged from their original order in an LC/MS/MS file.
  3. Several changes have been made to keep up with changes to "standard" data file formats.
   CYCLONE (2010.12.01)
This is the first release in the CYCLONE project. There are numerous small fixes and changes from the last TORNADO release, mainly aimed at improving the speed of the application. Some of the new features are listed below.
  1. An improved scoring function for ETD data, incorporating the ideas described in Sun, R-X, et al. J. Proteome Res. 2010 (DOI: 10.1021/pr100648r).
  2. A more complete implementation of the mzML v. 1.1.0 file format (in collaboration with Fredrik Levander).
  3. A mechanism for reading the fragmentation type from mzXML files, when available. This mechanism allows X! Tandem to read mzXML files that contain mixtures of CID/HCD and ETD generated spectra and correctly apply the appropriate set of fragment ions to the individual spectra for interpretation (in collaboration with Peter Lobel).
  4. A change to the interpretation of the "refine, unanticipated cleavage" directive to being a "semi"-type cleavage rather than a full non-specific cleavage. The previous behavior can be obtained using the new "refine, full unanticipated cleavage" directive.
  5. An improved implementation of the "quick acetyl" checking mechanism brought out in the last TORNADO release.
  6. Explicit use of SIMD pragmas in the Windows version to speed up the native X! Tandem scoring function.
   TORNADO (2010.01.01)
This release adds several new features to X! Tandem, as well as compatibility with changes to some of the existing standard file formats. The new features are listed below.
  1. New parameter "quick acetyl" added to control a simplified check for acetylated protein N-termini.
  2. New parameter "quick pyrolidone" added to control the previously existing peptide N-terminal cyclization check.
  3. New parameter "stP bias" added to control new behavior regarding the detection and assignment of phosphorylation sites.
  4. Compatibility with the current mzData format used by PRIDE.
   TORNADO (2009.04.01)
This release is a maintenance release that adds one new feature that is designed to be used in analyzing SILAC experiments. It also has minor changes to improve the detection of new input data file types.
  1. It is now possible to specify multiple sets of "complete" modifications to be applied sequentially. This was achieved by adding new commands to the input X! file format that look like the following:
      <note type="input" label="residue, modification mass">57@C</note>
      <note type="input" label="residue, modification mass 1">57@C,8@K,10@R</note>
    In this case, the data would be checked both for peptides with only cysteine modified by carboxyamidomethyl and for peptides with carboxyamidomethyl and SILAC labeled lysine and arginine residues. This applies to the initial round of analysis as well as all refinement rounds. Any number of sets of complete modifications can be added, by incrementing the count in the label ("residue, modification mass 2", "residue, modification mass 3", etc.). Processing stops when a count increment label is missing (e.g., there is a "residue, modification mass 2" label but no "residue, modification mass 3" label). Processing also stops when a zero length string is passed; for example, the following string would stop processing at count = 1:
      <note type="input" label="residue, modification mass 1"></note>
    A non-zero length string that cannot be interpreted as a residue modification is interpreted as meaning that the data should be analyzed with no residue modifications, for example:
      <note type="input" label="residue, modification mass 1">none</note>, or
      <note type="input" label="residue, modification mass 1">     </note>.
  2. Compatibility for version 1.1 of the CMN format has been added, allowing long description strings (> 255 characters).
  3. Detection of new, non-standard variants of mzXML files has been added.
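The label-counting rules in item 1 can be summarized in a short Python sketch. It is an illustration of the rules as written above, not the X! Tandem reader: the function names and the use of a plain dictionary for the input parameters are hypothetical, and it assumes the same stop rule applies to the unnumbered label.

```python
BASE_LABEL = "residue, modification mass"

def parse_set(value):
    """Return a list of (mass, residue) pairs, or [] if the string cannot be interpreted."""
    mods = []
    for item in value.split(","):
        mass, sep, residue = item.strip().partition("@")
        if not sep or not residue:
            return []  # e.g. "none" or blanks: search with no residue modifications
        try:
            mods.append((float(mass), residue))
        except ValueError:
            return []
    return mods

def complete_modification_sets(params):
    """Collect modification sets until a label is missing or holds a zero length string."""
    sets = []
    index = 0
    while True:
        label = BASE_LABEL if index == 0 else f"{BASE_LABEL} {index}"
        value = params.get(label)
        if value is None or value == "":
            break
        sets.append(parse_set(value))
        index += 1
    return sets
```

With the two example notes above as input, the sketch yields two sets: one with only 57@C, and one with 57@C, 8@K and 10@R. A value of "none" for the second label would instead yield an empty second set, meaning a search with no residue modifications.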
   TORNADO (2008.02.01)
This release is the first of the new TORNADO versions of X! Tandem, which have the goal of utilizing available external annotation information to improve the performance of sequence identifications. The 2007.04.01 release started this project by adding single-nucleotide-induced amino acid polymorphism annotation to searches. TORNADO introduces the capability of setting the potential modifications tested on a sequence-by-sequence basis, controlled by a BIOML annotation file.
System level changes
  1. A fix for the method to force the use of specific file formats (made by Patrick Lacasse)
  2. Addition of a class to handle sequence annotation files in BIOML format (saxmodhandler).
  3. Addition of a method to load the annotation file information into an STL map, in the class mprocess.
  4. When compiling on Linux platforms, several possible makefiles are provided. The default makefile will work for GCC version 4, with the expat libraries dynamically linked. The other makefiles are all in the src directory, with names like "Makefile_XXX" where XXX is a descriptive name indicating in which situations this file is appropriate. To use these files, use a command line like this:
    	>make -f Makefile_GCCv3
This release adds compatibility with 64 bit floating point data in mzXML or mzData formats.
System level changes
  1. A new override of the dtohl method in the saxhandler.cpp file was added to deal with 64 bit floats.
This release is the first version to support amino acid residue polymorphism annotation. A file containing known coding mutations can be specified and the search engine will check each specified version of those modified residues.
System level changes
  1. Including SAPS required modification of the classes mprocess and mscore as well as the addition of a new class mscoresap, which is specified in the mscorepam.h file. The new class follows the same pattern as the other state machines for tracking sequence modifications. A class that reads the XML-formatted SAPS annotation information has also been added, saxsaphandler, which follows the same pattern as the other BIOML processing classes.
  2. This version (and all subsequent ones) will use the preprocessor commands associated with the compiler make processor to specify the platform being compiled. Previous versions required commenting out undesired options in the stdafx.h file. This change includes the PLUGGABLE_SCORING preprocessor definition in mscore.h that is necessary to alter the peptide scoring portion of the code.
  3. The Mac OSX version of the executable binaries is statically linked to the most recent version of the XML parser expat. Unlike previous versions it will not be necessary to have expat installed on the computer used to run the search engine. The Linux/Unix builds are now the only platforms on which dynamic linking is necessary.
This release is the first release to support rho-diagrams for the determination of expectation value thresholds. It also has a minor, but important change to the interpretation of results to make the results of refinement rounds more consistent. This release is also the first to support the Mac OS X 10.4 version for Intel processors. Support for non-Intel processors will be discontinued as of the next release.
System level changes
  1. The output file has a new output value in the "performance parameters" group, for example:
    <group label="performance parameters" type="parameters">
        <note label="quality values">117 34 22 10 5 2 1 0 0 1 0 1 0 0 0 0 0 0 0 0</note>
    This change was necessary to support the use of rho-diagrams in the GPM display software.
  2. The proteins reported as possible correct identifications have been changed somewhat. In previous versions, it was possible for a protein to be reported as identified even if it did not have any qualifying peptides that were found to have the specified enzymatic cleavage: a protein could have peptides found only during the cleavage-at-every site round or the point-mutation round. This behavior has been changed so that a protein must have at least one significant peptide found to have the specified enzymatic cleavage.
  3. The implementation of point mutation detection has been altered so that if a particular possible point mutation has been explained by any set of potential modifications, it will not be included as a possible solution in the output.
  4. The stdafx.h file has been altered to add in new preprocessor statements that deal with the different versions of Mac OS X. These new statements are #OSX_TIGER and #OSX_INTEL. To compile with OS X 10.4 on a PPC computer, uncomment both #OSX and #OSX_TIGER. To compile with OS X 10.4 on an Intel computer, uncomment only #OSX_INTEL.
This release of X! Tandem/P3 contains several minor changes to maintain cross-platform compatibility.
System level changes
  1. Code that uses iterator math to determine the limits of a calculation has been altered so that the iterators are not incremented past the end of an STL container. Incrementing iterators past the end of a container generates a run-time error when compiled with Microsoft Visual C++ 2005. The precompiler variable _CRT_SECURE_NO_DEPRECATE has been defined for the Visual C++ compiler, to prevent the generation of unnecessary compiler warnings for the use of C string functions, such as "strcpy".
  2. The SAXTandemInputHandler::characters method has been updated to improve its performance and to handle escape characters correctly (suggested by Brendan Maclean).
  3. The documentation and precompiler defines in stdafx.h that provide cross-platform compatibility for 64-bit integer types have been updated.
This release of X! Tandem contains two fixes to improve compatibility with Linux and one adjustment to be compatible with X! Hunter.
System level changes
  1. On at least some Linux platforms, asterisks (*) in FASTA files were not being processed properly. This has been corrected by Brendan Maclean.
  2. Some mzXML files could produce memory problems, when combined with some spectrum processing parameters because a vector was not being cleared between processing individual spectra. This problem did not affect the results of the search, but it could cause memory paging when using a large file.
  3. The taxonomy xml file has always contained a type specification, which was not used by X! Tandem. Now X! Tandem enforces that FASTA or FASTA.PRO files must be specified with type="peptide".

This release of X! Tandem includes a number of additions to the system API. These changes are mainly for programmers, allowing for greater customization of searches. Some of these features have been present in previous versions of X! Tandem, but have been either undocumented or insufficiently well tested.

This version of the X! Tandem code also merges the code for X! P3. The P3 executable can be compiled by uncommenting the preprocessor variable X_P3 in "stdafx.h". Adding in the P3 code was done by creating several classes that are extensions of the normal Tandem classes and using a small number of preprocessor directives. These new classes have the prefix p3.

System level changes
Added parameters:
  1. output, log path
  2. output, message
  3. output, one sequence copy
  4. output, sequence path
  5. refine, modification mass
  6. refine, sequence path
  7. refine, tic percent
  8. scoring, cyclic permutation
  9. scoring, include reverse
  10. spectrum, sequence batch size

This release of X! Tandem contains several small fixes in response to error reports.
System level changes
  1. The C-ion mass calculation has been improved for electron-capture ion source identifications, suggested by David Fenyo.
  2. A problem relating to an include file that caused compilation difficulties for some versions of GCC on some versions of Linux has been fixed.
  3. The calculation of parent ion mass difference has been improved, to provide better consistency for very accurate (< 1 ppm) parent ion mass determinations.
  4. The "semi" cleavage state machine has been adjusted for better performance.
This release of X! Tandem adds several new features, as well as improving the XML standards compatibility of the system.
System level changes
  1. An improved handling of hex encoded binary information in mzXML and mzData files, for 64-bit processors, added by Steven Wiley.
  2. Addition of testing for N-terminal glutamic acid cyclization, suggested by Oleg Krohkin.
  3. Addition of "semi" enzymatic cleavage (specific enzyme cleavage at one end of a peptide and non-specific cleavage at the other), suggested by Matt Monroe.
  4. An improved system for detecting XML file types, suggested by Steven Wiley.
  5. Support for variant methods of expressing parent ion charge in mzData v. 1.05, added by Fredrik Levander.
This release of X! Tandem adds several new features, as well as improving on some of the existing features. It contains a number of engineering architectural changes meant to allow simpler access to some of the key algorithms in the system.
System level changes
  1. An improved version of the state machine that lists all of the possible potential modification states of a peptide sequence was written by Brendan Maclean. This version is both more thorough and faster than the previous code.
  2. The capability of using chemical average masses for fragment ion mass calculations was added by Brendan Maclean.
  3. A simplification of the mprocess class that allows for a "plug-in" approach to adding new refinement modules was designed and implemented by Rob Craig.
  4. An improved routine for correcting for isotope peaks and multiple observations of similar masses was made by Patrick Lacasse.
  5. An additional state machine using cyclic peptide sequence permutation to compensate for small sequence collections and for large mass peptides was added. This feature is based on suggestions made separately by Tom Blackwell & David States and Patrick Lacasse.
  6. An improved sorting method to improve the consistency of homologous sequence assignments was added by Rob Craig.
The changes in this release are aimed at increasing XML compliance and high accuracy mass calculation consistency.
System level changes
  1. The handlers for GAML spectra, taxonomy files and input parameter files have been changed to use expat, rather than custom routines.
  2. A more flexible mass calculation class has been added to improve molecular mass consistency for high accuracy calculations.
  3. The input spectrum file type detection method has been improved by adding the possibility of forcing it to select one file type. This forcing is done using the input parameter "spectrum, path type", which can have the values: dta, pkl, mgf, gaml, mzxml or mzdata. If this parameter is missing or of zero length, the normal file type detection scheme is used.
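The override rule in item 3 amounts to a small decision, sketched here in Python. The function name `choose_file_type`, the `detect` callback standing in for the normal detection scheme, and the choice to raise an error for an unrecognized value are all assumptions; the notes do not say how an unlisted value is handled.

```python
FORCED_TYPES = ("dta", "pkl", "mgf", "gaml", "mzxml", "mzdata")

def choose_file_type(params, detect):
    """Return the forced file type if one is given, otherwise run normal detection."""
    forced = params.get("spectrum, path type", "")
    if forced == "":
        # Parameter missing or of zero length: fall back to automatic detection.
        return detect()
    if forced not in FORCED_TYPES:
        # Assumption: reject values outside the documented list.
        raise ValueError(f"unsupported file type: {forced!r}")
    return forced
```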
This version corrects an issue that could arise in large MudPIT data sets with large numbers of redundant identifications. The calculation of the protein expectation value in previous versions was susceptible to floating point overflows when making this calculation, resulting in unpredictable values.
This release adds the ability to process mzxml and mzdata file formats using the expat library of functions. Most of the changes in this release were initially made by Patrick Lacasse (Université Laval, Dept. of Medicine, supported by Genome Québec) with the final version and optimizations made by Brendan MacLean, from the Fred Hutchinson Cancer Research Center. Also, the ability to define the amino acid residue masses has been added, allowing users to change the default masses when doing N15 experiments, for example.
System level changes
  1. New classes have been added to allow the processing of mzxml and mzdata file formats.
    Two of the new classes are publicly derived from loadspectrum, a custom class specific to Tandem. Two others are publicly derived from the xml parser class SAXSpectraHandler which is imported from the expat library of functions. The xml parser classes use the expat functions exclusively to parse the input in order to load it into the traditional Tandem spectra data members.
  2. base64.cpp and base64.h have been added to allow b64_decode_mio() function calls, which are needed to decode the spectra in mzxml and mzdata spectra files.
  3. Included in the src folder is the libexpat.lib which is required to compile new versions of the executable on Windows. Linux and OSX machines should have the required libraries as part of the core operating system.
  4. A new function has been added to msequtilites that allows amino acid residue masses to be defined by an xml input file. If the parameter 'protein, modified residue mass file' is defined in the input.xml, the masses are taken from the file defined by that parameter. An example of the format can be viewed here.
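The decoding step mentioned in item 2 can be illustrated with a short Python sketch. It is not the b64_decode_mio() routine: `decode_peaks` is a hypothetical name, and the sketch assumes an mzXML-style peak list, i.e. base64 text holding big-endian (network byte order) floating point numbers stored as alternating m/z and intensity values at 32- or 64-bit precision. mzData files store their arrays differently and are not covered here.

```python
import base64
import struct

def decode_peaks(text, precision=32):
    """Decode base64 peak text into a list of (m/z, intensity) pairs."""
    if precision not in (32, 64):
        raise ValueError("precision must be 32 or 64")
    raw = base64.b64decode(text)
    code, width = ("f", 4) if precision == 32 else ("d", 8)
    count = len(raw) // width
    # '>' selects big-endian byte order, independent of the host machine.
    values = struct.unpack(f">{count}{code}", raw[: count * width])
    # Values alternate between m/z and intensity.
    return list(zip(values[0::2], values[1::2]))
```

Reading the byte order explicitly, rather than relying on the host machine, is what keeps the decoded values the same on every platform.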
This release contains modifications necessary to insert new types of peptide scoring systems as well as to deal effectively with high accuracy parent ion measurements, which are now available in some types of mass spectrometers. Most of the changes in this release were made by Brendan MacLean, from the Fred Hutchinson Cancer Research Center.
System level changes
  1. Several new classes have been added, to make the scoring system "pluggable", i.e., it is now much easier to alter the scoring system used, for the purposes of bioinformatics investigations. These changes are mainly of interest to informatics professionals and they should not affect the normal operation of the software for users.
  2. The calculation of parent ion mass has been changed, taking more care as to the mass of added groups and correctly accounting for electron masses.
  3. Better statistical methods have been added to deal with the small number of possible peptides generated from a list of protein sequences that have a very accurately determined parent ion mass.
This release adds in several features that were originally scheduled to appear in the 2004.11.15 release, but which were pushed back from the initial release. The 2004.11.15.2 version was not generally released.
System level changes
  1. Spectra that are interpreted as being caused by a prompt neutral loss now have the prompt loss specified in the appropriate <aa> node in the output.
  2. Correction of an issue with the OS X version that resulted in improper reading of ".pro" sequence files. Initially, the ".pro" format was to have both little endian and big endian versions, however this became too confusing to maintain. The current plan is to only use the little endian format and to compensate for this on-the-fly in the OS X version.
  3. The maximum parent ion charge to be used can now be specified using the "spectrum, maximum parent charge" parameter. This parameter has a default value of 4. This change was made necessary because of high charge states being called by some MS peak assignment software, which caused spurious assignments.
  4. The first round of refinement (finding partially cleaved peptides) has been extended, so that it is possible to repeat it with different sets of modifications and motifs. These additional refinement rounds are specified by adding parameters using the following format:
    • Round 1: "refine, potential modification mass"
      "refine, potential modification motif"
    • Round 2: "refine, potential modification mass 1"
      "refine, potential modification motif 1"
    • Round 3: "refine, potential modification mass 2"
      "refine, potential modification motif 2"
    This will continue until both parameters in the next pair are missing or neither contains an at sign (@).
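The round-counting rule in item 4 can be written out as a short Python sketch. The function name `refinement_rounds` and the dictionary of input parameters are hypothetical, and the sketch assumes that the stop rule (both parameters missing, or neither containing an "@" character) is applied to the first pair as well as the numbered ones.

```python
MASS_LABEL = "refine, potential modification mass"
MOTIF_LABEL = "refine, potential modification motif"

def refinement_rounds(params):
    """Return a list of (mass, motif) parameter pairs, one per refinement round."""
    rounds = []
    index = 0
    while True:
        suffix = "" if index == 0 else f" {index}"
        # A missing parameter is treated as an empty string, which contains no '@'.
        mass = params.get(MASS_LABEL + suffix, "")
        motif = params.get(MOTIF_LABEL + suffix, "")
        if "@" not in mass and "@" not in motif:
            break
        rounds.append((mass, motif))
        index += 1
    return rounds
```

A pair with only one usable parameter still counts as a round, so a mass-only first round followed by a motif-only second round gives two rounds.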
This is a maintenance release, to correct one issue identified in the 2004.09.01 release.
System level changes
  1. An error that resulted in the incorrect interpretation of some motifs was corrected.
  2. Addition of spectrum prefiltering to remove repeated spectra from the initial set of mass spectra. This feature compares spectra using a dot product calculation and removes spectra that have vector representations that point in the same direction. The most intense spectrum out of a set of repeated spectra is kept and used for analysis. This type of filtering can remove up to 90% of spectra from a MudPIT-style run, making data analysis and interpretation easier.
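The prefiltering idea in item 2 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the X! Tandem implementation: each spectrum is represented as a dictionary mapping a mass bin to an intensity, "pointing in the same direction" is taken to mean a normalized dot product at or above a threshold, and both the 0.95 threshold and the function names are hypothetical. A real filter would presumably also compare parent ion masses.

```python
import math

def cosine(a, b):
    """Normalized dot product of two binned spectra (dicts of bin -> intensity)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def remove_repeats(spectra, threshold=0.95):
    """Keep the most intense spectrum from each group of near-identical spectra."""
    # Visit spectra from most to least intense, so the first one kept in any
    # group of repeats is the most intense member of that group.
    ranked = sorted(spectra, key=lambda s: sum(s.values()), reverse=True)
    kept = []
    for spectrum in ranked:
        if all(cosine(spectrum, other) < threshold for other in kept):
            kept.append(spectrum)
    return kept
```

Because the normalized dot product ignores overall intensity, two scans of the same peptide at different signal levels are treated as repeats.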
This is a maintenance release, to correct one issue identified in the 2004.08.01 release.
System level changes
  1. A possible floating-point overrun error that could lead to 0.0 expectation values for high scoring peptides, detected from GPMDB submissions, has been corrected.
This is a maintenance release, to correct several issues identified in the 2004.07.15 release.
System level changes
  1. An error that reduced the score for triply-charged ions was corrected.
  2. Quantitation information was added to the output XML file.
This is a major release of TANDEM, sufficiently different from previous releases to merit a major revision number: this release will be referred to as TANDEM 2.
System level changes
  1. The memory management throughout the program has been analyzed and altered to minimize the amount of memory used per spectrum. This effort has reduced the amount of memory used in single threaded operation by as much as 60%: the improvement for double threaded operation may be as much as 80%.
  2. The threading model has been changed to allow for the use of multiple processors in the refinement process. TANDEM 1 separated work between the threads by dividing up the sequences to search, so that each thread would only search a subset of the sequences in a FASTA file. TANDEM 2 divides up the mass spectra between threads, so that each thread searches a subset of the mass spectra. This change makes it easier to divide up the refinement job, but means that running more than one thread on a single processor will degrade the performance of the software. For best performance, it is now important to keep the number of threads and the number of processors the same.
  3. The refinement process has been improved in accuracy by applying a logical filter after each step of refinement. This means that once a refinement step is completed, the new results obtained from the refinement are examined and if the new results are not significantly better than those obtained from a simpler search, they are discarded and the simpler results retained. This filtering significantly reduces the complexity of analyzing results when there may be a variety of similar modification patterns or point mutations that explain a particular spectrum.
  4. Validation of results using reversed sequence databases has been built into the search process. This validation may be turned on or off, using the new input parameter "scoring, include reverse" (values = yes|no). This validation process tabulates the number of unique high probability hits from the reversed sequence search and places them in the output file, along with estimates of the false positive rate based on TANDEM's stochastic histogramming technique and the estimate derived from the reversed sequence process. NOTE: When this validation method is used, twice as many sequences must be processed (both forward and reversed), which may require significantly more processing time.
  5. Numerous small optimizations have been made, particularly for loading and reporting the results for very large collections of mass spectra.
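The reversed sequence validation described in item 4 can be pictured with the short Python sketch below. It is only an illustration under stated assumptions: the function names, the ":reversed" label suffix, and the simple ratio used as an estimate are hypothetical and are not taken from the TANDEM source. The stochastic histogramming estimate is not reproduced here.

```python
def add_reversed(sequences):
    """Return the forward sequences plus a reversed copy of each one.

    `sequences` maps a label to a residue string. The ":reversed" suffix
    is a hypothetical naming choice, not TANDEM's own convention.
    """
    combined = dict(sequences)
    for label, residues in sequences.items():
        combined[label + ":reversed"] = residues[::-1]
    return combined


def reversed_hit_rate(forward_hits, reversed_hits):
    """Assumed estimate: reversed hits as a fraction of forward hits."""
    if forward_hits == 0:
        return 0.0
    return reversed_hits / forward_hits


# Hypothetical example sequences, for illustration only.
proteins = {"EXAMPLE1": "MKTAYIAK", "EXAMPLE2": "GAVLIPFW"}
search_space = add_reversed(proteins)
```

The combined search space holds twice as many entries as the input, which is why the note above warns that turning on "scoring, include reverse" may require significantly more processing time.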
This release adds three new functionalities to X! TANDEM. These new functions make it possible to modify protein sequences in new ways.
System level changes
  1. The ability to specify modifications based on sequence motifs was added to both the normal search and refinement steps. A comma separated list of motifs in slightly modified PROSITE format can be used to only modify specific residues. An example of this format is:
    204@[N!]{P}[ST]{P} - which specifies a motif that has an N, followed by any residue except P, followed by an S or a T, followed by any residue except P. The residue in the group containing the exclamation point (in this case the N) is modified by adding 204 Da.
    The peptides containing this motif are checked both with and without this modification, so it is interpreted the same way that a "potential" modification is interpreted. The rules for creating these motifs are:
    • Square brackets "[]" indicate any of the residues contained is possible;
    • French brackets "{}" indicate that any of the residues contained is forbidden;
    • A bare letter is interpreted as if it were in square brackets and can be modified with an exclamation point, e.g. 16@[M!] is the same as 16@M!;
    • An exclamation point indicates the position of modification;
    • The letter "X" indicates any residue;
    • Round brackets "()" indicate a count, e.g. "X(10)" means ten X's in a row;
    • All other characters are ignored, e.g. 80@[ST!]PX[KR] is the same as 80@[ST!]-P-X-[KR].
  2. The ability to specify prompt neutral losses for potential modifications (including motifs) has been added. This neutral loss is specified by adding a colon followed by the mass corresponding to the loss. For example:
    80@S specifies phosphorylation without loss, while 80:-98@S specifies the neutral loss of the phosphate group.
  3. The ability to specify fixed modifications of specific residues, by residue number, on a sequence by sequence basis has been added. This capability cannot be exploited currently because of a lack of sequence lists that contain this type of information. However, an appropriately translated version of a database such as SWISSPROT could be used to provide this information.
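The motif and neutral loss notation from items 1 and 2 can be read mechanically. The Python sketch below translates a specification of the form mass[:loss]@motif into a regular expression, following the rules listed above. It is a reading aid, not TANDEM's parser: the function names and the "site" group name are hypothetical, residues are assumed to be uppercase one-letter codes, and overlapping matches are not reported.

```python
import re


def parse_modification(spec):
    """Split 'mass[:loss]@motif' into (mass, loss, motif)."""
    masses, motif = spec.split("@", 1)
    mass, _, loss = masses.partition(":")
    return float(mass), (float(loss) if loss else None), motif


def motif_to_regex(motif):
    """Translate a motif into a regex; the marked group is named 'site'."""
    # Any character that does not form one of these tokens is ignored.
    tokens = re.findall(r"\[[^\]]+\]|\{[^}]+\}|\(\d+\)|[A-Z]!?", motif)
    parts = []
    for token in tokens:
        if token[0] == "(":
            if parts:  # a count repeats the preceding group
                parts[-1] += "{%s}" % token[1:-1]
            continue
        marked = "!" in token
        residues = re.sub(r"[^A-Z]", "", token)
        if token[0] == "{":
            pattern = "[^%s]" % residues   # forbidden residues
        elif "X" in residues:
            pattern = "[A-Z]"              # X stands for any residue
        else:
            pattern = "[%s]" % residues    # allowed residues
        parts.append("(?P<site>%s)" % pattern if marked else pattern)
    return "".join(parts)


def find_sites(spec, peptide):
    """Return (position, mass, loss) for each non-overlapping motif match."""
    mass, loss, motif = parse_modification(spec)
    regex = re.compile(motif_to_regex(motif))
    # Without a '!' marker, fall back to the start of the whole match.
    key = "site" if "site" in regex.groupindex else 0
    return [(m.start(key), mass, loss) for m in regex.finditer(peptide)]
```

For example, find_sites("204@[N!]{P}[ST]{P}", "AANGSAK") reports the N at zero-based position 2, and motif_to_regex gives the same result for "[ST!]PX[KR]" and "[ST!]-P-X-[KR]" because the dashes are ignored.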
This release is a maintenance release. It should improve memory usage for very long sequence lists, but otherwise should be neutral.
System level changes
  1. The mechanism for storing sequences in the mprocess class has been changed. Previously, a copy of each protein sequence was stored with each peptide model associated with a spectrum. Now, a master list of protein sequences is kept and only a lookup number is stored with each peptide model. This change improves memory management for very large pools of redundant proteins or very long lists of spectra.
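The idea of a master list with lookup numbers can be sketched in a few lines of Python. This is an illustration only: the class and method names are hypothetical and do not correspond to the mprocess class, and matching duplicate proteins by their sequence text is an assumption.

```python
class SequenceStore:
    """Keep one copy of each protein sequence and hand out lookup numbers."""

    def __init__(self):
        self._sequences = []  # master list, indexed by lookup number
        self._index = {}      # sequence text -> lookup number

    def add(self, sequence):
        """Store the sequence once and return its lookup number."""
        if sequence not in self._index:
            self._index[sequence] = len(self._sequences)
            self._sequences.append(sequence)
        return self._index[sequence]

    def get(self, lookup):
        """Return the full sequence for a lookup number."""
        return self._sequences[lookup]


store = SequenceStore()
# Each peptide model keeps only a small integer, not a copy of the protein.
models = [
    {"peptide": "AYIAK", "protein": store.add("MKTAYIAKQR")},
    {"peptide": "MKTAYIAK", "protein": store.add("MKTAYIAKQR")},
]
```

Both models above refer to the same lookup number, so the protein sequence is held in memory once regardless of how many peptide models point to it.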
This release corrects an interpretation problem introduced in the 2004.04.01 release. This problem resulted in an overemphasis on peptides found in the refinement steps.
System level changes
  1. The refinement processing was returned to its previous state, so that only the best scoring peptides from the refinement process are reported.
This release is the result of an effort to reduce memory usage by TANDEM. This effort has resulted in a 70% reduction in memory usage, when using large data files.
System level changes
  1. GAML spectra, such as those in output xml files, can now be used as input data.
  2. The length of the scoring histogram arrays has been altered to improve memory usage.
  3. Several instances of temporary copies of data have been removed and other data structures cleared as soon as possible after use.
Corrected problems
  1. A behavior that resulted in the loss of the last character in sequences in some FASTA files has been corrected.
  2. A compatibility issue resulting from various choices for the size of the size_t STL variable on unix platforms has been corrected, so that most unix platforms should compile without modifying the linux version of the code.
Known problems
  1. No problems known at time of release

This release fixes a number of compatibility issues and unexpected behaviors in Tandem and associated formatting files.
System level changes
  1. A new state machine was added to perform N- and C- terminal partial modifications. Previous versions used these modifications as complete modifications only.
  2. The optimization for the minimum number of residues considered was removed and replaced with the constant value of 4. The prior optimization did not produce a significant improvement in speed, but it did cause occasional problems with large neutral losses.
  3. The xslt and css files have been updated to conform more closely to specification, making them compatible with the FireFox browser.
Corrected problems
  1. A behavior that allowed the occasional consideration of peptides with too many missed cleavage sites was fixed.
  2. A compatibility issue for starting threads on some unix platforms has been corrected, so that most unix platforms should compile without modifying the linux version of the code.
Known problems
  1. No problems known at time of release

This release introduces the capability of detecting point mutations in protein sequences
System level changes
  1. A new state machine was added to the mscore object to track point mutations.
  2. A new report value was added so that a protein sequence would only be recorded once in the XML output, if desired.
Corrected problems
  1. Several problems associated with detecting unsupported spectrum file types were corrected. Previous versions could hang indefinitely if binary files were used.
  2. A LINUX compilation problem with some flavours of LINUX that caused a failure to create new threads was corrected.
Known problems
  1. No problems known at time of release

This release introduces a new statistical model for multiple model correlations.
System level changes
  1. A new statistical interpretation was added, to combine expectation values when multiple models from the same sequence are found to be the best model in different spectra. Using this model, expectation values for the collections of models are now listed as the base-10 log of the expectation value, beside the FASTA description line.
  2. The way FASTA description lines are listed has been changed. Rather than listing the descriptions in the same order they were encountered in the search, they are now listed by length: the longest entry first. The logic to this choice is that for the NCBI database nr, the oldest entry for a similar sequence tends to be the longest and the first line of that entry tends to have the best description of the protein's common name. Unfortunately, this is not always true.
  3. A new way of organizing the output was added. It can be accessed by setting the value of the "output, sort results by" parameter to "protein". Models corresponding to a given sequence are grouped together, with the best set of models at the top of the page.
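The ordering rules in items 2 and 3 can be illustrated with a small Python sketch. The function names and the dictionary keys ("protein", "log_e") are hypothetical, and treating the lowest log expectation value in a group as the "best" is an assumption, since the notes do not say how groups are ranked.

```python
from collections import defaultdict


def order_descriptions(descriptions):
    """Longest description first; ties keep the order they were found in."""
    return sorted(descriptions, key=len, reverse=True)


def group_by_protein(models):
    """Group models by protein and put the best-scoring group first."""
    groups = defaultdict(list)
    for model in models:
        groups[model["protein"]].append(model)
    # Assumption: "best" means the lowest log10 expectation in the group.
    return sorted(groups.items(),
                  key=lambda item: min(m["log_e"] for m in item[1]))


descriptions = ["kinase", "serine/threonine kinase 1", "STK1"]
ordered = order_descriptions(descriptions)

models = [
    {"protein": "P1", "log_e": -2.0},
    {"protein": "P2", "log_e": -8.5},
    {"protein": "P1", "log_e": -3.1},
]
grouped = group_by_protein(models)
```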
Corrected problems
  1. FASTA file name problem fixed.
  2. Multiple modification reporting problem fixed.
Known problems
  1. No problems known at time of release

This was the first release of a multithreading version of tandem.
System level changes
  1. A threading model was introduced that allows up to 16 threads. Each thread is given a unique set of sequences to model and the results of all of the models are summed at the end. The threads are started in a simple manner and then the program waits for all threads to return before summing and correlating the data.
  2. A new class called "mspectrumcondition" was added to perform any spectrum filtering necessary. The initial release had this functionality in the "mscore" class.
  3. An example XSLT and CSS stylesheet pair was added that makes viewing the output XML with a browser somewhat easier. The pair is also an example of how such files can be constructed to create a GUI from the output XML.
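The threading model in item 1 follows a simple split, run, and sum pattern, sketched below in Python. This is not the TANDEM implementation: the function names are hypothetical, the interleaved split is an assumed way of giving each thread a unique set of sequences, and the "modelling" step is a placeholder that only counts residues.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 16  # upper limit stated in the release note


def split_sequences(sequences, threads):
    """Give each thread a unique, non-overlapping subset of the sequences."""
    return [sequences[i::threads] for i in range(threads)]


def model_subset(subset):
    """Placeholder for the real modelling step: count residues."""
    return sum(len(sequence) for sequence in subset)


def run(sequences, threads):
    """Start the threads, wait for all of them, then sum their results."""
    threads = max(1, min(threads, MAX_THREADS))
    subsets = split_sequences(sequences, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() gathers every partial result before the sum is taken.
        partial = list(pool.map(model_subset, subsets))
    return sum(partial)
```

Calling run(["MKT", "AYIAK", "GAVL", "PFW"], 2) returns the same total as a single-threaded pass, because every sequence is assigned to exactly one thread.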
Corrected problems
  1. The memory leak reported in the 2003.05.01 release was found and fixed.
Known problems
  1. Some FASTA sequence list file names are not being recorded in the output XML.
  2. Occasionally, multiple potential modifications are noted for the same residue.
  3. The XSLT is not compatible with Internet Explorer 5.5. This is by design and it will not change in later releases.

This was the first release of tandem.
Known problems
  1. Soon after the release, a serious memory leak was reported, which became evident when searching large data files. This leak reduced system performance dramatically.

Copyright © 2004-2013, The Global Proteome Machine Organization