The X! search engine project

X! Search Engine Development

   X! Tandem FAQ
  1. Where are the results created?
  2. What does the taxonomy.xml file do?
  3. What does the input.xml file do?
  4. What does the default_input.xml file do?
  5. What is fasta_pro.exe used for?
  6. What types of data files can I use?
  7. Can tandem provide dual modifications like an ICAT experiment?
  8. I get the error: 'ld: can't locate file for: -lexpat' when compiling on OSX.
          See also: The GPM faq

1. Where are the results created?

The results are written to the bin folder. If you are running Linux, make sure that the permissions on this folder allow new files to be created.


2. What does the taxonomy.xml file do?

This file is used as a taxon-to-file matching list. The URL paths in this file must point to the location of your local fasta files.

To add a new taxonomy, follow the existing format of this file. Each taxonomy can contain one or more URLs, as in the example below.

	<taxon label="yeast">
		<file format="peptide" URL="/pathtofasta/nr-Saccharomyces-cerevisiae.fasta" />
		<file format="peptide" URL="/pathtofasta/nr-Schizosaccharomyces-pombe.fasta" />
	</taxon>
							

To use this new taxonomy in your search, edit the input.xml file as follows.

	<note type="input" label="protein, taxon">yeast</note>
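
To check which fasta files a taxon label resolves to before running a search, the file can be read with any XML parser. The following is a minimal Python sketch, not part of X! Tandem. The sample text mirrors the yeast entry above; the `<bioml>` root element and the `fasta_paths` helper name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample content mirroring the taxon entry shown above. The <bioml> root
# element is an assumption; check it against your own taxonomy.xml.
TAXONOMY_XML = """<bioml>
  <taxon label="yeast">
    <file format="peptide" URL="/pathtofasta/nr-Saccharomyces-cerevisiae.fasta" />
    <file format="peptide" URL="/pathtofasta/nr-Schizosaccharomyces-pombe.fasta" />
  </taxon>
</bioml>"""


def fasta_paths(xml_text, taxon):
    """Return the URL attribute of every <file> under the named <taxon>."""
    root = ET.fromstring(xml_text)
    paths = []
    for node in root.iter("taxon"):
        if node.get("label") == taxon:
            paths.extend(f.get("URL") for f in node.findall("file"))
    return paths


if __name__ == "__main__":
    for path in fasta_paths(TAXONOMY_XML, "yeast"):
        print(path)
```

An empty list for a label usually means the label in input.xml does not match any taxon label in taxonomy.xml.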
							


3. What does the input.xml file do?

Each parameter for X! Tandem is entered as a labeled note node. Any entry in the default_input.xml file can be overridden by adding a corresponding entry to this file. This file represents a minimal input file, containing only entries for the default settings, the output file and the input spectra file name.
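
As an illustration of what such a minimal file might look like, the Python sketch below builds one from a handful of labeled notes. Only the "protein, taxon" label appears in this FAQ; the other labels and the `<bioml>` root element are assumptions based on the sample input.xml shipped with X! Tandem and should be checked against your own copy. The `build_input` helper and the file names in the usage lines are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_input(default_params, taxonomy, taxon, spectrum, output):
    """Return a minimal input file as a string of labeled note nodes."""
    root = ET.Element("bioml")  # assumed root element name
    entries = [
        # Labels other than "protein, taxon" are assumptions; verify them
        # against the input.xml distributed with your copy of X! Tandem.
        ("list path, default parameters", default_params),
        ("list path, taxonomy information", taxonomy),
        ("protein, taxon", taxon),
        ("spectrum, path", spectrum),
        ("output, path", output),
    ]
    for label, value in entries:
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_input("default_input.xml", "taxonomy.xml", "yeast",
                      "test_spectra.mgf", "output.xml"))
```

Any other parameter from default_input.xml can be overridden in the same way, by adding a note with the same label and a new value.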


4. What does the default_input.xml file do?

This file contains labeled note nodes that carry the input parameters to tandem. For more information, look at the notes and descriptions in this file.


5. What is fasta_pro.exe used for?

fasta_pro.exe is a supplementary program which enables users to create optimized files from fasta files. It is run from the command line as follows:


	\path\to\tandem\fasta>fasta_pro nr-Saccharomyces-cerevisiae.fasta
							

This will create a file called nr-Saccharomyces-cerevisiae.fasta.pro in the same directory. Make sure to add the .pro extension to the path(s) in taxonomy.xml if you convert the files this way.
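
If many fasta files have been converted, the paths in taxonomy.xml can be updated in one pass. The sketch below is one possible way to do this in Python and is not part of the X! Tandem distribution. It assumes that every path to be updated ends in `.fasta` inside a `URL="..."` attribute; the `use_pro_files` name is hypothetical.

```python
import re


def use_pro_files(taxonomy_text):
    """Append .pro to every URL attribute value that ends in .fasta."""
    # Paths that already end in .fasta.pro do not match, so running the
    # function twice leaves the text unchanged.
    return re.sub(r'(URL="[^"]*\.fasta)"', r'\1.pro"', taxonomy_text)


if __name__ == "__main__":
    line = '<file format="peptide" URL="/pathtofasta/nr-Saccharomyces-cerevisiae.fasta" />'
    print(use_pro_files(line))
```

Keep a backup of the original taxonomy.xml before overwriting it, and only point a URL at a .pro file that fasta_pro has actually created.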


6. What types of data files can I use?

X! Tandem is set up to use DTA, PKL or MGF files. These formats are ASCII files that are generated by a mass spectrometer's data handling system.

This is an example of a PKL file that contains the values from more than one spectrum.

415.4407 347.4898 3
52.8570 1.1043
57.8380 1.1043
64.9675 1.1043
70.0623 1.1043
....
....
....

401.7685 318.7188 3
49.9661 1.1043
55.6181 1.1043
73.7716 1.1043
76.2013 1.1043
98.4095 1.1043
....
....
....

The first line of each spectrum has 3 values, separated by spaces. The first value (415.4407 and 401.7685) is the parent ion mass/charge ratio. The next value (347.4898 and 318.7188) is the parent ion intensity. The last value (3 and 3) is the parent ion charge. Each line after this, until there is a blank line, contains 2 values, again separated by a space. The value pairs are the daughter ion masses and daughter ion intensities.
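
The layout described above can be read with a few lines of code. The following Python sketch is an illustration only and is not how X! Tandem itself reads the file. It assumes that spectra are separated by blank lines and that fields are separated by whitespace; the `parse_pkl` name and the dictionary keys are hypothetical.

```python
# The first lines of the two spectra from the example above.
SAMPLE_PKL = """415.4407 347.4898 3
52.8570 1.1043
57.8380 1.1043

401.7685 318.7188 3
49.9661 1.1043
55.6181 1.1043
73.7716 1.1043
"""


def parse_pkl(text):
    """Split PKL text into spectra: a parent ion line followed by peak lines."""
    spectra = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            current = None  # a blank line ends the current spectrum
            continue
        if current is None:
            # First line of a spectrum: parent m/z, parent intensity, charge.
            mz, intensity, charge = fields
            current = {
                "parent_mz": float(mz),
                "parent_intensity": float(intensity),
                "charge": int(charge),
                "peaks": [],
            }
            spectra.append(current)
        else:
            # Remaining lines: daughter ion mass and intensity.
            current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


if __name__ == "__main__":
    for spectrum in parse_pkl(SAMPLE_PKL):
        print(spectrum["parent_mz"], spectrum["charge"], len(spectrum["peaks"]))
```

A parent ion line that does not have exactly three fields raises a ValueError, which is a quick way to spot a malformed file.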

This is an example of a DTA file that contains the values from more than one spectrum.

929.278 2
104.997 2
114.036 2
133.052 2
151.593 2
....
....
....

1003.2 2
108.084 2
123.007 2
126.249 4
142.525 4
....
....
....

The first line of each spectrum has 2 values, separated by a space. The first value (929.278 and 1003.2) is the parent ion M+H. The next value (2 and 2) is the parent ion charge. Each line after this, until there is a blank line, contains 2 values, again separated by a space. The value pairs are the daughter ion masses and daughter ion intensities.
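
Unlike PKL and MGF, the DTA header gives the singly protonated mass (M+H) rather than the mass/charge ratio. The two are related by m/z = (M+H + (z - 1) x proton mass) / z. The Python sketch below shows this standard arithmetic; it is not taken from the X! Tandem source, and the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def parse_dta_header(line):
    """Return (M+H, charge) from the first line of a DTA spectrum."""
    mh_text, charge_text = line.split()
    return float(mh_text), int(charge_text)


def mh_to_mz(mh, charge):
    """Convert a singly protonated mass (M+H) to m/z at the given charge."""
    return (mh + (charge - 1) * PROTON_MASS) / charge


if __name__ == "__main__":
    mh, charge = parse_dta_header("929.278 2")
    print(f"M+H {mh} at charge {charge} gives m/z {mh_to_mz(mh, charge):.4f}")
```

For a charge of 1 the two values are identical, so the conversion only matters for multiply charged parent ions.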

This is an example of an MGF file that contains the values from more than one spectrum.

BEGIN IONS
PEPMASS=820.998855732003
CHARGE=1+
TITLE=Elution from: 0.14 to 0.14   period: 0   experiment: 2 cycles:  1
200.9942 2.3857
354.9856 2.3857
370.9314 5.1571
388.9714 9.6857
390.9608 2.7429
END IONS

BEGIN IONS
PEPMASS=691.910270874147
CHARGE=2+
TITLE=Elution from: 0.03 to 0.03   period: 0   experiment: 1 cycles:  1
264.8982 30.0286
264.9944 8.9429
435.8989 3.2857
442.9097 4.2571
478.9086 3.6571
END IONS

Each spectrum is contained within a pair of BEGIN IONS and END IONS lines. The value following PEPMASS= is the parent ion mass/charge ratio. The value following CHARGE= is the parent ion charge. The lines after the TITLE entry are pairs of daughter ion masses and intensities separated by a space.
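
The block structure described above can be read as follows. This Python sketch is an illustration under simple assumptions: header lines have the form KEY=value, peak lines start with a digit, and anything outside a BEGIN IONS / END IONS pair is ignored. It is not the parser used by X! Tandem, and the `parse_mgf` name and dictionary keys are hypothetical.

```python
# The first spectrum from the example above.
SAMPLE_MGF = """BEGIN IONS
PEPMASS=820.998855732003
CHARGE=1+
TITLE=Elution from: 0.14 to 0.14   period: 0   experiment: 2 cycles:  1
200.9942 2.3857
354.9856 2.3857
370.9314 5.1571
END IONS
"""


def parse_mgf(text):
    """Return one dict per BEGIN IONS / END IONS block."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None or not line:
            continue  # skip blank lines and anything outside a block
        elif "=" in line and not line[0].isdigit():
            # Header line such as PEPMASS=..., CHARGE=... or TITLE=...
            key, value = line.split("=", 1)
            current["params"][key] = value
        else:
            # Peak line: daughter ion mass and intensity.
            mass, intensity = line.split()[:2]
            current["peaks"].append((float(mass), float(intensity)))
    return spectra


if __name__ == "__main__":
    for spectrum in parse_mgf(SAMPLE_MGF):
        print(spectrum["params"]["PEPMASS"], spectrum["params"]["CHARGE"],
              len(spectrum["peaks"]))
```

Header values are kept as strings because CHARGE carries a sign suffix (for example 2+) and TITLE is free text.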


7. Can tandem provide dual modifications like an ICAT experiment?

Tandem will allow two modifications on the same residue, as in an ICAT experiment. The following is taken from the API description of the residue, modification mass parameter:

If a residue labelling strategy is being used where there are two types of reagents for modifying a residue (e.g., C), one with mass L1 and the other with mass L2, the following method can be used to find both types of labelled peptide in the same analysis.

  1. Add the value L1@C to the residue, modification mass parameter
  2. Add the value (L2-L1)@C to the residue, potential modification mass parameter

Because potential and complete modifications are treated separately internally by TANDEM, this will result in finding peptides modified with both types of parameters.
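
The arithmetic in the two steps above can be sketched in Python as follows. The masses used in the example are placeholders and are not real ICAT reagent masses. The note format follows the input.xml example in question 2; the `dual_label_notes` name is hypothetical.

```python
def dual_label_notes(residue, l1, l2):
    """Return the two input notes for a two-reagent labelling experiment."""
    fixed = f"{l1:.4f}@{residue}"           # step 1: L1@C
    potential = f"{l2 - l1:.4f}@{residue}"  # step 2: (L2-L1)@C
    return [
        f'<note type="input" label="residue, modification mass">{fixed}</note>',
        f'<note type="input" label="residue, potential modification mass">{potential}</note>',
    ]


if __name__ == "__main__":
    # Placeholder masses for illustration only: L1 = 100.0, L2 = 108.0.
    for note in dual_label_notes("C", 100.0, 108.0):
        print(note)
```

Replace the placeholder masses with the actual mass shifts of your reagents before adding the notes to input.xml.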


8. I get the error: 'ld: can't locate file for: -lexpat' when compiling on OSX

It looks like the linker isn't finding the expat libraries. Try adding the path to the expat library files to the LDFLAGS line of the Makefile:

	LDFLAGS = -lpthread -L/usr/lib -L/path/to/libs -lm -lexpat

Copyright © 2004-2011, The Global Proteome Machine Organization