The X! search engine project

X! Search Engine Development

  X! HUNTER ASL file format (2006.09.15)

The X! Hunter Annotated Spectrum Library (ASL) system uses a binary format to store the spectra and annotations. This format was designed to make loading the data from the libraries as fast as possible. The structure of this binary format and all of the required data fields are specified below. All storage is in little-endian format.

The initial release of this file format used only the first 4 bytes of the file for "header" information. In this release, the first 256 bytes are reserved for header information. The format of this header is as follows:

  1. 4-byte int: all 4 bytes = 0x00;
  2. 4-byte unsigned int: number of spectra in file, T; and
  3. 248-byte char array: unassigned (may be 0x00).
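
The header layout above can be read with a few lines of Python. This is a minimal sketch, not part of the X! Hunter distribution: the function name read_asl_header is hypothetical, and rejecting a non-zero first field is an assumption about how a reader might want to behave.

```python
import io
import struct

HEADER_SIZE = 256  # total bytes reserved for the header


def read_asl_header(stream):
    """Read the 256-byte header and return the declared number of spectra."""
    header = stream.read(HEADER_SIZE)
    if len(header) != HEADER_SIZE:
        raise ValueError("stream is too short to hold a 256-byte header")
    # "<" = little-endian, no padding; "i" = 4-byte int, "I" = 4-byte unsigned int.
    marker, spectrum_count = struct.unpack_from("<iI", header, 0)
    if marker != 0:
        # Assumption: a non-zero first field means the data is not in this layout.
        raise ValueError("first 4 bytes are not all 0x00")
    return spectrum_count


# Usage with an in-memory header that declares 3 spectra.
demo = io.BytesIO(struct.pack("<iI", 0, 3) + bytes(248))
print(read_asl_header(demo))  # prints 3
```

Reading the full 256 bytes, rather than only the first 8, leaves the stream positioned at the first library entry.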

The annotations and spectra are stored sequentially, as in the previous format. The median expectation value of the spectrum set used to construct each library entry is now included in the file. Each entry uses the following format:

  1. 8-byte double: parent ion M+H (Daltons);
  2. 4-byte int: parent ion charge;
  3. 4-byte float: sum of the squares of the fragment ion intensities;
  4. 4-byte float: median expectation value of spectra;
  5. 4-byte int: length of the peptide sequence, L;
  6. L-byte char array: peptide sequence;
  7. 4-byte int: number of spectrum intensity-m/z pairs, P;
  8. P-byte unsigned char array: spectrum intensities;
  9. P*4-byte float array: spectrum m/z values;
  10. 4-byte int: number of sequence modifications, M;
  11. M modification objects, each containing:
    • 4-byte int: modification sequence position;
    • 8-byte double: modification mass.
  12. 4-byte int: number of protein sequences containing the peptide, N;
  13. N protein objects, each containing:
    • 4-byte int: length of protein sequence accession string, S;
    • S-byte char array: protein sequence accession string;
    • 4-byte int: position of peptide in protein sequence.
  14. Repeat from item 1 until all T spectra (the number of spectra given in the header) are loaded.
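
The entry layout above can be sketched as a sequential reader in Python. The function and key names (read_asl_entry, read_asl_entries, "mh", "mods" and so on) are hypothetical. Decoding the character arrays as ASCII, pairing intensities with m/z values by index, and rejecting negative counts are assumptions that the format description does not state.

```python
import io
import struct


def _read(stream, size):
    """Read exactly size bytes or fail."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of ASL data")
    return data


def _unpack(stream, fmt):
    """Unpack a little-endian struct format from the stream."""
    return struct.unpack(fmt, _read(stream, struct.calcsize(fmt)))


def _count(stream):
    """Read a 4-byte int used as a length or count (assumed non-negative)."""
    (n,) = _unpack(stream, "<i")
    if n < 0:
        raise ValueError("negative length or count in ASL data")
    return n


def read_asl_entry(stream):
    """Read one library entry: fields 1 to 13, in the listed order."""
    mh, charge, sum_sq, expect = _unpack(stream, "<diff")    # fields 1-4
    sequence = _read(stream, _count(stream)).decode("ascii")  # fields 5-6
    pairs = _count(stream)                                    # field 7
    intensities = list(_read(stream, pairs))                  # field 8: P bytes
    mz = list(_unpack(stream, f"<{pairs}f"))                  # field 9: P floats
    # Fields 10-11: each modification is (position: int, mass: double).
    mods = [_unpack(stream, "<id") for _ in range(_count(stream))]
    proteins = []
    for _ in range(_count(stream)):                           # fields 12-13
        accession = _read(stream, _count(stream)).decode("ascii")
        (position,) = _unpack(stream, "<i")
        proteins.append((accession, position))
    return {"mh": mh, "charge": charge, "sum_sq": sum_sq, "expect": expect,
            "sequence": sequence, "intensities": intensities, "mz": mz,
            "mods": mods, "proteins": proteins}


def read_asl_entries(stream, count):
    """Field 14: repeat for the number of spectra given in the header."""
    return [read_asl_entry(stream) for _ in range(count)]
```

The stream is expected to be positioned just past the 256-byte header, and count is the spectrum count read from it. Every length is read immediately before the variable-length data it describes, so no seeking is needed.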

NOTES:

  1. The spectra are not stored in any particular order: spectra associated with the same protein may be located anywhere within the file.
  2. Annotations are based on sequence accession numbers for particular sequence collections, e.g., ENSEMBL, IPI or SWISS-PROT protein accession numbers.
  3. X! Hunter ASLs store the twenty (20) most intense peaks for a particular MS/MS spectrum.
  4. Parent ion masses are calculated based on the mono-isotopic masses of the peptide residues.
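
As an illustration of note 4, a parent ion M+H value can be estimated by summing monoisotopic residue masses and adding one water and one proton. This is a sketch only: the mass table holds standard reference values rounded to five decimal places, not values taken from the X! Hunter source. The name peptide_mh is hypothetical, and adding modification masses to the total is an assumption.

```python
# Standard monoisotopic residue masses in Daltons (rounded reference values).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, monoisotopic
PROTON = 1.007276   # mass of one proton


def peptide_mh(sequence, modification_masses=()):
    """Return the singly protonated monoisotopic mass (M+H) of a peptide.

    Assumption: modification masses, if given, are simply added to the total.
    """
    residues = sum(MONOISOTOPIC[residue] for residue in sequence)
    return residues + WATER + PROTON + sum(modification_masses)


print(round(peptide_mh("PEPTIDE"), 3))  # prints 800.367
```
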
Copyright © 2004-2011, The Global Proteome Machine Organization