X! Series search engines   The Global Proteome Machine Organization
  www.thegpm.org

  TANDEM project

  Point mutation modeling with X! Tandem

The 2004.02.01 release of Tandem added the capability to model point mutations. This important source of sequence variability has been difficult to address with previous methods of analyzing MS/MS results; with Tandem it is now simple and surprisingly fast.

  What are point mutations?

Point mutations are single amino acid changes in a protein sequence. They are produced by the modification of the nucleotide sequence in a gene, which may be the result of a somatic or a germ line change. Somatic point mutations probably occur quite frequently in eukaryotes, but they are not passed on to subsequent generations - with the very notable exception of plants that are propagated by grafting. Point mutations in germ cells result in modifications in subsequent generations, and it is this type of mutation that leads to ongoing genetic variability in a population.

For example, if a peptide has the sequence YGGFLR, then one possible point mutation is AGGFLR, where the mutation is residue 1 changing from Y to A. There are 19 possible point mutations for each residue in a peptide, considering only the 20 commonly occurring amino acids.
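The count above can be checked with a short enumeration. This is a minimal sketch, not part of Tandem: the function name `point_mutants` and the alphabet string are illustrative assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 commonly occurring amino acids

def point_mutants(peptide):
    """Yield (position, original, mutant, sequence) for every single-residue substitution."""
    for i, original in enumerate(peptide):
        for mutant in AMINO_ACIDS:
            if mutant != original:
                # Positions are reported 1-based, as in the text.
                yield i + 1, original, mutant, peptide[:i] + mutant + peptide[i + 1:]

variants = list(point_mutants("YGGFLR"))
print(len(variants))  # 6 residues x 19 alternatives = 114
print(variants[0])    # (1, 'Y', 'A', 'AGGFLR')
```

The number of single point mutants grows linearly with peptide length (19 per residue).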

It was realized early on in protein sequence comparison studies that there is a bias towards certain point mutations, when viewed in an evolutionary sense. The matrix below illustrates this type of evolutionary bias.

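    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A   2
R  -2   6
N   0   0   2
D   0  -1   2   4
C  -2  -4  -4  -5   4
Q   0   1   1   2  -5   4
E   0  -1   1   3  -5   2   4
G   1  -3   0   1  -3  -1   0   5
H  -1   2   2   1  -3   3   1  -2   6
I  -1  -2  -2  -2  -2  -2  -2  -3  -2   5
L  -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6
K  -1   3   1   0  -5   1   0  -2   0  -2  -3   5
M  -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6
F  -4  -4  -4  -6  -4  -5  -5  -5  -2   1   2  -5   0   9
P   1   0  -1  -1  -3   0  -1  -1   0  -2  -3  -1  -2  -5   6
S   1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   3
T   1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -2   0   1   3
W  -6   2  -4  -7  -8  -5  -7  -7  -3  -5  -2  -3  -4   0  -6  -2  -5  17
Y  -3  -4  -2  -4   0  -4  -4  -5   0  -1  -1  -4  -2   7  -5  -3  -3   0  10
V   0  -2  -2  -2  -2  -2  -2  -1  -2   4   2  -2   2  -1  -1  -1   0  -6  -2   4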

W. R. Pearson, Rapid and Sensitive Sequence Comparison with FASTP and FASTA, in Methods in Enzymology, ed. R. Doolittle (ISBN 0-12-182084-X, Academic Press, San Diego) 183 (1990) 63-98.

The matrix is frequently used to score aligned peptide sequences to determine the similarity of those sequences. The numbers given above were derived by comparing aligned sequences of proteins with known homology and counting the "accepted point mutations" (PAM) observed. The frequencies of these mutations are expressed in the table as a "log-odds matrix", where:

$$M_{ij} = 10 \log_{10} R_{ij},$$

where $M_{ij}$ is the matrix element and $R_{ij}$ is the probability of that substitution as observed in the database, divided by the normalized frequency of occurrence for amino acid $i$. All of the numbers are rounded to the nearest integer. A logarithm (here base 10) is used so that the numbers can be added, rather than multiplied, to determine the score of a compared set of sequences.
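As an illustration of how the log-odds elements are formed and summed, here is a minimal sketch. The four-residue subset is copied from the A, R, N and D rows of the matrix above; the function names and the example odds ratio of 4.0 are illustrative assumptions, and the alignment is assumed to be ungapped.

```python
import math

def log_odds(r):
    """Matrix element for an odds ratio r: 10 * log10(r), rounded to the nearest integer."""
    return round(10 * math.log10(r))

# Lower-triangle entries for A, R, N and D, copied from the matrix above.
PAM_SUBSET = {
    ("A", "A"): 2,
    ("R", "A"): -2, ("R", "R"): 6,
    ("N", "A"): 0, ("N", "R"): 0, ("N", "N"): 2,
    ("D", "A"): 0, ("D", "R"): -1, ("D", "N"): 2, ("D", "D"): 4,
}

def element(a, b):
    """Look up a matrix element; only one triangle is stored, so try both orders."""
    return PAM_SUBSET[(a, b)] if (a, b) in PAM_SUBSET else PAM_SUBSET[(b, a)]

def alignment_score(seq1, seq2):
    """Sum the matrix elements over an ungapped alignment of equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(element(a, b) for a, b in zip(seq1, seq2))

print(log_odds(4.0))                    # 10 * log10(4) is about 6.02, so 6
print(alignment_score("ARND", "ARND"))  # 2 + 6 + 2 + 4 = 14
print(alignment_score("ARND", "ARNN"))  # 2 + 6 + 2 + 2 = 12
```

Because each element is rounded, a sum of elements is only approximately the log of the product of the underlying ratios.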

  How does Tandem model point mutations?

Tandem uses the concept of refinement to make checking for point mutations practical. Refinement is simply the idea of identifying a short list of potentially interesting model protein sequences and then exhaustively analyzing those sequences for the best matches to a list of MS/MS spectra. Once the list of model protein sequences has been identified, it is analyzed in turn for:

  1. multiple missed protease cleavage sites;
  2. multiple potential chemical and post-translational modifications;
  3. unanticipated peptide bond cleavages;
  4. modified N-termini;
  5. modified C-termini; and finally
  6. point mutations

The point mutation analysis is done using the following steps:

  1. list all possible proteolytic peptides;
  2. select a peptide;
  3. systematically replace each residue in the peptide with each of the other possible amino acids and score the mutated peptide (and all potential modifications) against all of the available MS/MS spectra;
  4. record the best models, if they are better than previous models;
  5. repeat steps 2-4 until the list of possible proteolytic peptides has been completely analyzed.
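The steps above can be sketched as a nested loop. This is a toy illustration under stated assumptions, not Tandem's implementation: the scoring function only compares nominal peptide mass with a precursor mass (Tandem scores against the MS/MS spectra themselves), potential modifications are ignored, and all names (`search_point_mutations`, `toy_score`, the spectrum dictionary) are hypothetical. The residue masses are the nominal values from the mutation mass shift table in this document, and 18 is the nominal mass of water.

```python
# Nominal residue masses, taken from the mutation mass shift table.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18  # nominal mass of H2O added to the residue sum

def peptide_mass(seq):
    """Nominal mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER

def toy_score(peptide, precursor_mass):
    """Stand-in score (an assumption): 0 for an exact mass match, negative otherwise."""
    return -abs(peptide_mass(peptide) - precursor_mass)

def search_point_mutations(peptides, spectra, previous=None):
    """Return {spectrum id: (score, peptide)} after testing every single point mutant."""
    best = dict(previous or {})
    for peptide in peptides:                        # step 2: select a peptide
        for i, original in enumerate(peptide):      # step 3: mutate each residue
            for mutant in RESIDUE_MASS:
                if mutant == original:
                    continue
                candidate = peptide[:i] + mutant + peptide[i + 1:]
                for spectrum_id, precursor in spectra.items():
                    score = toy_score(candidate, precursor)
                    # step 4: keep only models better than previous ones
                    if spectrum_id not in best or score > best[spectrum_id][0]:
                        best[spectrum_id] = (score, candidate)
    return best                                     # step 5: loop ends when list is exhausted

# YGGFLR has nominal mass 711; a precursor at 619 matches the Y -> A mutant.
print(search_point_mutations(["YGGFLR"], {"s1": 619}))  # {'s1': (0, 'AGGFLR')}
```

Only one residue is changed per candidate, matching the one-mutation-per-peptide restriction.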

Note that only one point mutation is considered per proteolytic peptide. The hypothesis that there is most likely only one point mutation per peptide is biologically reasonable when analyzing peptides against a host proteome, but it is too limited if the analysis is against a taxonomically distant proteome. It will, however, catch almost all somatic and germ line point mutations.

  Mutation Mass Shifts

  1. Residues DOWN the left indicate the EXPECTED residues.
  2. Residues ACROSS the top indicate the MUTANT residues.
  3. Each entry is the nominal residue mass (Da) of the mutant minus that of the expected residue.

            G    A    S    P    V    T    C  I/L    N    D  K/Q    E    M    H    F    R    Y    W
           57   71   87   97   99  101  103  113  114  115  128  129  131  137  147  156  163  186
G     57  ...   14   30   40   42   44   46   56   57   58   71   72   74   80   90   99  106  129
A     71  -14  ...   16   26   28   30   32   42   43   44   57   58   60   66   76   85   92  115
S     87  -30  -16  ...   10   12   14   16   26   27   28   41   42   44   50   60   69   76   99
P     97  -40  -26  -10  ...    2    4    6   16   17   18   31   32   34   40   50   59   66   89
V     99  -42  -28  -12   -2  ...    2    4   14   15   16   29   30   32   38   48   57   64   87
T    101  -44  -30  -14   -4   -2  ...    2   12   13   14   27   28   30   36   46   55   62   85
C    103  -46  -32  -16   -6   -4   -2  ...   10   11   12   25   26   28   34   44   53   60   83
I/L  113  -56  -42  -26  -16  -14  -12  -10  ...    1    2   15   16   18   24   34   43   50   73
N    114  -57  -43  -27  -17  -15  -13  -11   -1  ...    1   14   15   17   23   33   42   49   72
D    115  -58  -44  -28  -18  -16  -14  -12   -2   -1  ...   13   14   16   22   32   41   48   71
K/Q  128  -71  -57  -41  -31  -29  -27  -25  -15  -14  -13  ...    1    3    9   19   28   35   58
E    129  -72  -58  -42  -32  -30  -28  -26  -16  -15  -14   -1  ...    2    8   18   27   34   57
M    131  -74  -60  -44  -34  -32  -30  -28  -18  -17  -16   -3   -2  ...    6   16   25   32   55
H    137  -80  -66  -50  -40  -38  -36  -34  -24  -23  -22   -9   -8   -6  ...   10   19   26   49
F    147  -90  -76  -60  -50  -48  -46  -44  -34  -33  -32  -19  -18  -16  -10  ...    9   16   39
R    156  -99  -85  -69  -59  -57  -55  -53  -43  -42  -41  -28  -27  -25  -19   -9  ...    7   30
Y    163 -106  -92  -76  -66  -64  -62  -60  -50  -49  -48  -35  -34  -32  -26  -16   -7  ...   23
W    186 -129 -115  -99  -89  -87  -85  -83  -73  -72  -71  -58  -57  -55  -49  -39  -30  -23  ...

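Every entry in the table above is the mutant residue mass minus the expected residue mass, so the table can be regenerated or queried from the residue masses alone. The sketch below assumes the nominal masses shown in the table header; the function names are illustrative and not part of Tandem.

```python
# Nominal residue masses from the table header (I/L and K/Q are isobaric).
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

def mass_shift(expected, mutant):
    """Mass change when the expected residue is replaced by the mutant residue."""
    return RESIDUE_MASS[mutant] - RESIDUE_MASS[expected]

def candidates_for_shift(shift):
    """List (expected, mutant) pairs whose substitution produces the given nominal shift."""
    return [
        (e, m)
        for e in RESIDUE_MASS
        for m in RESIDUE_MASS
        if e != m and mass_shift(e, m) == shift
    ]

print(mass_shift("G", "A"))       # 71 - 57 = 14
print(mass_shift("W", "G"))       # 57 - 186 = -129
print(candidates_for_shift(-92))  # [('Y', 'A')]
```

Some shifts are ambiguous at nominal mass: a shift of 0 corresponds to the isobaric pairs I/L and K/Q, and several substitutions share a shift of 14.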
Copyright © 2004-2011, The Global Proteome Machine Organization