The X! Tandem point mutations page

  The Global Proteome Machine Organization
  www.thegpm.org
  TANDEM project

Point mutation modeling with X! Tandem

The 2004.02.01 release of Tandem added the capability of modeling point mutations. This important source of sequence variability has been difficult to address with previous methods of analyzing MS/MS results; it is now simple and surprisingly fast.

What are point mutations?

Point mutations are single amino acid changes in a protein sequence. They are produced by the modification of the nucleotide sequence in a gene, which may be the result of a somatic or a germ line change. Somatic point mutations probably occur quite frequently in eukaryotes, but they are not passed on to subsequent generations - with the very notable exception of plants that are propagated by grafting. Point mutations in germ cells result in modifications in subsequent generations, and it is this type of mutation that leads to ongoing genetic variability in a population.

For example, if a peptide has the sequence YGGFLR, then one possible point mutation is AGGFLR, where the mutation is residue 1 changing from Y to A. There are 19 possible point mutations for each residue in a peptide, considering only the 20 commonly occurring amino acids.
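The count of possible point mutations can be checked by enumeration. The following is a minimal Python sketch, not Tandem's own code; the function name `point_mutants` is an assumption made for illustration.

```python
# The 20 commonly occurring amino acids, as one-letter codes.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def point_mutants(peptide):
    """Yield (position, original, mutant, sequence) for every single-residue substitution."""
    for i, original in enumerate(peptide):
        for mutant in AMINO_ACIDS:
            if mutant != original:
                # Positions are reported 1-based, as in the text.
                yield i + 1, original, mutant, peptide[:i] + mutant + peptide[i + 1:]

mutants = list(point_mutants("YGGFLR"))
print(len(mutants))  # 6 residues x 19 substitutions = 114
print(mutants[0])    # (1, 'Y', 'A', 'AGGFLR')
```

A six-residue peptide therefore has 114 single-substitution variants, which is why restricting the search to a short list of candidate proteins matters.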

It was realized early on in protein sequence comparison studies that there is a bias towards certain point mutations, when viewed in an evolutionary sense. The matrix below illustrates this type of evolutionary bias.

	A	R	N	D	C	Q	E	G	H	I	L	K	M	F	P	S	T	W	Y	V
A	2
R	-2	6
N	0	0	2
D	0	-1	2	4
C	-2	-4	-4	-5	4
Q	0	1	1	2	-5	4
E	0	-1	1	3	-5	2	4
G	1	-3	0	1	-3	-1	0	5
H	-1	2	2	1	-3	3	1	-2	6
I	-1	-2	-2	-2	-2	-2	-2	-3	-2	5
L	-2	-3	-3	-4	-6	-2	-3	-4	-2	2	6
K	-1	3	1	0	-5	1	0	-2	0	-2	-3	5
M	-1	0	-2	-3	-5	-1	-2	-3	-2	2	4	0	6
F	-4	-4	-4	-6	-4	-5	-5	-5	-2	1	2	-5	0	9
P	1	0	-1	-1	-3	0	-1	-1	0	-2	-3	-1	-2	-5	6
S	1	0	1	0	0	-1	0	1	-1	-1	-3	0	-2	-3	1	3
T	1	-1	0	0	-2	-1	0	0	-1	0	-2	0	-1	-2	0	1	3
W	-6	2	-4	-7	-8	-5	-7	-7	-3	-5	-2	-3	-4	0	-6	-2	-5	17
Y	-3	-4	-2	-4	0	-4	-4	-5	0	-1	-1	-4	-2	7	-5	-3	-3	0	10
V	0	-2	-2	-2	-2	-2	-2	-1	-2	4	2	-2	2	-1	-1	-1	0	-6	-2	4

W. R. Pearson, Rapid and Sensitive Sequence Comparison with FASTP and FASTA, in Methods in Enzymology, ed. R. Doolittle (ISBN 0-12-182084-X, Academic Press, San Diego) 183(1990)63-98.

The matrix is frequently used to score aligned peptide sequences to determine the similarity of those sequences. The numbers given above were derived by comparing aligned sequences of proteins with known homology and determining the "accepted point mutations" (PAM) observed. The frequencies of these mutations are given in this table as a "log-odds matrix", where:

M_ij = 10 log₁₀(R_ij), where M_ij is the matrix element and R_ij is the probability of that substitution as observed in the database, divided by the normalized frequency of occurrence for amino acid i. All of the numbers are rounded to the nearest integer. Taking the (base-10) logarithm allows the numbers to be added to determine the score of a compared set of sequences, rather than multiplied.
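As a worked illustration of the log-odds arithmetic, the Python sketch below converts a ratio R_ij to a matrix element and then sums matrix elements over an alignment. The ratios passed to `log_odds` are hypothetical values chosen only to show the rounding; the dictionary holds a few elements copied from the matrix above, and the function names are assumptions.

```python
import math

def log_odds(r):
    """Matrix element M_ij = 10 * log10(R_ij), rounded to the nearest integer."""
    return round(10 * math.log10(r))

# Hypothetical ratios, used only to illustrate the formula.
print(log_odds(2.0))  # 3   (substitution seen twice as often as expected)
print(log_odds(0.5))  # -3  (seen half as often as expected)
print(log_odds(1.0))  # 0   (seen exactly as often as expected)

# A few elements copied from the matrix above; the matrix is symmetric.
PAM = {("Y", "Y"): 10, ("Y", "A"): -3, ("G", "G"): 5,
       ("F", "F"): 9, ("L", "L"): 6, ("R", "R"): 6}

def alignment_score(a, b):
    """Sum the matrix elements for each aligned residue pair."""
    return sum(PAM[(x, y)] if (x, y) in PAM else PAM[(y, x)] for x, y in zip(a, b))

print(alignment_score("YGGFLR", "YGGFLR"))  # 41
print(alignment_score("YGGFLR", "AGGFLR"))  # 28
```

The Y to A substitution lowers the score from 41 to 28, reflecting that this replacement is rarely accepted in an evolutionary sense.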

How does Tandem model point mutations?

Tandem uses the concept of refinement to make checking for point mutations practical. Refinement is simply the idea of identifying a short list of potentially interesting model protein sequences and then exhaustively analyzing those sequences for the best matches to a list of MS/MS spectra. Once the list of model protein sequences has been identified, it is first analyzed for:

multiple missed protease cleavage sites;
multiple potential chemical and post-translational modifications;
unanticipated peptide bond cleavages;
modified N-termini;
modified C-termini; and finally
point mutations.

The point mutation analysis is done using the following steps:

1. list all possible proteolytic peptides;
2. select a peptide;
3. systematically change each residue in the peptide for all other possible amino acids and score the mutated peptide (and all potential modifications) against all of the available MS/MS spectra;
4. record the best models, if they are better than previous models;
5. repeat steps 2-4 until the list of possible proteolytic peptides has been completely analyzed.
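The steps above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not Tandem's implementation: the digestion is a simplified tryptic cleavage, potential modifications are ignored, the search starts with no previous models, and `toy_score` is a hypothetical stand-in for a real MS/MS scoring function.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def tryptic_peptides(protein):
    """Step 1: cleave after K or R (simplified: no proline rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def find_point_mutations(protein, spectra, score):
    """Steps 2-5: try every single substitution in every peptide and keep the
    best-scoring model for each spectrum. `score(peptide, spectrum)` is supplied
    by the caller; a higher value means a better match."""
    best = {}  # spectrum index -> (score, mutated peptide)
    for peptide in tryptic_peptides(protein):              # step 2: select a peptide
        for i, original in enumerate(peptide):             # step 3: mutate each residue
            for mutant in AMINO_ACIDS:
                if mutant == original:
                    continue
                candidate = peptide[:i] + mutant + peptide[i + 1:]
                for s, spectrum in enumerate(spectra):
                    value = score(candidate, spectrum)
                    if s not in best or value > best[s][0]:  # step 4: record if better
                        best[s] = (value, candidate)
    return best                                            # step 5: all peptides analyzed

def toy_score(peptide, spectrum):
    """Toy stand-in for real scoring: each 'spectrum' is represented by the sequence
    that produced it, and the score is the number of matching positions."""
    if len(peptide) != len(spectrum):
        return 0
    return sum(a == b for a, b in zip(peptide, spectrum))

best = find_point_mutations("MKYGGFLRAAK", ["AGGFLR"], toy_score)
print(best[0])  # (6, 'AGGFLR')
```

Because exactly one residue is changed at a time, the number of candidates grows linearly with peptide length (19 per residue), which keeps the exhaustive search tractable.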

It should be noted that only one point mutation is considered per proteolytic peptide. The hypothesis that there is most likely only one point mutation per peptide is biologically reasonable when analyzing peptides against a host proteome, but it is too limited if the analysis is to be against a taxonomically distant proteome. It will, however, catch almost all somatic and germ line point mutations.

Mutation Mass Shifts

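Residues DOWN the left indicate the EXPECTED residues.
Residues ACROSS the top indicate the MUTANT residues.
All values are nominal (integer) masses in daltons: the number beside each residue is its residue mass, and each cell is the mutant mass minus the expected mass.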

		G	A	S	P	V	T	C	L/I	N	D	Q/K	E	M	H	F	R	Y	W
		57	71	87	97	99	101	103	113	114	115	128	129	131	137	147	156	163	186
G	57	...	14	30	40	42	44	46	56	57	58	71	72	74	80	90	99	106	129
A	71	-14	...	16	26	28	30	32	42	43	44	57	58	60	66	76	85	92	115
S	87	-30	-16	...	10	12	14	16	26	27	28	41	42	44	50	60	69	76	99
P	97	-40	-26	-10	...	2	4	6	16	17	18	31	32	34	40	50	59	66	89
V	99	-42	-28	-12	-2	...	2	4	14	15	16	29	30	32	38	48	57	64	87
T	101	-44	-30	-14	-4	-2	...	2	12	13	14	27	28	30	36	46	55	62	85
C	103	-46	-32	-16	-6	-4	-2	...	10	11	12	25	26	28	34	44	53	60	83
L/I	113	-56	-42	-26	-16	-14	-12	-10	...	1	2	15	16	18	24	34	43	50	73
N	114	-57	-43	-27	-17	-15	-13	-11	-1	...	1	14	15	17	23	33	42	49	72
D	115	-58	-44	-28	-18	-16	-14	-12	-2	-1	...	13	14	16	22	32	41	48	71
Q/K	128	-71	-57	-41	-31	-29	-27	-25	-15	-14	-13	...	1	3	9	19	28	35	58
E	129	-72	-58	-42	-32	-30	-28	-26	-16	-15	-14	-1	...	2	8	18	27	34	57
M	131	-74	-60	-44	-34	-32	-30	-28	-18	-17	-16	-3	-2	...	6	16	25	32	55
H	137	-80	-66	-50	-40	-38	-36	-34	-24	-23	-22	-9	-8	-6	...	10	19	26	49
F	147	-90	-76	-60	-50	-48	-46	-44	-34	-33	-32	-19	-18	-16	-10	...	9	16	39
R	156	-99	-85	-69	-59	-57	-55	-53	-43	-42	-41	-28	-27	-25	-19	-9	...	7	30
Y	163	-106	-92	-76	-66	-64	-62	-60	-50	-49	-48	-35	-34	-32	-26	-16	-7	...	23
W	186	-129	-115	-99	-89	-87	-85	-83	-73	-72	-71	-58	-57	-55	-49	-39	-30	-23	...
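Each entry in the table is simply the difference between two nominal residue masses, so the table can be regenerated, or searched in reverse, with a few lines of Python. The sketch below uses the residue masses given in the table; the function names are assumptions made for illustration.

```python
# Nominal residue masses (Da), taken from the table above.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "I": 113, "L": 113, "N": 114, "D": 115, "K": 128, "Q": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

def mass_shift(expected, mutant):
    """Mass change when the expected residue is replaced by the mutant residue."""
    return RESIDUE_MASS[mutant] - RESIDUE_MASS[expected]

def candidate_mutations(shift):
    """All (expected, mutant) pairs whose nominal mass shift equals `shift`."""
    return sorted((e, m) for e in RESIDUE_MASS for m in RESIDUE_MASS
                  if e != m and mass_shift(e, m) == shift)

print(mass_shift("Y", "A"))           # -92
print(candidate_mutations(-92))       # [('Y', 'A')]
print(len(candidate_mutations(14)))   # 8
```

A shift of -92 points to a single substitution, while a shift of +14 is shared by eight (for example G to A and S to T), so the nominal shift alone does not always identify the mutation.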