GPMDB Guide to the Mouse Proteome

A chromosome-by-chromosome look at proteins observed, 2004-2016

Version 22: 2016.7.01

Editor: Ronald C. Beavis

Introduction

GPMDB began recording information about the mouse proteome on January 1, 2004. It was the first system to use the ENSEMBL sequence annotation system as a source of protein sequences and, as a result, it now has more than ten years of retrospective proteomics information.

Methodology

The spreadsheets were assembled using the ENSEMBL protein splice variant accession numbers for ENSEMBL build 76 (mouse genome assembly GRCm38). The accession numbers were separated into groups on a chromosome-by-chromosome basis and GPMDB was queried to determine which of those accession numbers had been observed. If an accession number had a GPMDB record, the following data were extracted and represented in the attached spreadsheets:

1. rank: this number is the numerical rating of the best observation for that protein sequence, based on its log(e) value (see #5 below);

2. ENSEMBL splice: the ENSEMBL v. 76 protein accession number;

3. ENSEMBL gene: the ENSEMBL v. 76 gene accession number;

4. # obs.: the number of times the protein has been observed;

5. log(e): the lowest (best) expectation value observed for that protein;

6. MGI: the mouse genome informatics database abbreviation for the gene associated with the protein;

7. EC: evidence code (described at http://wiki.thegpm.org/wiki/GPMDB_evidence_codes);

8. Start: the first nucleic acid residue in the associated gene, in chromosome coordinates;

9. End: the last nucleic acid residue in the associated gene, in chromosome coordinates;

10. Strand: the direction for reading the gene from the chromosome;

11. Band: the chromosomal band that the gene occupies; and

12. Description: a text description of the associated gene's function.
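The relationship between the rank and log(e) columns can be sketched as follows. This is a minimal illustration only, not the procedure used to build the spreadsheets: it assumes that rank 1 goes to the protein with the lowest log(e) in a given spreadsheet and that ties keep their input order. The class name, function name and accession strings are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    accession: str  # hypothetical placeholder, not a real ENSEMBL accession
    log_e: float    # lowest (best) log(e) recorded for the protein
    n_obs: int      # number of times the protein has been observed


def assign_ranks(rows):
    """Return (rank, row) pairs; rank 1 is the row with the lowest log(e).

    Assumption: ranking is by ascending log(e). sorted() is stable, so rows
    with equal log(e) keep their input order.
    """
    ordered = sorted(rows, key=lambda row: row.log_e)
    return list(enumerate(ordered, start=1))


rows = [
    Observation("PROTEIN_A", -12.5, 40),
    Observation("PROTEIN_B", -230.1, 1900),
    Observation("PROTEIN_C", -3.0, 2),
]
ranked = assign_ranks(rows)
for rank, row in ranked:
    print(rank, row.accession, row.log_e, row.n_obs)
```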

In addition to the 19 autosomal chromosomes and the 2 sex chromosomes, separate spreadsheets were compiled for the mitochondrial DNA (MT) and for genes present on haplotypes or patches (OTHER). The EC calculations were made with the NBS 2 algorithm.
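The division of accession numbers into spreadsheets can be pictured with a short sketch. It assumes that each accession number arrives paired with a chromosome name (for example, from an annotation export) and that any name other than 1-19, X, Y or MT belongs to a haplotype or patch. The function names and the example identifiers are hypothetical.

```python
# One spreadsheet per autosome (1-19), per sex chromosome, plus MT and OTHER.
AUTOSOMES = [str(n) for n in range(1, 20)]
SHEETS = AUTOSOMES + ["X", "Y", "MT", "OTHER"]


def sheet_for(chromosome_name: str) -> str:
    """Map a chromosome name to the spreadsheet that would hold it.

    Assumption: anything that is not a primary chromosome or MT is treated
    as a haplotype or patch and goes to OTHER.
    """
    name = chromosome_name.strip().upper()
    return name if name in SHEETS[:-1] else "OTHER"


def group_by_sheet(pairs):
    """Group (accession, chromosome_name) pairs by spreadsheet name."""
    groups = {sheet: [] for sheet in SHEETS}
    for accession, chromosome_name in pairs:
        groups[sheet_for(chromosome_name)].append(accession)
    return groups


# Hypothetical accession labels and chromosome names, for illustration only.
example = [("PROTEIN_A", "7"), ("PROTEIN_B", "mt"), ("PROTEIN_C", "EXAMPLE_PATCH")]
groups = group_by_sheet(example)
print(groups["7"], groups["MT"], groups["OTHER"])
```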

Notes

1. No attempt has been made to use protein accession numbers not found in ENSEMBL v. 76. GPMDB has recorded information from many versions of ENSEMBL, some of which contained accession numbers no longer present in v. 76. No algorithms have been used to attempt to re-use that information by projecting protein sequences or chromosomal locations back onto the current ENSEMBL assembly.

2. The MGI, Start, End, Strand, Band and Description information recorded here was taken directly from the ENSEMBL BioMart system and recorded without further editing or curation.

3. No numerical cutoffs have been used in assembling these tables, except that to be counted a protein identification must have had an expectation value less than or equal to 1, i.e., log(e) ≤ 0.
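The single cutoff in note 3 can be written out as a short check. The sketch assumes that log(e) is the base-10 logarithm of the expectation value. The equivalence between an expectation value of at most 1 and log(e) ≤ 0 holds for any logarithm base greater than 1. The function names are hypothetical.

```python
import math


def log_e(expectation_value: float) -> float:
    """Return log(e) for an expectation value (base 10 is an assumption)."""
    if expectation_value <= 0:
        raise ValueError("expectation value must be positive")
    return math.log10(expectation_value)


def is_counted(expectation_value: float) -> bool:
    """True if the identification meets the cutoff, i.e. log(e) <= 0."""
    return log_e(expectation_value) <= 0


for value in (1e-5, 1.0, 2.5):
    print(value, is_counted(value))
```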