GPMDB Guide to the S. cerevisiae Proteome
A chromosome-by-chromosome look at proteins observed, 2004-2016
Version 2: 2016.8.01
Editor: Ronald C. Beavis
Copyright © 2016 Beavis Informatics Ltd.
 Introduction 
      GPMDB began recording information about the S. cerevisiae proteome on January 1, 2004. It was the first system to use the ENSEMBL sequence annotation system as a source of protein sequences, and as a result it now holds more than twelve years of retrospective proteomics information. 
 Methodology 
      The spreadsheets were assembled from the protein ORF accession numbers in ENSEMBL EF4.72. The accession numbers were grouped chromosome by chromosome, and GPMDB was queried to determine which of them had been observed. For each accession number with a GPMDB record, the following data were extracted and are presented in the attached spreadsheets: 
 1. Chr: the chromosome associated with the ORF, using A-P for the nuclear chromosomes and Q for the mitochondrial chromosome; 
 2. ORF: the EF4.72 ORF accession number; 
 3. Gene: the SGD gene name corresponding to the ORF; 
 4. Observations: the number of times the protein has been observed; 
 5. Best log(e): the log of the lowest (best) expectation value observed for that protein; 
 6. EC: evidence code (described at http://wiki.thegpm.org/wiki/GPMDB_evidence_codes); and 
 7. Description: a text description of the ORF, from ENSEMBL. 
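      The chromosome grouping described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the spreadsheets: it assumes the ORF accessions follow the SGD systematic naming convention (for example YAL001C, where the second letter is the chromosome code, or Q0045 for a mitochondrial ORF), and the function names chromosome_letter and group_by_chromosome are hypothetical.

```python
from collections import defaultdict

# Codes for the 16 nuclear chromosomes (I-XVI), as used in the Chr column.
NUCLEAR = "ABCDEFGHIJKLMNOP"


def chromosome_letter(orf: str) -> str:
    """Return the Chr code (A-P or Q) for a systematic ORF name.

    Assumption: nuclear ORFs look like 'Y' + chromosome letter + arm (L/R)
    + number + strand, and mitochondrial ORFs start with 'Q'.
    Any other name raises ValueError rather than being guessed at.
    """
    name = orf.strip().upper()
    if name.startswith("Q"):
        return "Q"
    if len(name) >= 3 and name[0] == "Y" and name[1] in NUCLEAR and name[2] in "LR":
        return name[1]
    raise ValueError(f"unrecognized ORF name: {orf!r}")


def group_by_chromosome(orfs):
    """Group ORF accessions into lists keyed by chromosome code."""
    groups = defaultdict(list)
    for orf in orfs:
        groups[chromosome_letter(orf)].append(orf)
    return dict(groups)
```

      Each group could then be checked against GPMDB one accession at a time. The query itself is not shown here, because the source does not describe the interface that was used.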
      The first four tabs below (EC1-EC4) contain the ORFs grouped by evidence code, which measures the reproducibility of ORF observations in MS/MS-based proteomics measurements (EC1 = no credible evidence; EC2 = poor evidence; EC3 = moderate evidence; and EC4 = good evidence). The fifth tab (SC-ALL) lists all of the ORFs in a single table, ordered by chromosome number and gene location. The sixth tab summarizes the number of ORFs in each evidence category. 
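      For readers who export the SC-ALL tab and want to rebuild the per-code tabs or the summary counts themselves, the following Python sketch shows one way to do it. The Row type, its field names, and the functions split_by_evidence and summarize are assumptions made for illustration; they mirror the seven columns listed above but are not part of GPMDB.

```python
from collections import Counter
from typing import NamedTuple


class Row(NamedTuple):
    """One spreadsheet row; field names are hypothetical."""
    chr: str            # A-P nuclear, Q mitochondrial
    orf: str            # ORF accession number
    gene: str           # SGD gene name
    observations: int   # number of times the protein was observed
    best_log_e: float   # log of the best expectation value
    ec: int             # evidence code, 1-4
    description: str    # ENSEMBL description text


# Evidence categories as described in the text.
EC_LABELS = {
    1: "no credible evidence",
    2: "poor evidence",
    3: "moderate evidence",
    4: "good evidence",
}


def split_by_evidence(rows):
    """Return a dict mapping 'EC1'..'EC4' to the rows with that code."""
    tabs = {f"EC{code}": [] for code in EC_LABELS}
    for row in rows:
        tabs[f"EC{row.ec}"].append(row)
    return tabs


def summarize(rows):
    """Count ORFs per evidence category, including empty categories."""
    counts = Counter(row.ec for row in rows)
    return {f"EC{code}": counts.get(code, 0) for code in EC_LABELS}
```

      Rows with an evidence code outside 1-4 raise a KeyError in split_by_evidence, which is deliberate in this sketch: an unexpected code is more likely a parsing problem than a new category.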
Notes
1.  No attempt has been made to use protein accession numbers not found in ENSEMBL EF4.72. GPMDB has recorded information from many versions of ENSEMBL, some of which contained accession numbers no longer present. No algorithms have been used to attempt to re-use that information by projecting protein sequences or chromosomal locations back onto the current ENSEMBL assembly.