The Global Proteome Machine Organization
   GPMDB in SQLite

The SQLite version of GPMDB was created to provide simple, portable versions of individual GPM result files. GPM result files are written in XML: the SQLite version of a result file contains the same information as the GPMDB entry associated with that file, as well as the peak lists from the file (which are not normally stored in GPMDB).

The table structure and columns were kept as similar to GPMDB as possible. Data types were converted to the simpler set of types available in SQLite. All primary keys were set to be auto-incremented. Two tables that are in GPMDB (Path and ProteinRevision) were removed, as they have no role in a single-file database. The Spectrum table was added to record the peak list information associated with each identification.
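As a rough illustration of how such a file can be inspected, the sketch below uses Python's standard sqlite3 module to list the tables in a database. The function name list_tables is an assumption made for this example, not part of any GPM tool.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all user tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "  # skip SQLite's internal tables
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

Called on a connection opened with sqlite3.connect() on a .gpmdb file, this would be expected to return the seven table names described below, provided the file follows this layout.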

The preferred extension for files in this format is ".gpmdb", to distinguish them from other files generated using SQLite. A Perl script for generating these files, an example GPM XML file and its corresponding .gpmdb file are available here.

Result
column           type     description
resultid         INTEGER  auto-incremented primary key
pathid           INTEGER  not used
file             TEXT     name of the original data file
completed        INTEGER  not used
active           INTEGER  not used
rating           INTEGER  not used
comments         TEXT     not used
tandemversion    TEXT     version number for the search engine
unique_proteins  INTEGER  not used
geo_stdev        REAL     not used
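The following is a minimal sketch of how the Result table could be declared and filled, to show the auto-incremented primary key in practice. The CREATE TABLE statement is reconstructed from the column list above and is an assumption; the statement actually used to build .gpmdb files may differ. The helper name add_result is also hypothetical.

```python
import sqlite3

# Assumed DDL, reconstructed from the documented columns and types.
RESULT_DDL = """
CREATE TABLE Result (
    resultid        INTEGER PRIMARY KEY AUTOINCREMENT,
    pathid          INTEGER,
    file            TEXT,
    completed       INTEGER,
    active          INTEGER,
    rating          INTEGER,
    comments        TEXT,
    tandemversion   TEXT,
    unique_proteins INTEGER,
    geo_stdev       REAL
)
"""

def add_result(conn, file_name, tandem_version):
    """Insert one Result row and return its auto-assigned resultid."""
    cur = conn.execute(
        "INSERT INTO Result (file, tandemversion) VALUES (?, ?)",
        (file_name, tandem_version),
    )
    return cur.lastrowid
```

Because resultid is never supplied by the caller, SQLite assigns it, and the other tables can then refer to the returned value.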
Protein
column    type     description
proid     INTEGER  auto-incremented primary key
resultid  INTEGER  value of the corresponding Result.resultid
proseqid  INTEGER  value of the corresponding ProSeq.proseqid
expect    REAL     protein expectation value
pida      INTEGER  first protein id number
pidb      INTEGER  second protein id number
uid       INTEGER  protein unique identifier from the original file
ProSeq
column     type     description
proseqid   INTEGER  auto-incremented primary key
seq        TEXT     protein sequence
label      TEXT     protein accession number/text identifier
label_aux  TEXT     additional protein accession number/text identifier (if applicable)
rf         INTEGER  protein reading frame (if applicable)
Peptide
column  type     description
pepid   INTEGER  auto-incremented primary key
proid   INTEGER  value of the corresponding Protein.proid
seq     TEXT     peptide sequence
mh      REAL     parent ion M+H+ in Daltons
expect  REAL     identification expectation value
start   INTEGER  N-terminal protein coordinate of this peptide
end     INTEGER  C-terminal protein coordinate of this peptide
charge  INTEGER  parent ion charge
delta   REAL     parent ion mass error
dida    INTEGER  first peptide identifier
didb    INTEGER  second peptide identifier
didc    INTEGER  third peptide identifier
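The key columns above link each peptide to its protein and, through the protein, to a sequence record. The sketch below shows one way these links might be followed in a query. It assumes that Protein.proseqid refers to ProSeq.proseqid; the function name peptides_with_labels and the max_expect parameter are hypothetical. This page does not state the scale on which expect is stored, so the threshold is left to the caller.

```python
import sqlite3

def peptides_with_labels(conn, max_expect):
    """Return (label, peptide_seq, start, end, charge, expect) rows.

    Only peptides with expect <= max_expect are returned, lowest expect first.
    """
    # "end" is an SQL keyword in SQLite, so the column name is quoted.
    query = """
        SELECT ps.label, pep.seq, pep.start, pep."end", pep.charge, pep.expect
        FROM Peptide AS pep
        JOIN Protein AS pro ON pro.proid = pep.proid
        JOIN ProSeq AS ps ON ps.proseqid = pro.proseqid
        WHERE pep.expect <= ?
        ORDER BY pep.expect
    """
    return conn.execute(query, (max_expect,)).fetchall()
```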
AA
column    type     description
aaid      INTEGER  auto-incremented primary key
pepid     INTEGER  value of the corresponding Peptide.pepid
type      TEXT     amino acid residue
at        INTEGER  position of the residue in protein coordinates
modified  REAL     mass of residue modification
pm        TEXT     mutated amino acid residue (if applicable)
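Since AA.at is given in protein coordinates, locating a modification within the peptide sequence requires the peptide's own start coordinate. The sketch below computes that offset as AA.at minus Peptide.start, on the assumption that both columns use the same coordinate system. The function name peptide_modifications is hypothetical.

```python
import sqlite3

def peptide_modifications(conn, pepid):
    """Return (residue, seq_index, mass) for each AA row of one peptide.

    seq_index = AA.at - Peptide.start, i.e. a zero-based index into
    Peptide.seq, assuming both columns share one coordinate system.
    """
    query = """
        SELECT a.type, a."at" - pep.start AS seq_index, a.modified
        FROM AA AS a
        JOIN Peptide AS pep ON pep.pepid = a.pepid
        WHERE a.pepid = ?
        ORDER BY a."at"
    """
    return conn.execute(query, (pepid,)).fetchall()
```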
Project
column       type     description
projid       INTEGER  auto-incremented primary key
resultid     INTEGER  value of the corresponding Result.resultid
name         TEXT     person(s) associated with this data
institution  TEXT     institution associated with this data
email        TEXT     an email address of the person(s)
project      TEXT     a title for the project that generated this data
comments     TEXT     text description of this data
Spectrum
column       type     description
specid       INTEGER  auto-incremented primary key
dida         INTEGER  value of the corresponding Peptide.dida
parent_mz    REAL     parent ion mass-to-charge ratio
parent_z     REAL     parent ion charge
description  TEXT     description of the spectrum
mzs          TEXT     space-delimited list of peak m/z values
ints         TEXT     space-delimited list of peak intensity values
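Because the peak lists are stored as space-delimited text, they must be split and converted before use. The sketch below is one possible way to do this in Python. The function names parse_peaks and peaks_for_dida are hypothetical, and the check that both lists have the same length is an assumption about well-formed files rather than a documented guarantee.

```python
import sqlite3

def parse_peaks(mzs, ints):
    """Convert the space-delimited mzs and ints strings into (m/z, intensity) pairs."""
    mz_values = [float(v) for v in mzs.split()]
    int_values = [float(v) for v in ints.split()]
    if len(mz_values) != len(int_values):
        raise ValueError("mzs and ints contain different numbers of values")
    return list(zip(mz_values, int_values))

def peaks_for_dida(conn, dida):
    """Return the peak list of the first Spectrum row with this dida, or None."""
    row = conn.execute(
        "SELECT mzs, ints FROM Spectrum WHERE dida = ? ORDER BY specid LIMIT 1",
        (dida,),
    ).fetchone()
    return None if row is None else parse_peaks(*row)
```

The dida value passed to peaks_for_dida would normally be taken from the Peptide row of interest, since Spectrum.dida holds the corresponding Peptide.dida.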
Copyright © 2011, The Global Proteome Machine Organization