GPMDB SQLite

The Global Proteome Machine Organization

GPMDB in SQLite

The SQLite version of GPMDB was created to provide simple, portable versions of individual GPM result files. GPM result files are written in XML; the SQLite version of a result file contains the same information as the GPMDB entry associated with that file, as well as the peak lists from the file (which are not normally stored in GPMDB).

The table structure and columns were kept as similar to GPMDB as possible. Data types were converted to the simpler set of types available in SQLite. All primary keys were set to be auto-incremented. Two tables that are in GPMDB (Path and ProteinRevision) were removed, as they have no role in a single-file database. The Spectrum table was added to record the peak list information associated with each identification.

The preferred extension for files in this format is ".gpmdb", to distinguish them from other files generated using SQLite. A Perl script for generating these files, an example GPM XML file and its corresponding .gpmdb file are available here.
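
Because a .gpmdb file is an SQLite database, it can be inspected with any SQLite client. The following is a minimal Python sketch, not part of the GPM distribution, that lists the tables in such a database. It runs against an in-memory stand-in holding two abbreviated tables; in practice the connection would be opened on a real file (the file name in the comment is hypothetical).

```python
import sqlite3


def list_tables(conn):
    """Return the names of all user tables in an open SQLite connection."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# For a real file: conn = sqlite3.connect("example.gpmdb")  # hypothetical name
conn = sqlite3.connect(":memory:")

# Abbreviated stand-ins for two of the tables described on this page.
conn.execute(
    "CREATE TABLE Result (resultid INTEGER PRIMARY KEY AUTOINCREMENT, file TEXT)"
)
conn.execute(
    "CREATE TABLE Spectrum (specid INTEGER PRIMARY KEY AUTOINCREMENT, dida INTEGER)"
)

tables = list_tables(conn)
print(tables)
```

The filter on 'sqlite_%' hides SQLite's internal bookkeeping tables, such as the one it creates to track auto-incremented keys.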

Result
column	type	description
resultid	INTEGER	auto-incremented primary key
pathid	INTEGER	not used
file	TEXT	name of the original data file
completed	INTEGER	not used
active	INTEGER	not used
rating	INTEGER	not used
comments	TEXT	not used
tandemversion	TEXT	version number for the search engine
unique_proteins	INTEGER	not used
geo_stdev	REAL	not used

Protein
column	type	description
proid	INTEGER	auto-incremented primary key
resultid	INTEGER	value of the corresponding Result.resultid
proseqid	INTEGER	value of the corresponding ProSeq.proseqid
expect	REAL	protein expectation value
pida	INTEGER	first protein id number
pidb	INTEGER	second protein id number
uid	INTEGER	protein unique identifier from the original file

ProSeq
column	type	description
proseqid	INTEGER	auto-incremented primary key
seq	TEXT	protein sequence
label	TEXT	protein accession number/text identifier
label_aux	TEXT	additional protein accession number/text identifier (if applicable)
rf	INTEGER	protein reading frame (if applicable)

Peptide
column	type	description
pepid	INTEGER	auto-incremented primary key
proid	INTEGER	value of the corresponding Protein.proid
seq	TEXT	peptide sequence
mh	REAL	parent ion M+H⁺ in Daltons
expect	REAL	identification expectation value
start	INTEGER	N-terminal protein coordinate of this peptide
end	INTEGER	C-terminal protein coordinate of this peptide
charge	INTEGER	parent ion charge
delta	REAL	parent ion mass error
dida	INTEGER	first peptide identifier
didb	INTEGER	second peptide identifier
didc	INTEGER	third peptide identifier
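
The key columns in the Protein, ProSeq and Peptide tables can be followed with ordinary joins. The sketch below is one possible way to list confident peptide identifications together with the label of the protein they belong to. It assumes that Protein.proseqid refers to ProSeq.proseqid, uses abbreviated table definitions, and fills them with invented demonstration rows; the function name and the expectation threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Abbreviated schema plus invented demonstration data (not real results).
# "start" and "end" are quoted defensively because END is an SQL keyword.
conn.executescript("""
CREATE TABLE ProSeq (proseqid INTEGER PRIMARY KEY AUTOINCREMENT,
                     seq TEXT, label TEXT);
CREATE TABLE Protein (proid INTEGER PRIMARY KEY AUTOINCREMENT,
                      resultid INTEGER, proseqid INTEGER, expect REAL);
CREATE TABLE Peptide (pepid INTEGER PRIMARY KEY AUTOINCREMENT,
                      proid INTEGER, seq TEXT, expect REAL,
                      "start" INTEGER, "end" INTEGER);
INSERT INTO ProSeq (seq, label) VALUES ('MKTAYIAK', 'EXAMPLE1');
INSERT INTO Protein (resultid, proseqid, expect) VALUES (1, 1, 1e-10);
INSERT INTO Peptide (proid, seq, expect, "start", "end")
    VALUES (1, 'MKTAY', 1e-4, 1, 5);
INSERT INTO Peptide (proid, seq, expect, "start", "end")
    VALUES (1, 'TAYIAK', 0.5, 3, 8);
""")


def confident_peptides(conn, max_expect):
    """Return (label, peptide, start, end, expect) rows at or below max_expect."""
    return conn.execute(
        'SELECT ProSeq.label, Peptide.seq, Peptide."start", Peptide."end", '
        "       Peptide.expect "
        "FROM Peptide "
        "JOIN Protein ON Protein.proid = Peptide.proid "
        "JOIN ProSeq ON ProSeq.proseqid = Protein.proseqid "
        "WHERE Peptide.expect <= ? "
        "ORDER BY Peptide.expect",
        (max_expect,),
    ).fetchall()


rows = confident_peptides(conn, 0.01)
for row in rows:
    print(row)
```

Passing the threshold as a bound parameter rather than formatting it into the SQL string keeps the query reusable for different cut-offs.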

AA
column	type	description
aaid	INTEGER	auto-incremented primary key
pepid	INTEGER	value of the corresponding Peptide.pepid
type	TEXT	amino acid residue
at	INTEGER	position of the residue in protein coordinates
modified	REAL	mass of residue modification
pm	TEXT	mutated amino acid residue (if applicable)
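
Since AA.at is given in protein coordinates, locating a modification within its peptide requires the peptide's own start coordinate. The sketch below shows that arithmetic under the assumption that Peptide.start is the 1-based, inclusive protein coordinate of the peptide's first residue, in which case the 1-based position within the peptide is at - start + 1. The tables are abbreviated and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Abbreviated schema plus invented demonstration data (not real results).
conn.executescript("""
CREATE TABLE Peptide (pepid INTEGER PRIMARY KEY AUTOINCREMENT,
                      proid INTEGER, seq TEXT,
                      "start" INTEGER, "end" INTEGER);
CREATE TABLE AA (aaid INTEGER PRIMARY KEY AUTOINCREMENT,
                 pepid INTEGER, type TEXT, at INTEGER,
                 modified REAL, pm TEXT);
INSERT INTO Peptide (proid, seq, "start", "end") VALUES (1, 'TAMCIAK', 3, 9);
INSERT INTO AA (pepid, type, at, modified, pm)
    VALUES (1, 'M', 5, 15.99491, NULL);
INSERT INTO AA (pepid, type, at, modified, pm)
    VALUES (1, 'C', 6, 57.02146, NULL);
""")


def modifications(conn, pepid):
    """Return (residue, protein_pos, peptide_pos, mass) for one peptide.

    peptide_pos assumes Peptide.start is 1-based and inclusive.
    """
    return conn.execute(
        "SELECT AA.type, AA.at, "
        '       AA.at - Peptide."start" + 1 AS peptide_pos, '
        "       AA.modified "
        "FROM AA JOIN Peptide ON Peptide.pepid = AA.pepid "
        "WHERE AA.pepid = ? "
        "ORDER BY AA.at",
        (pepid,),
    ).fetchall()


mods = modifications(conn, 1)
for residue, protein_pos, peptide_pos, mass in mods:
    print(residue, protein_pos, peptide_pos, mass)
```

A quick consistency check on real data is to confirm that the residue at the computed peptide position matches AA.type; a mismatch would suggest the coordinate assumption does not hold for that file.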

Project
column	type	description
projid	INTEGER	auto-incremented primary key
resultid	INTEGER	value of the corresponding Result.resultid
name	TEXT	person(s) associated with this data
institution	TEXT	institution associated with this data
email	TEXT	an email address of the person(s)
project	TEXT	a title for the project that generated this data
comments	TEXT	text description of this data

Spectrum
column	type	description
specid	INTEGER	auto-incremented primary key
dida	INTEGER	value of the corresponding Peptide.dida
parent_mz	REAL	parent ion mass-to-charge ratio
parent_z	REAL	parent ion charge
description	TEXT	description of the spectrum
mzs	TEXT	space-delimited list of peak m/z values
ints	TEXT	space-delimited list of peak intensity values
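
The mzs and ints columns store a peak list as two parallel text fields, so they have to be split and paired before use. The following sketch shows one way to do that and to fetch the spectrum for a given peptide through the shared dida value. It assumes that the two lists always have the same number of entries; the helper names are hypothetical, the tables are abbreviated, and the rows are invented for illustration.

```python
import sqlite3


def parse_peaks(mzs, ints):
    """Pair space-delimited m/z and intensity strings into (mz, intensity) tuples."""
    mz_values = [float(x) for x in mzs.split()]
    intensities = [float(x) for x in ints.split()]
    if len(mz_values) != len(intensities):
        raise ValueError("mzs and ints contain different numbers of values")
    return list(zip(mz_values, intensities))


def spectrum_for_peptide(conn, pepid):
    """Return the spectrum linked to a peptide via dida, or None if absent."""
    row = conn.execute(
        "SELECT Spectrum.parent_mz, Spectrum.parent_z, "
        "       Spectrum.mzs, Spectrum.ints "
        "FROM Peptide JOIN Spectrum ON Spectrum.dida = Peptide.dida "
        "WHERE Peptide.pepid = ?",
        (pepid,),
    ).fetchone()
    if row is None:
        return None
    parent_mz, parent_z, mzs, ints = row
    return {
        "parent_mz": parent_mz,
        "parent_z": parent_z,
        "peaks": parse_peaks(mzs, ints),
    }


conn = sqlite3.connect(":memory:")

# Abbreviated schema plus invented demonstration data (not real results).
conn.executescript("""
CREATE TABLE Peptide (pepid INTEGER PRIMARY KEY AUTOINCREMENT,
                      seq TEXT, dida INTEGER);
CREATE TABLE Spectrum (specid INTEGER PRIMARY KEY AUTOINCREMENT,
                       dida INTEGER, parent_mz REAL, parent_z REAL,
                       description TEXT, mzs TEXT, ints TEXT);
INSERT INTO Peptide (seq, dida) VALUES ('MKTAY', 101);
INSERT INTO Spectrum (dida, parent_mz, parent_z, description, mzs, ints)
    VALUES (101, 500.25, 2, 'demo spectrum',
            '100.5 200.25 300.0', '10 250 35');
""")

spectrum = spectrum_for_peptide(conn, 1)
print(spectrum)
```

Calling split() with no argument tolerates repeated or trailing spaces in the stored strings, which a split on a single space character would turn into empty entries.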