The Global Proteome Machine: The home of proteomics crowd-sourced "Big Data"
Index of GPMDB lists
Proteomics often requires the assembly of category-wide lists of things. These
categories can be proteins associated with particular sequence or biological
properties, post-translational modifications, or types of experiments. GPMDB can be
used to generate these lists, and this page serves as an index to the lists announced
for the system.
Available lists of things
Post-translational modifications:
Proteins by classifiers:
Proteotypic peptides and annotated spectrum libraries:
The human protein identification
information in GPMDB has been summarized into a collection of spreadsheets that we are
calling the GPMDB Guide to the Human Proteome. This guide has the information
organized into separate spreadsheets for each chromosome, as well as three transposons
and mitochondrial DNA. The protein accession numbers, HGNC names and chromosomal
coordinates were taken from ENSEMBL v. 65. This edition of the Guide (2012.01.01) is
available in the following formats:
The files are also available at the GPM FTP site:
ftp://ftp.thegpm.org/projects/annotation/human_proteome_guide/
The mouse protein identification
information in GPMDB has been summarized into a collection of spreadsheets that we are
calling the GPMDB Guide to the Mouse Proteome. This guide has the information organized into
separate spreadsheets for each chromosome, as well as NT transcripts and mitochondrial
DNA. The protein accession numbers, MGI names and chromosomal coordinates were taken
from ENSEMBL v. 65. This edition of the Guide (2012.01.01) is available in the
following formats:
The files are also available at the GPM FTP site:
ftp://ftp.thegpm.org/projects/annotation/mouse_proteome_guide/
We have also compiled a list for the fruit fly proteome acetylation, based on the data
in GPMDB. This list is available in Excel
spreadsheet, tab-separated text and HTML formats. The list is composed of protein N-terminal and lysine
acetylations only.
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
When using this type of information, please use normal caution. Click here for our recommendations for using lists of site
assignments.
We have compiled a list of observed phosphorylation sites for the C. elegans
proteome, based on the data in GPMDB. This list is available in Excel spreadsheet, tab-separated text
and HTML
formats.
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
When using this type of information, please use normal caution. Click here for our recommendations for using lists of site
assignments.
We have also compiled a list of observed phosphorylation sites for the fruit fly
proteome, based on the data in GPMDB. This list is available in Excel spreadsheet, tab-separated text and
HTML formats.
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
When using this type of information, please use normal caution. Click here for our recommendations for using lists of site
assignments.
We have also compiled a list for the yeast proteome acetylation, based on the data in
GPMDB. This list is available in Excel
spreadsheet, tab-separated text and HTML formats. The list is composed of protein N-terminal and lysine
acetylations only.
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
When using this type of information, please use normal caution. Click here for our recommendations for using lists of site
assignments.
We have also compiled a list of observed phosphorylation sites for the yeast proteome,
based on the data in GPMDB. This list is available in Excel spreadsheet, tab-separated text and
HTML formats.
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
When using this type of information, please use normal caution. Click here for our recommendations for using lists of site
assignments.
We have also compiled a list for the mouse proteome acetylation, based on the data in
GPMDB. This list is available in Excel
spreadsheet, tab-separated text and HTML formats. The list is composed of protein N-terminal and lysine
acetylations only.
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
When using this type of information, please use normal caution. Click here for our recommendations for using lists of site
assignments.
We have also compiled a list for the human proteome acetylation, based on the data in
GPMDB. This list is available in Excel
spreadsheet, tab-separated text and HTML formats. The list is composed of protein N-terminal and lysine
acetylations only.
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
When using this type of information, please use normal caution. Click here for our recommendations for using lists of site
assignments.
This list is a compilation of observed serine/threonine phosphorylation sites for the
Mycobacterium tuberculosis proteome (strain CDC1551), based on the data in
GPMDB. This list is available in Excel spreadsheet, tab-separated text
and HTML
formats. It contains 41 phosphorylation sites on 35 protein sequences, with the
following composition:
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
We have to again thank all of the data contributors who have made these comprehensive
lists possible. When using this type of information, please use normal caution.
Click here for our recommendations for using lists
of site assignments.
As a companion to the list of known human phosphorylation sites, we have also compiled
a similar list for the mouse proteome, based on the data in GPMDB. This list is
available in Excel spreadsheet,
tab-separated
text and HTML formats. It contains 22,855 phosphorylation sites on 8,190 protein
sequences, with the following composition:
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
We have to again thank all of the data contributors who have made these comprehensive
lists possible. When using this type of information, please use normal caution.
Click here for our recommendations for using lists
of site assignments.
We have compiled a list of known human phosphorylation sites, based on the data in
GPMDB, filtered through the same curation and quality control process that is used to
create the Annotated Spectrum Library collection. This list is available in Excel spreadsheet, tab-separated text and
HTML formats.
It contains 47,613 phosphorylation sites on 16,511 protein splice variant sequences,
with the following composition:
Each ENSEMBL splice variant protein accession number has a listing of all observed
sites in a single row, that looks like the following:
The columns have the following interpretation:
We have to thank all of the data contributors who have made this type of comprehensive
list possible. When using this type of information, please use normal caution. Click here for our recommendations for using lists of site
assignments.
The ENSEMBL protein accessions used in GPMDB can be readily assigned to specific Gene
Ontology (GO) terms, using ENSEMBL's BioMart utility. These lists for all available GO
terms have been constructed for three species:
The lists are divided up into the three main GO categories: biological process,
cellular component, and molecular function. Each individual GO term has an entry like:
The first column has a link to the list of proteins associated with the GO term
accession number. The notation following the accession number "[n/m]" indicates that
"n" proteins have been observed in GPMDB out of the "m" proteins in the proteome
assigned to this category. The second column is the controlled vocabulary
description of each GO category.
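The "[n/m]" notation can be read programmatically. The following is a minimal sketch, assuming the label text has the form "GO accession [n/m]". The example string, the function name parse_go_label, and the coverage percentage are illustrative assumptions, not part of GPMDB.

```python
import re

# Assumed label layout: "<GO accession> [n/m]", e.g. "GO:0000001 [3/4]".
# The layout and all names here are hypothetical and for illustration only.
_LABEL = re.compile(r"(GO:\d+)\s*\[(\d+)/(\d+)\]")


def parse_go_label(label):
    """Return (accession, n, m, coverage_percent) for one list entry.

    n is the number of proteins observed in GPMDB and m is the number of
    proteins in the proteome assigned to the GO term.
    """
    match = _LABEL.search(label)
    if match is None:
        raise ValueError(f"no '[n/m]' notation found in: {label!r}")
    accession = match.group(1)
    n = int(match.group(2))
    m = int(match.group(3))
    # Guard against an empty category to avoid dividing by zero.
    coverage = 100.0 * n / m if m else 0.0
    return accession, n, m, coverage
```

For example, under these assumptions an entry labelled "GO:0000001 [3/4]" would mean that 3 of the 4 proteins assigned to that term have been observed.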
The lists below were constructed from data supplied by the Normal
Clinical Tissue Alliance. Proteomics data from selected studies of clinical tissue
were analyzed and conservative lists of identified proteins were constructed. The
lists are organized by the best available BRENDA ontology term for the tissue, with the
exception of red blood cells, which are not currently in BRENDA.
The lists given below have the proteins in plasma removed (with the exception of the
plasma list).
These spreadsheets (top_1000_human_100707.xls
and top_1000_mouse_100707.xls)
list protein sequences that have been observed most often by GPM users who used the
"human" or "mouse" ENSEMBL proteome sequences. The columns in the spreadsheet are as
follows:
A "dataset" corresponds to a submitted set of MS/MS spectra, which results in a GPM
result file, so it is roughly equivalent to the set of data from an LC/MS/MS run. A
protein can only be observed once in a dataset. The value in Column F was calculated by taking the number of times (n_i) that
the protein was observed in the approximately 24,000 (N) datasets examined and doing
the simple calculation:
p_i = 100 × (n_i / N)
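The calculation above can be sketched in a few lines of Python. The function name percent_observed and its argument names are illustrative assumptions. The range check relies on the statement that a protein can be observed at most once per dataset.

```python
def percent_observed(n_i, total_datasets):
    """Percentage of datasets in which a protein was observed.

    n_i            -- number of datasets containing the protein
    total_datasets -- number of datasets examined (N, roughly 24,000 here)
    """
    if total_datasets <= 0:
        raise ValueError("total_datasets must be positive")
    # A protein is counted at most once per dataset, so n_i cannot exceed N.
    if not 0 <= n_i <= total_datasets:
        raise ValueError("n_i must lie between 0 and total_datasets")
    return 100.0 * n_i / total_datasets
```

As a usage example, a protein seen in 6,000 of 24,000 datasets gives percent_observed(6000, 24000), which is 25.0.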
Copyright © 2010-2011, The Global Proteome Machine Organization