The Global Proteome Machine Organization
The Global Proteome Machine
The home of proteomics crowd-sourced "Big Data"
QUACK: Quality Assurance & Control Knowledge Base
Most of the research into proteomics data analysis algorithms has centered on extracting as much biologically relevant information as possible from the results of an experiment. Significant improvements in laboratory technique and instrument performance have made it possible to extract far more information than was available even a few years ago.
The increase in data volume and the improvements in laboratory methods have opened up a new field for informatics research in proteomics, which can be broadly classified as the development of Quality Assurance and Quality Control (QA/QC) algorithms. The purpose of these algorithms is not to extract biological information: it is to give experimental groups rapid feedback about problems in experimental design, laboratory technique or instrument measurement stability that make results unsuitable for their intended purpose. These algorithms should promote a "virtuous circle", in which both the informatics and experimental groups have the tools they need to avoid wasting precious experimental and computational resources on generating and processing sub-par results. Informatics fixes for bad data are never as productive as a commitment to generating the best data possible.
The purpose of this raw data repository is to provide real experimental data to facilitate the development of QA/QC algorithms. The data will not be the pristine, high-quality data that supports published research: instead, it is data with fatal flaws, caused either by single blunders or by an accumulation of smaller problems that collectively render it unsuitable for use. Painstaking analysis of this often highly complex data is not the point; the goal should instead be a simple report back to an analyst highlighting what is wrong.
Not all data flaws are of equal importance. A flaw that is critical when a data set is used for quantitation may be merely an annoyance when the same data is used for parent ion mass calibration. The issues associated with each data set will therefore be annotated on a three-point scale, depending on the overall context of the associated experiment:
  1. Fatal: if detected, the data should be rejected on this basis alone;
  2. Incremental: undesirable, but only fatal in combination with other problems; and
  3. Quibbles: may annoy experts, but do not affect the usability of the data.
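As an illustration only, here is a minimal Python sketch of how the three-point scale might be applied to produce the kind of simple report described above. The flaw names (`flaw_a`, `flaw_b`), the per-context severity table and `INCREMENTAL_LIMIT` (how many incremental issues add up to a rejection) are hypothetical placeholders; the site does not specify any of them.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """The three-point scale; lower values are more serious."""
    FATAL = 1
    INCREMENTAL = 2
    QUIBBLE = 3


# Hypothetical annotations: the same flaw can carry a different severity
# depending on what the data set is being used for.
SEVERITY_BY_CONTEXT = {
    "quantitation": {"flaw_a": Severity.FATAL, "flaw_b": Severity.INCREMENTAL},
    "mass_calibration": {"flaw_a": Severity.QUIBBLE, "flaw_b": Severity.INCREMENTAL},
}

# Assumed threshold: this many incremental issues together count as fatal.
INCREMENTAL_LIMIT = 2


@dataclass
class Verdict:
    rejected: bool
    report: list[str]


def triage(flaws: list[str], context: str,
           incremental_limit: int = INCREMENTAL_LIMIT) -> Verdict:
    """Grade detected flaws for one usage context and build a short report.

    Raises KeyError for an unknown context or an unannotated flaw.
    """
    table = SEVERITY_BY_CONTEXT[context]
    # Sort (severity, name) pairs so the most serious issues are listed first.
    graded = sorted((table[f], f) for f in flaws)
    n_fatal = sum(1 for s, _ in graded if s == Severity.FATAL)
    n_incremental = sum(1 for s, _ in graded if s == Severity.INCREMENTAL)
    # Reject on any fatal flaw, or on enough incremental flaws combined.
    rejected = n_fatal > 0 or n_incremental >= incremental_limit
    report = [f"{s.name}: {f}" for s, f in graded]
    return Verdict(rejected, report)
```

With these placeholder annotations, `triage(["flaw_a", "flaw_b"], "quantitation")` rejects the data and reports `FATAL: flaw_a` first, while the same two flaws in the `"mass_calibration"` context leave one incremental issue and one quibble, so the data is accepted.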
Note: only data explicitly meant for algorithm development will be included on this site. No data associated with biological or biomedical experiments or associated publications will be permitted. If you have some data that you would like to contribute, please contact Ron Beavis.
Copyright © 2012, The Global Proteome Machine Organization.