3.3 Protein-specific elements
Summary |
In the same way that "gene" is a concept, "protein" is also a concept that
existed long before there was any knowledge of its physical structure. A
protein was originally conjectured to be the fundamental particle that
performed some specific function, such as catalysis in the case of enzymes.
Since that original conjecture, the complete structure of a large number of
proteins has become known. Proteins are composed of one or more subunits. Each
subunit is composed of one or more linear polypeptide molecules, which are
polymers of twenty different amino acids (called residues). Within one subunit
or polypeptide chain there may be additional bonds between individual cysteine
residues, leading to a more complex, non-linear bonding structure. Many amino
acids can be modified once they have been incorporated into a polypeptide (a
post-translational modification) and the presence of these modifications may
have a strong influence on the functionality of the final protein molecule.
|
|
The following elements are proposed to express the idea of a protein and its
composition of subunits with their component, highly ordered polypeptide
chains. The concepts of modification and annotation of peptide-specific
structures are also included.
|
Protein elements |
The highest-level elements for a
protein are the protein itself and the subunits that make up the protein.
Subunits are the polypeptide chains that are assembled (usually non-covalently)
to make up the protein. For the purposes of BIOML, a subunit comprises all the
peptide components of a protein that result from the translation of a single
mRNA. More than one peptide component may exist because the translated peptide
can be edited by enzymes within a cell.
The various elements that result from this editing are described below as
domains.
Element | Attributes | Functions |
protein | comp | Encloses a protein |
subunit | comp | Encloses a subunit |
homolog | — | Another organism with this peptide sequence |
|
Peptide elements |
Peptides are the long chains of amino
acid residues that make up a protein subunit. Peptide chains can be divided up
into functional regions, called domains, which may have particular structural
or functional attributes. Many gene products are made in a slightly longer form
than is present in the mature protein. If a domain is removed to turn the gene
product into a functional protein, it is called a "propeptide" (signified as <domain
type="propeptide">). Many gene products are made with a long (20–30
residue), hydrophobic domain at the N-terminus of the chain, which is removed
almost immediately following translation. This type of domain is called a
"signal" peptide (<domain type="signal">). Peptide domains that
remain in the mature protein are designated by <domain type="mature">.
Other domain types are "alpha-helix", "beta-strand" and "beta-turn", which mark
the corresponding secondary-structure features. A special type of domain
designates regions for which more than one sequence has been found, signified
by <domain type="variable">. The precise locations and assignments of these
domains have been determined for only a small selection of proteins; many more
have been assigned by analogy to known domain structures.
|
|
Some domains may overlap, so it is not necessary to enclose the residues in a
domain with a domain element's tags. A
domain is an element of the enclosing peptide and can be specified using a
single <domain/> tag anywhere within that peptide. It is important
to note that domains always belong to a peptide: they are not peptides
themselves.
Element | Attributes | Functions |
peptide | start, end | Encloses a peptide |
domain | start, end, id, type | Specifies a generalized peptide domain |
|
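As a minimal sketch of how overlapping domains can be read from a peptide, the following Python fragment parses a hypothetical peptide with the standard library's xml.etree.ElementTree module. The residue string, the domain ids and all coordinates are invented for illustration. The sketch also assumes that the sequence is written as text inside the peptide element and that start and end are inclusive residue positions counted in the same numbering as the peptide's own start attribute; the text above does not confirm either point.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the sequence and all coordinates are invented.
FRAGMENT = """
<peptide start="1" end="20">
  MKTLLVAGCA ACDEFGHIKL
  <domain id="1" type="signal" start="1" end="8"/>
  <domain id="2" type="mature" start="9" end="20"/>
  <domain id="3" type="alpha-helix" start="6" end="14"/>
</peptide>
"""


def peptide_sequence(peptide):
    """Join all character data inside the peptide and drop whitespace."""
    return "".join("".join(peptide.itertext()).split())


def domain_residues(peptide):
    """Map (id, type) of each domain to the residues it covers.

    Assumption: start/end are inclusive and use the peptide's numbering,
    so the first residue of the text sits at the peptide's start value.
    """
    seq = peptide_sequence(peptide)
    offset = int(peptide.get("start", "1"))
    covered = {}
    for dom in peptide.iter("domain"):
        start, end = int(dom.get("start")), int(dom.get("end"))
        covered[(dom.get("id"), dom.get("type"))] = seq[start - offset:end - offset + 1]
    return covered


peptide = ET.fromstring(FRAGMENT)
for key, residues in domain_residues(peptide).items():
    print(key, residues)
```

Because each domain is an empty tag carrying its own start and end, the "signal" and "alpha-helix" domains in this sketch can share residues 6 to 8 without any nesting conflict.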
Amino acid elements |
Individual amino acids are the building blocks of proteins. These amino acids
can be modified in a variety of ways. They can also be cross-linked to other
amino acids, either by disulphide bridges between cysteine residues or by the
presence of other cross-linking groups.
Element | Attributes | Functions |
aa | type, at, to | An amino acid element. Note: "to" applies for type="C" only |
amod | at, type, occ | An amino acid modification |
alink | at, to, type, occ | A generalized crosslink between two amino acids |
avariant | at, type, occ | A possible amino acid variant at a particular site |
|
3.3.2 Simple <protein> examples |
A problem that was not addressed above is how to refer to individual subunits,
peptides and amino acids, in cases such as the representation of the
composition of a protein, or to clearly state cross-links between different
peptides in a single subunit. In BIOML this problem is addressed by
systematically using the id attributes in subunit and peptide tags. Each
new peptide in a subunit is given a sequential numerical id, starting
with "1" for the first peptide written for a subunit. Similarly, subunits are
given numerical id values, beginning with "1" for the first subunit
written for a protein. These numbers are used to refer to those elements.
|
|
For example, if a <protein> consists of two copies each of two
different subunits, <subunit id="1"> and <subunit id="2">,
then the protein's tag can be written as
<protein comp="2xS[1]+2xS[2]">.
Similarly, a cysteine in <subunit id="1">, in <peptide
id="1"> at position 5 that is cross-linked to a cysteine in <subunit
id="2">, in <peptide id="2"> at position 20 can be
completely described by the tag
<aa type="C" at="S[1]P[1]A[5]" to="S[2]P[2]A[20]"/>
or
<aa type="C" at="5" to="S[2]P[2]A[20]"/>.
Any redundant information can be left out of this type of description. In this
example, the specification of at="S[1]P[1]A[5]" is not required: the
second tag has the same information because the enclosing peptide and subunit
tags will clearly indicate to which subunit and peptide that amino acid
belongs. It is also possible to include domains (D[]) in this nomenclature,
using the general notation S[...]P[...]D[...].
|
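The notation above can be read mechanically. The following Python sketch is one possible reading, not part of BIOML itself: parse_comp and parse_ref are hypothetical helper names, the comp parser handles only the "NxS[i]" form shown in the text, and a bare number in at or to is assumed to be a residue position whose subunit and peptide come from the enclosing tags.

```python
import re

# One comp term as shown in the text, e.g. "2xS[1]".
COMP_TERM = re.compile(r"^(\d+)xS\[(\d+)\]$")
# One part of a location reference, e.g. "S[1]", "P[2]", "D[1]" or "A[20]".
REF_PART = re.compile(r"([SPDA])\[(\d+)\]")


def parse_comp(comp):
    """Return {subunit id: copy count} for a string such as '2xS[1]+2xS[2]'."""
    counts = {}
    for term in comp.split("+"):
        match = COMP_TERM.match(term.strip())
        if match is None:
            raise ValueError(f"unrecognised comp term: {term!r}")
        counts[int(match.group(2))] = int(match.group(1))
    return counts


def parse_ref(ref, subunit=None, peptide=None):
    """Resolve an at/to value into a dict keyed by S, P, A (and D if given).

    Parts omitted from the reference are filled from the enclosing
    subunit and peptide ids supplied by the caller.
    """
    ref = ref.strip()
    if ref.isdigit():
        parts = {"A": int(ref)}  # assumption: bare number = residue position
    else:
        found = REF_PART.findall(ref)
        if not found or "".join(f"{k}[{v}]" for k, v in found) != ref:
            raise ValueError(f"unrecognised reference: {ref!r}")
        parts = {key: int(value) for key, value in found}
    resolved = {"S": subunit, "P": peptide}
    resolved.update(parts)
    return resolved


print(parse_comp("2xS[1]+2xS[2]"))
print(parse_ref("5", subunit=1, peptide=1) == parse_ref("S[1]P[1]A[5]"))
```

Under these assumptions the short form at="5", read inside subunit 1 and peptide 1, resolves to the same location as the fully qualified at="S[1]P[1]A[5]".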
|
The best way to see how to apply these tags is through real examples. The
following example demonstrates a simple BIOML file for human insulin. The file
uses a few tags from the next section, but they should be self-explanatory. Try
to work through the logic of the example, remembering that everything between
the start and end tags for a particular element "belongs to" that element. Also
remember that if a set of element tags doesn't enclose anything, then it is
acceptable to just write the opening tag with a "/>" at the end of the tag
description.
|
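The two rules in the paragraph above, ownership by nesting and the short "/>" form for empty elements, can be checked with a small Python sketch. The fragment below is hypothetical: it uses only tags and attributes described in this section, its contents are invented, and it assumes the file is well-formed XML. Whether a cross-link should be written on both partner residues is also an assumption made here for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the first aa uses the short "/>" form,
# the second uses an explicit start and end tag with nothing between them.
FRAGMENT = """
<protein comp="1xS[1]">
  <subunit id="1">
    <peptide id="1">
      <aa type="C" at="5" to="S[1]P[2]A[3]"/>
    </peptide>
    <peptide id="2">
      <aa type="C" at="3" to="S[1]P[1]A[5]"></aa>
    </peptide>
  </subunit>
</protein>
"""


def owners(root):
    """Yield (subunit id, peptide id, aa element) for every aa element.

    The owning subunit and peptide are found purely from nesting.
    """
    for subunit in root.iter("subunit"):
        for peptide in subunit.iter("peptide"):
            for aa in peptide.iter("aa"):
                yield subunit.get("id"), peptide.get("id"), aa


root = ET.fromstring(FRAGMENT)
for s_id, p_id, aa in owners(root):
    # Both tag forms parse to an element with no children and no text.
    is_empty = len(aa) == 0 and not (aa.text or "").strip()
    print(f"S[{s_id}]P[{p_id}]A[{aa.get('at')}] -> {aa.get('to')} (empty: {is_empty})")
```

Both aa elements are reported as empty, and each is attributed to the peptide and subunit whose tags enclose it.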
Example A. | Insulin example. |
Example B. | Insulin single gene product. This example is considerably more complicated, showing the use of multiple, overlapping domains and variant amino acids. |
Example C. | Insulin gene and gene product. This example includes the information contained in the previous two examples, but it also integrates the gene for insulin into the document. |
|
|