2.1 Introduction
The BIOpolymer Markup Language is being designed to meet
or exceed a number of goals that are critical for the development and
acceptance of the language. BIOML must:
- be extensible, i.e., it should conform to the XML format;
- be a faithful representation of the concept being described (protein or gene);
- have the potential to be easily read by humans;
- logically connect every element in a clearly expressed structure of nested statements;
- include data that is not ASCII and support compression as a basic data type; and
- support the conversion of other data files to and from BIOML.
These goals are listed in order of importance. If a design consideration that serves one goal conflicts with a goal higher on the list, the higher-priority goal prevails. The ability to logically connect data to the individual parts of a physical object is the main driving force behind the development of BIOML.
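One way to reconcile the goal of including non-ASCII data and supporting compression with a text-based XML format is to compress the raw bytes and store them as base64 text inside an element. The sketch below assumes that approach and is not taken from any BIOML definition: it uses only the Python standard library (`zlib`, `base64`, `xml.etree.ElementTree`), and the helper names `pack_binary`/`unpack_binary`, the element name `data` and the `encoding` attribute are hypothetical.

```python
import base64
import zlib
import xml.etree.ElementTree as ET


def pack_binary(tag: str, payload: bytes) -> ET.Element:
    # Compress the raw bytes, then base64-encode them so the element's text
    # contains only ASCII characters. The "encoding" attribute is a
    # hypothetical label recording how the text was produced.
    elem = ET.Element(tag, encoding="zlib+base64")
    elem.text = base64.b64encode(zlib.compress(payload)).decode("ascii")
    return elem


def unpack_binary(elem: ET.Element) -> bytes:
    # Reverse the steps: base64-decode the text, then decompress it.
    return zlib.decompress(base64.b64decode(elem.text))


raw = bytes(range(256)) * 4  # arbitrary non-ASCII bytes
xml_text = ET.tostring(pack_binary("data", raw), encoding="unicode")
restored = unpack_binary(ET.fromstring(xml_text))

print(xml_text.isascii(), restored == raw)  # True True
```

The serialized document stays plain ASCII, so it remains valid XML text, while the original bytes are recovered exactly.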
2.2 Logical layout: trees, branches and leaves
The diagram below shows the relationships among a simple set of objects that
can be associated with a "protein" object.
The fundamental object (a protein) is connected to two branch objects
(its component pieces, subunit 1 and subunit 2) and one leaf
object (its name). The first of the branches (subunit 1) is
connected to another branch object (a peptide), which has a number of
leaf objects associated with it. The linear nature of peptide and
oligonucleotide biopolymers and the way that information about them has been
gathered and organized makes it possible to draw such a graph for almost every
conceivable attribute and annotation of the biopolymer. BIOML is being designed
to take advantage of this fact.
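The tree of branches and leaves can be modeled directly as a data structure. In this minimal Python sketch, `Branch`, `Leaf` and `leaf_paths` are hypothetical names chosen for illustration, the leaf values are "..." placeholders, and the leaf kinds `signal` and `propeptide` are borrowed from the nested example in the next section (the diagram's description does not name the peptide's leaves). Each leaf is reached by exactly one path of branches from the protein, which is what lets every annotation be tied to a specific part.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass
class Leaf:
    kind: str            # what the annotation describes, e.g. "name"
    value: str = "..."   # the annotation text (placeholder here)


@dataclass
class Branch:
    kind: str            # a component, e.g. "protein", "subunit", "peptide"
    children: list[Union[Branch, Leaf]] = field(default_factory=list)


def leaf_paths(node: Union[Branch, Leaf],
               prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    # Depth-first walk: yield the chain of kinds that leads to each leaf.
    if isinstance(node, Leaf):
        yield prefix + (node.kind,)
        return
    for child in node.children:
        yield from leaf_paths(child, prefix + (node.kind,))


protein = Branch("protein", [
    Leaf("name"),
    Branch("subunit", [                      # subunit 1
        Branch("peptide", [Leaf("signal"), Leaf("propeptide")]),
    ]),
    Branch("subunit"),                       # subunit 2 (no detail in the diagram)
])

for path in leaf_paths(protein):
    print(" -> ".join(path))
```

The second subunit remains a branch even though nothing is attached to it yet; it simply contributes no leaf paths.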
2.3 Logical layout using nested statements
Computer scientists have devised a number of ways to write down branched
structures. The method used in XML is a very straightforward one. In the example
above, the protein is represented by an opening tag, "<protein>", and a closing
tag, "</protein>". Everything between those two tags is part of the tree
illustrated above. Using this notation, the tree can be rewritten as follows:
<protein>
  <name> ... </name>
  <subunit id="1">
    <name> ... </name>
    <peptide>
      <signal> ... </signal>
      <propeptide> ... </propeptide>
      ...
    </peptide>
  </subunit>
  <subunit id="2">
    <name> ... </name>
    <peptide id="1">
      ...
    </peptide>
    <peptide id="2">
      ...
    </peptide>
  </subunit>
</protein>
All of the relationships between items are the same as in the tree, but this
format is very easy to write out using ASCII characters. Each ellipsis ("...")
represents any text that might be enclosed by the start and end tags. In XML
terminology, the concepts represented by "protein" or "name" are called
"elements", while the markers that indicate the start and end of the
information making up an element are called "tags".