BIOML Proposal, 19990220
The Biopolymer Markup Language - BIOML
Working Draft Proposal
3. Elements and tags
3.1 Introduction
3.2 Gene-specific elements
3.2.1 Summary
3.2.2 A simple <gene> example
3.3 Protein-specific elements
3.3.1 Summary
3.3.2 Simple <protein> examples
3.4 General elements and tags
3.4.1 General-purpose elements
3.4.2 Organism-identifying elements
3.4.3 Location elements
3.4.4 Literature references
3.4.5 Database reference
3.4.6 URL-based resources
3.4.7 Binary data
3.4.8 Forms
3.4.9 Global attributes and entities

3.4 General elements and tags
The elements described in this section are the additional pieces needed to describe a biopolymer fully. For example, a particular gene may have been first described in a journal article, so there should be a <reference> element that can be associated with that gene to point to that description. Similarly, a protein may be part of a specific <organism>, which should be described using standard taxonomy.
3.4.1 General-purpose elements
These elements are generally leaves rather than branches: they can be associated with almost any other element, and they attach descriptive names and information to that element.

Element          Attributes  Functions
name                         A text name for an enclosing element
alt_name         order       A text alternate name for an enclosing element
note             id          Text describing an enclosing element
comment                      Text describing the BIOML code. This element should be ignored by any browser.
copyright                    The copyright declaration for a BIOML file.
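As an illustration, the general-purpose elements might be attached to a <protein> element as in the sketch below. All text values are placeholders, and the numeric form of the order and id values is an assumption rather than something this section specifies.

```xml
<!-- Illustrative fragment only: every text value is a placeholder. -->
<protein>
  <name>Example protein</name>
  <!-- The integer value of "order" is an assumption. -->
  <alt_name order="1">Example protein, alternate name</alt_name>
  <note id="1">Free text describing the enclosing protein element.</note>
  <comment>Text about the BIOML code itself; a browser should ignore it.</comment>
</protein>
```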
3.4.2 Organism-identifying elements
These elements describe the organism that a biopolymer was originally part of. The organism tags surround the biopolymer tags, because the biopolymer is part of the organism, not vice versa.

Element          Attributes  Functions
organism         id          Encloses the specification of an organism
species          id          Encloses the genus and species for an organism
common_name                  The common name for the organism
alt_common_name              An alternative common name for the organism
taxon            id          A specification of any additional, relevant taxonomy
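A minimal sketch of how the organism tags might surround a biopolymer follows. Writing the genus and species, or a taxon name, as plain text inside <species> and <taxon> is an assumption made for this example, and the id values are arbitrary.

```xml
<!-- Illustrative fragment: the organism tags surround the biopolymer tags. -->
<organism id="1">
  <!-- Plain-text content for species and taxon is an assumption. -->
  <species id="2">Homo sapiens</species>
  <common_name>human</common_name>
  <alt_common_name>man</alt_common_name>
  <taxon id="3">Mammalia</taxon>
  <protein>
    <name>Example protein</name>
  </protein>
</organism>
```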
3.4.3 Location elements
These elements describe where a protein is located within an organism: they allow the specification of the tissue, cell type, and subcellular organelle in which a particular protein is found. These tags also logically enclose the protein. Nucleic acids do not require this type of specification, because their location is fixed in the nucleus for DNA and is biologically irrelevant for mRNA.

Element          Attributes  Functions
tissue           id          Encloses the specification of a tissue type
cell             id          Encloses the specification of a cell type
organelle        id          Encloses the specification of an organelle type
particle         id          Encloses the specification of a particle type
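The enclosing relationship described above could be sketched as follows. The nesting order (tissue, then cell, then organelle) and the use of <name> to label each location are assumptions made for this example. The section states only that the location tags enclose the protein.

```xml
<!-- Illustrative fragment: location tags logically enclose the protein. -->
<organism id="1">
  <species id="2">Homo sapiens</species>
  <!-- The tissue > cell > organelle nesting order is an assumption. -->
  <tissue id="3">
    <name>liver</name>
    <cell id="4">
      <name>hepatocyte</name>
      <organelle id="5">
        <name>mitochondrion</name>
        <protein>
          <name>Example protein</name>
        </protein>
      </organelle>
    </cell>
  </tissue>
</organism>
```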
3.4.4 Literature references
The purpose and use of these tags is self-evident. All of the elements other than <reference> are leaves of <reference>, and they only have meaning when they are enclosed by <reference> ... </reference> tags.

Element          Attributes  Functions
reference        id          Encloses a literature reference
author                       One of the authors, e.g., "Beavis RC"
title                        Title of the reference
journal                      Name of the journal that published the article
book_title                   Title of the book containing the article
editor                       An editor of the appropriate book
volume                       Number of the journal volume or book in series
pages                        The article page numbers
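For illustration, a journal article attached to a gene might be written as in the sketch below. The authors, title, journal, volume, and pages are placeholders and do not describe a real publication. Repeating <author> once per author is an assumption based on the table's description of the element as "one of the authors".

```xml
<!-- Illustrative fragment: all bibliographic values are placeholders, not a real citation. -->
<gene>
  <name>Example gene</name>
  <reference id="1">
    <!-- One author element per author is an assumption. -->
    <author>Author AB</author>
    <author>Author CD</author>
    <title>Placeholder article title</title>
    <journal>Placeholder Journal</journal>
    <volume>12</volume>
    <pages>100-110</pages>
  </reference>
</gene>
```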

Writing out the details of a reference using these elements is one way to add literature attributions to a file. An alternate method, which is very effective and compact, is to use the Medline reference number for a particular reference and leave the details out of the BIOML file altogether:
  <db_entry format="MEDLINE" entry="80120725"/>
3.4.5 Database reference
Most of the information about biopolymers is currently held in large computerized databases, which is the driving force behind the development of BIOML. These elements allow BIOML to capture information about databases and make reference to them. The <db_entry/> element and its attributes id, ac, format, and query are used to refer to a particular database entry. The query attribute is a combination of the Uniform Resource Locator (URL) for a copy of a database and a properly formatted query string that would allow a database entry to be retrieved by an HTTP or FTP network request. This element is a special case of the general <query/> element described in the next section. <db_entry/> elements are always empty.

Element          Attributes  Functions
db_entry         id          A reference to a database entry
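A sketch of a <db_entry/> element that carries a query might look like the following. The server name, path, accession, and format value are hypothetical, and treating ac as an accession identifier is an assumption. Only the attribute names come from the description above.

```xml
<!-- Illustrative only: the server, path, accession and format value are hypothetical. -->
<db_entry id="1" ac="X00000" format="EXAMPLE"
          query="http://db.example.org/cgi-bin/get?ac=X00000"/>
```

The element is written in empty-element form, consistent with the statement that <db_entry/> elements are always empty.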
3.4.6 URL-based resources
These elements point to specific resources available through a network protocol such as FTP or HTTP. A resource can be either a particular <file/> held on a remote server or a request for a particular piece of information using a Common Gateway Interface (CGI) style <query/>. Both of these elements are always empty.

Element          Attributes  Functions
file             format      An element pointing to a file
query            format      An element sending a query to a server program
3.4.7 Binary data
These element types allow the inclusion of blocks of formatted data. The data can be in any one of a number of formats, such as GZIP or ZIP compressed data, JPEG or GIF formatted graphical information, or any other type of binary data. The content of <binary> elements will necessarily be understandable only by a computer. <data> tags enclose formatted text data that has a strict, agreed-upon format, for example PDB format atomic coordinates.

Element          Attributes  Functions
binary           format      An element enclosing formatted binary information
data             format      An element enclosing formatted text information
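As a rough sketch, formatted text such as atomic coordinates might be enclosed as shown below. The single ATOM-style record uses placeholder coordinates and is not real structural data, and the format value "PDB" is assumed from the example named in the text. How whitespace inside <data> is treated is not specified here.

```xml
<!-- Illustrative fragment: the coordinate record is a placeholder, not real structural data. -->
<protein>
  <name>Example protein</name>
  <!-- The format value "PDB" is an assumption based on the text above. -->
  <data format="PDB">
ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N
  </data>
</protein>
```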
3.4.8 Forms
One of the strengths of HTML in its current incarnation is the inclusion of elements that allow a programmer to create "fill-in-the-blanks" forms that are used to communicate with HTTP server-side programs. BIOML also includes a set of elements for laying out simple forms that can be used to send information to server programs.

Element          Attributes  Functions
form             type        An element enclosing a form
input            type        An element allowing the input of data
text                         An element enclosing text to be inserted into the form
3.4.9 Global attributes and entities
Some attributes may be included with any element but have not been listed individually above, to avoid repetition. These attributes are as follows:

Element          Attributes  Functions
All              label       This attribute provides a text identifier for a particular element that can be used by a browser as a place-holder for that element
All              state       This attribute provides a browser with a hint as to how to display a particular element
All              id          This attribute provides a browser with a number to use for cataloging an element
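For example, the label and id attributes might be added to a <protein> element as in this sketch. The values are placeholders. The state attribute is left out because its permitted values are not listed in this section.

```xml
<!-- Illustrative fragment: attribute values are placeholders. -->
<protein id="1" label="Example protein">
  <name>Example protein</name>
</protein>
```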
Entities are globally available mnemonics for characters and combinations of characters that either are not available on standard keyboards or would confuse a browser. For example, quotation marks, ampersands, and left and right angle brackets can confuse browsers. These symbols should be replaced by the entities listed below when they occur in text, as opposed to when they occur in tag specifications. All entities follow the general format &xxx;, where xxx is the replacement mnemonic.

Symbol           Entity        Symbol              Entity
&                &amp;         <                   &lt;
"                &quote;       >                   &gt;
©                &copyright;   ®                   &registered;
em dash          &emdash;      en dash             &endash;
±                &plusminus;   °                   &degree;
Å                &angstrom;    ASCII character     &#int; (int = decimal equivalent)
new line         &newline;     new paragraph       &paragraph;
•                &bullet;      new bulleted point  &point;
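A short sketch of entity use in element text follows. The sentence is placeholder text. Only &amp;, &lt;, and &gt; are used, because these three are also predefined in XML and so the fragment can be read by a generic XML parser.

```xml
<!-- Illustrative fragment: placeholder text showing entities in element content. -->
<note>Text with A &amp; B, where x &lt; y and y &gt; z.</note>
```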
