BIOML Proposal, 19990220
The Biopolymer Markup Language—BIOML
Working Draft Proposal
4. Extending BIOML
TOC
4.1 Extension mechanisms
4.2 BIOML extension entities

4.1 Extension mechanisms
The "X" in XML stands for "eXtensible". XML languages can be changed by anyone who writes a file in that language using a number of simple techniques:
  1. change the DTD;
  2. modify the content model in the file; or
  3. use the modification mechanisms provided by the designer.
Technique #1 should only be considered in extreme cases: if you need to modify the DTD, you probably should write your own specific XML language. Modified DTDs have to be made publicly available, and they can lead to confusion in documents between the original and modified DTDs.

Technique #2 is the simplest method; however, it is still risky. It requires that the person performing the extension understand the DTD in detail and be able to predict the behavior of their extension. For example, if a user really required an element called <duck> in their BIOML file, they could create a valid BIOML file that used that element by modifying the first few lines of their file:
<?xml version="1.0"?>
<!DOCTYPE bioml SYSTEM "bioml.dtd" [
  <!ELEMENT duck (#PCDATA)>
  <!ATTLIST duck position (flying|swimming) "flying">
]>
<bioml>
  ...
  <duck position="swimming">
    ...
  </duck>
  ...
</bioml>
The DTD-style definition of <duck> is included at the beginning of the file, within the internal subset of the DOCTYPE declaration (between "[" and "]"). After its definition, the new tag can be used in that BIOML document as though it belonged to the BIOML DTD.
NOTE: you cannot change an existing element definition using this mechanism. Elements can only be defined once in a DTD + document combination.
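To illustrate the note above, here is a minimal sketch of a document that a validating parser would be expected to reject. It assumes that bioml.dtd already declares <protein>; the content model shown is made up for the example, and "..." stands for omitted content.

```xml
<?xml version="1.0"?>
<!DOCTYPE bioml SYSTEM "bioml.dtd" [
  <!-- Not valid: <protein> is assumed to be declared in bioml.dtd already,
       and an element type may be declared only once. -->
  <!ELEMENT protein (#PCDATA)>
]>
<bioml>
  ...
</bioml>
```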

Technique #3 is the best method, although it is limited to changes that the designer of the XML language thought advanced users would need. This technique uses the same format as #2, but it takes advantage of the fact that, while elements cannot be redefined in a document, entities can be. In XML, the first definition of an entity is the one that is used, and all subsequent redefinitions are discarded. The definitions in the internal subset of the DOCTYPE declaration are parsed before the external DTD, so they take precedence over the entity definitions in the DTD. Therefore, by the cunning inclusion of entities in the DTD, it is possible to conveniently override many of the default definitions of elements in the DTD without rewriting it. The next section describes the entities in the current version of BIOML that have been provided specifically to be redefined.
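As a sketch of how a DTD designer might provide such a hook, the fragment below shows a hypothetical external DTD. The file name palette.dtd, the elements <palette> and <swatch>, the attribute "colour" and the entity local.colour.value are all invented for illustration and are not part of BIOML. The entity is written as a parameter entity (declared with "%") because it is referenced inside a DTD declaration.

```xml
<!-- palette.dtd: a hypothetical external DTD, not part of BIOML -->

<!-- Extension hook: empty by default, meant to be redefined by a document -->
<!ENTITY % local.colour.value "">

<!ELEMENT palette (swatch*)>
<!ELEMENT swatch (#PCDATA)>

<!-- The hook is referenced at the end of the list of allowed values -->
<!ATTLIST swatch colour (red|green %local.colour.value;) "red">

<!-- A document could then add a value in its internal subset, which is
     read before this file, so its definition of the entity is the one used:

     <!DOCTYPE palette SYSTEM "palette.dtd" [
       <!ENTITY % local.colour.value "|blue">
     ]>
-->
```

With the default (empty) definition, only "red" and "green" are allowed; with the override, "blue" is allowed as well, and the DTD file itself is untouched.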
4.2 BIOML extension entities
The following table describes each of the entities that have been added to BIOML specifically to aid in extending the language. These entities are named using the convention "local.parent.type", where "local" identifies the entity as one written for redefinition, "parent" is the name of the entity that will be affected by the redefinition, and "type" indicates whether the redefinition affects an element's attributes (".attrib"), content definition (".content") or attribute values (".value").

Entity Description/use
local.aa_type.value For the addition of new symbols for amino acid residues.
local.biopolymer.content Allows the addition of new types of biopolymers to every element that can contain proteins, DNA or RNA.
local.db_attributes.attrib The user can add new attribute types to <db_entry> elements to use more familiar attribute names.
local.dna_type.value For the addition of new symbols for DNA nucleotide residues.
local.dom_type.value The basic set of domain types supplied with BIOML cannot hope to keep up with the growing list of domains identified in proteins. This entity allows the "type" attribute's permitted values to be extended with any name required.
local.format.value For the addition of unanticipated binary file types.
local.global.attrib Most BIOML elements use the %global; entity as part of their ATTLIST definition. This entity allows the user to add any new attribute to all of these elements.
local.nr.content The %nr; entity is used in nearly every BIOML element to add common content. This local entity allows the user to add their own defined element to all of these elements' content models.
local.rna_type.value For the addition of new symbols for RNA nucleotide residues.
local.text_format.value For the addition of unanticipated text file types.

A few things are worth noting:
  1. all of the local entities are defined in the DTD as being blank ("");
  2. .attrib entities only require the definition of the desired attribute and its #IMPLIED or #REQUIRED descriptor (or a default value); and
  3. .content and .value entities require a "|" at the beginning of their redefinition string. Failure to add this character will result in a parsing error.
An example of using this mechanism is as follows. If the user wishes to add the protein domain type "curly" to the values that are allowed by BIOML, the following code could be used (note the use of the "|" character in the entity definition):
<?xml version="1.0"?>
<!DOCTYPE bioml SYSTEM "bioml.dtd" [
  <!ENTITY % local.dom_type.value "|curly">
]>
<bioml>
  ...
  <domain dom_type="curly">
    ...
  </domain>
  ...
</bioml>
Another example would be the addition of the attribute "fluffy" to all of the elements that use the %global; entity as part of their ATTLIST definition:
<?xml version="1.0"?>
<!DOCTYPE bioml SYSTEM "bioml.dtd" [
  <!ENTITY % local.global.attrib "fluffy (yes|no|maybe) 'no'">
]>
<bioml>
  ...
  <note fluffy="yes">
    ...
  </note>
  ...
  <protein fluffy="maybe">
    ...
  </protein>
  ...
</bioml>
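A .content entity can be used in the same way. The sketch below combines techniques #2 and #3: the new element is declared in the internal subset, and local.nr.content is redefined so that the element is accepted wherever %nr; is used. The <duck> element and its content are made up for illustration, and the example assumes that <protein> is one of the elements whose content model includes %nr;. As in the examples above, "..." stands for omitted content.

```xml
<?xml version="1.0"?>
<!DOCTYPE bioml SYSTEM "bioml.dtd" [
  <!-- Technique #2: declare the new element (a made-up example) -->
  <!ELEMENT duck (#PCDATA)>
  <!-- Technique #3: the leading "|" appends duck to the %nr; alternatives -->
  <!ENTITY % local.nr.content "|duck">
]>
<bioml>
  ...
  <protein>
    ...
    <duck>mallard</duck>
    ...
  </protein>
  ...
</bioml>
```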
