A World-Wide Web Server for ACEDB based on Tace

John Barnett and Doug Bigwood
Genome Informatics Group, National Agricultural Library,
USDA, Beltsville, MD 20705 

Sam Cartinhour
Crop Biotechnology Center, Texas A&M University,
College Station, Texas 77843

I. Introduction

The delivery of information from ACEDB via the Internet has become increasingly important as more and more groups seek to offer their data from centralized, public sites. Of the various possible methods, probably those exploiting the World-Wide Web (WWW) and Hypertext Markup Language (HTML) are the most important. A very successful ACEDB-WWW interface has been developed by Guy Decoux at INRA (Moulon, France) and is now used to deliver information from a wide variety of ACEDB databases. The ready-to-install 'Moulon' Server (MS) can be thought of as one interpretation of how a WWW interface to ACEDB should behave. However, it is easy to imagine alternatives for markup, layout, and functionality. To make it possible to explore these, we have created an ACEDB server and client on which new ACEDB-WWW interfaces can be built.

II. The Moulon Server

MS is based on xace with two significant changes. First, display routines have been rerouted so that their output is delivered via PostScript rather than X-windows; in the most recent version of MS, the PostScript is then passed to the gd package to create clickable GIF images. Second, tace functionality has been added to permit communication with the web server daemon.

Built into MS are the rules for constructing HTML versions of objects, lists, and query forms. MS outputs objects in ordinary tace style with one important difference: it uses the HTML construction rules to overlay "clickable" xace items (links between objects) with "Uniform Resource Locators" (URLs). These become clickable in the WWW environment. The links are simply ACEDB queries expressed in the required URL syntax. For example, the link from a Paper object to a Person "Tom" is essentially "find person tom". The URL also contains the name of the database and explicitly names MS as an executable in cgi-bin:

  ...cgi-bin/MS/sorghumdb?find+Person+%22tom%22
For the user, the transition between X-windows and a web browser (such as Mosaic) is straightforward, although links are navigated with one mouse click instead of two.
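As an illustration of the URL form shown above, the following is a minimal sketch of how such a query URL could be assembled. The helper names (encode_query, ms_url) are ours and are not part of MS; only the resulting URL shape is taken from the example.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except letters, digits,
# a few safe punctuation marks and spaces, then turn spaces into '+'.
sub encode_query {
    my ($query) = @_;
    $query =~ s/([^A-Za-z0-9_.~ -])/sprintf("%%%02X", ord($1))/ge;
    $query =~ tr/ /+/;
    return $query;
}

# Hypothetical helper: build the path-and-query part of an MS-style link.
sub ms_url {
    my ($database, $class, $name) = @_;
    my $query = qq{find $class "$name"};
    return "cgi-bin/MS/$database?" . encode_query($query);
}

print ms_url('sorghumdb', 'Person', 'tom'), "\n";
```

The double quotes around the object name become %22 and the spaces become "+", which gives the query part seen in the example.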

MS is positioned between the web server (e.g., httpd) and one or more databases. Typically it resides in the cgi-bin directory in the web server hierarchy. When a URL is activated via mouseclick on a web browser, the information is received by httpd and passed to MS for action. MS interprets the query and returns an HTML-formatted reply, which is sent back to the web browser by httpd.

One advantage of the WWW is that links to external data sources are possible. These are an important 'value-added' feature since genome databases typically contain a variety of potential links for strains, sequences, publications, and other data. MS does not provide for external links but they can be implemented by filtering server output through a perl script. The filter, essentially a series of "if-then" clauses, examines MS output line-by-line, 'grabs' the appropriate text and adds a hypertext link. The precise markup depends upon the database, the class, and the text (tags and data) on a particular line. For example, the line:

  Library		GenBank		ATTS1273	Z25996
in an AAtDB Sequence object would be modified to include a link to GenBank using the GenBank accession number as the visible part of the anchor. The result is that "Z25996" is wrapped in an anchor whose URL is a query directed towards the GenBank server:

  <a href="http://ncbi.nlm.nih.gov:2555/htbin/birx_by_acc?genbank+Z25996">Z25996</a>
The MS filter provides "on the fly" generation of external links and also gives the option of modifying MS output in other ways. A simple example is the removal of the "ISINDEX" HTML markup used by MS that invites users to "enter search keywords" into a text box. The caption, which is automatically presented when "ISINDEX" occurs, is misleading since correct ACEDB query syntax is required rather than a keyword. This feature can be deleted and replaced with a different interface for formal queries.
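A filter of this kind might look roughly like the sketch below. It is not the filter in use at our site: the subroutine name and the exact patterns are assumptions made for illustration, and only the GenBank URL prefix and the sample line come from the example above.

```perl
use strict;
use warnings;

# URL prefix taken from the GenBank example; the rule itself is illustrative.
my $GENBANK = 'http://ncbi.nlm.nih.gov:2555/htbin/birx_by_acc?genbank+';

# Hypothetical per-line filter with two "if-then" style clauses.
sub filter_line {
    my ($line) = @_;

    # Clause 1: a Library/GenBank line whose last field is an accession number.
    if ($line =~ /^\s*Library\s+GenBank\s+\S+\s+(\S+)\s*$/) {
        my $acc = $1;
        $line =~ s/\Q$acc\E(\s*)$/<a href="$GENBANK$acc">$acc<\/a>$1/;
    }

    # Clause 2: drop the ISINDEX element wherever it occurs.
    $line =~ s/<ISINDEX[^>]*>//gi;

    return $line;
}

print filter_line("  Library\t\tGenBank\t\tATTS1273\tZ25996\n");
```

Each additional database, class, or field needs a further clause of this sort, which is where the maintenance burden described in the next paragraph comes from.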

Unfortunately the limitations of the filter are apparent once output from more than a few ACEDB databases with different models is involved. The filter is difficult to maintain and is not flexible enough as a foundation for more sophisticated tasks, such as connecting our collection of plant databases into a single "integrated" environment. These problems led us to consider a different approach to ACEDB data retrieval and presentation via the web.

III. The Perl Tace Server (PTS)

Although it would be possible to tailor a new specialized ACEDB server with the National Agricultural Library in mind, a more generally useful design would allow others to develop their own web-based views of ACEDB. For this reason our approach has been to remove the hard-coded rules for HTML markup and URL construction from the ACEDB server and locate them in a family of cgi-bin perl scripts (discussed in a later section). The ACEDB server together with the cgi-bin scripts act as the intermediary between the databases and the web server.

We are reluctant to create another major version of the ACEDB software. The server is based on stock tace, modified to output objects in a special way. The modifications have been integrated into the standard release (one immediate benefit is that the cgi-bin scripts are robust with respect to new versions of tace). To associate tace with a port, we use a perl wrapper. Tace is started from within a perl process (hence "Perl Tace Server" or "PTS") and remains running while the wrapper monitors the port for incoming data. Data (at this point in proper tace syntax) is passed to and from tace without modification (although processing could occur here). Each database is assigned a distinct port. Note that a database and its tace server need not be located on the same machine as the web server.

One major barrier in MS to alternative object representation is that certain information about the object has been lost. For example, tags and data cannot be distinguished reliably and thus it would be impossible to cast them in different fonts. This limitation exists in tace as well. The modified tace conserves information by delivering objects in a special annotated form (see Appendix 1). The output is not intended to be human-readable but rather is parsable by perl's "eval" statement to create a pointer. The annotation accomplishes two purposes. First, it preserves the tree structure of the object, which makes it feasible to perform a variety of "treeish" operations, including column alignment, pruning, grafting, and so forth if we should choose. Second, it distinguishes each object element (tag, link to another object, integer, floating-point number, datetype) and additionally identifies links to empty objects or to "Titled" objects in classes registered in options.wrm as "-T MyTag". No assumptions about markup (HTML or otherwise) are made.

This single modification makes it possible to design a variety of text-based WWW interfaces to ACEDB databases. An obvious incentive exists to extend the tace command set in different ways. For example, the "attach" command is potentially very useful but is not yet part of the tace repertoire. Maps and other specialized graphical displays cannot currently be accessed with direct tace commands. Eventually we would like to extend the tace command set to include these functions and perhaps ACEDB data analysis utilities as well.

IV. Tace Client and Division of Labor

The remaining functions are compartmentalized into a family of perl scripts and libraries (see Appendix 2 for implementation details). We have chosen perl because it is simple to learn and ideal for processing complex text. In contrast to the dominant role played by MS, no one script has a preeminent status vis a vis the web server. This is true even of the script that establishes contact with the tace servers because it does not need to be used each and every time a database-related request is made. The specialized scripts constitute a sort of "Grand Central Station" within cgi-bin, where employees with different job descriptions act when required and remain uninvolved otherwise. By segregating tasks we simplify maintenance, make it easier for multiple developers to add new code, and execute code only when it is actually needed.

The modularization is reflected in the form taken by the URLs:

  .../cgi-bin/command/[database][/arg1][/arg2]...
For example, the cgi-bin script "find" is used to request a single object:

  find/database/class/object
  find/ricegenes/locus/wx
The "find" command is used extensively to represent links between objects. Other fundamental commands include "classes" (to list available classes), "list" (to list all objects in a class), and "model" (to display the model for a particular class). These basic commands are named after their tace equivalents, but more complex and specialized commands are also possible. For example, "range" is used to create collapsed lists of objects. An "imap" command could present a genetic map as a table of intervals; this would require constructing a table definition, running the tablemaker, and further processing to calculate the intervals. Other possibilities include generating queries which depend on results from previous queries, retrieving genetic maps from more than one database and comparing them to generate a syntenic map, and combining objects from different databases to create customized virtual objects.

In all of the cases above several cgi-bin scripts are involved. The "find" command uses the script collection to interpret the URL, converse with the tace server, and produce a formatted object. As much as possible we have isolated common procedures to make it straightforward to build new commands.
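The first step, interpreting the URL, can be sketched as follows. This is a simplified illustration under the assumption that the path arrives as a single string; the subroutine name parse_request is hypothetical and does not correspond to a script in our collection.

```perl
use strict;
use warnings;

# Split a path such as "find/ricegenes/locus/wx" into command, database,
# and remaining arguments.
sub parse_request {
    my ($path) = @_;
    $path =~ s{^.*cgi-bin/}{};    # drop anything up to and including cgi-bin/
    my ($command, $database, @args) = split m{/}, $path;
    return { command => $command, database => $database, args => \@args };
}

my $req = parse_request('find/ricegenes/locus/wx');
print $req->{command}, ' on ', $req->{database}, ': ', join(' ', @{ $req->{args} }), "\n";
```

Because the command is the first path element, each command can live in its own script and only that script needs to run for a given request.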

V. Markup

One advantage to the separation of markup from the tace server is that it is easier to gain control of markup. This is important for several reasons, not the least of which is that one ACEDB-WWW site may want to give its output a distinct "look" for aesthetic or functional reasons. An example of this might be to place floating point numbers into decimal-tabbed columns, or to put "empty" objects into a different font. As noted earlier, stock tace does not provide enough information to make this possible but the modified tace does, and associates this information with each node in the tree (Appendix 1).

The server-markup separation also makes it more convenient to add markup to objects--for example, standard headers and footers, or URLs which point elements in an object to external databases (Appendix 3). As noted earlier, both are possible with MS by intercepting the server's output and modifying it before httpd sends it back to the web browser. However, since MS output is already marked up in HTML and formatted by indentation this is an awkward affair, made even more complex if multiple databases are involved (as is the case at the National Agricultural Library). When different kinds of markup are combined it is definitely more convenient to start with the unadorned object. The object can be examined and processed in various ways, with markup "directives" being stored as necessary, before actual markup is applied in a single pass.

Note that markup for external databases is simplified by the representation of ACEDB objects as trees. It is relatively easy to write rules which can recurse through the nodes of a tree, testing for certain patterns, then trigger an action if the test is satisfied (see Examples in Appendix 3). The action can involve attaching additional information to a node, for example the URL for the external data source. The same testing procedures could be used to accomplish other tasks, such as handling objects from a particular database or integer values in a special way.
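The test-then-act pattern can be sketched in a few lines of perl. The node keys (ty, va, Pn) follow Appendix 1; the subroutine name walk, the sample object, and the 'markup' key attached to the node are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Visit every node of the tree; when $test->($node) is true,
# run $action->($node).
sub walk {
    my ($node, $test, $action) = @_;
    $action->($node) if $test->($node);
    walk($_, $test, $action) for @{ $node->{Pn} || [] };
}

# A small hand-built object for demonstration purposes only.
my $obj = { ty => 'ob', cl => 'Sequence', va => 'ATTS1273',
            Pn => [ { ty => 'tg', va => 'Length',
                      Pn => [ { ty => 'in', va => 1273 } ] } ] };

# Illustrative rule: flag integer nodes with a hypothetical 'markup' directive.
walk($obj, sub { $_[0]{ty} eq 'in' }, sub { $_[0]{markup} = 'right-align' });
print $obj->{Pn}[0]{Pn}[0]{markup}, "\n";
```

The directive is only stored on the node here; producing HTML from it is left to a later pass, as described in the preceding paragraph.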

VI. Conclusions

The Moulon server has made an extremely important contribution to the ACEDB community and biologists as a whole by making it easy and inexpensive to connect ACEDB to the WWW. As a result genome data for a wide variety of species is now available for web access. However, the separation of "web" functions from the ACEDB server makes it possible to develop novel web (and other) servers more easily. As HTML becomes richer and more complex, data providers will want the freedom to deliver data using whatever markup they choose. Providers will also want to extend ACEDB server functionality by creating new kinds of commands, for example, those which access multiple ACEDB databases and combine the results into "custom" objects.

Tace can be modified to deliver objects with their tree structure and other characteristics preserved. Web (or other) functionality can then be handled externally. We believe this approach shows promise and points out the need to create a text-based ACEDB server which is capable of representing ACEDB in toto, including access to its analysis tools and representation of its displays. Output should probably be in as general a form as possible; for example, a textual representation of a map display which could be used to build an image, rather than the image itself or PostScript.

The advantages conferred by the general approach make it possible to develop novel WWW and other interfaces to ACEDB. However, the potential is not limited to end-user data delivery. For example, another application would be a front end to tace which could be used to automate the update of remote databases. Two PTS instances could communicate on a master/slave basis with the master sending updates periodically to the remote server. Indeed, many such enhancements are possible because development can take place independently of ACEDB development.


Appendix 1: ACEDB Object Representation in Perl5

A. Data Structure

We represent an acedb object as a tree with nodes and branches. Each node in the object is represented as an associative array containing some or all of the following information:

# ty:    node type, one of:
#          tag (tg), text (tx), int (in), float (fl), datetype (dt), object (ob)
# va:    name of tag or value of int. type, or name of object
# cl:    name of class (only defined for type object)
# ti:    title (only defined for objects using -T option)
# mt:    empty (only defined for empty objects)
# Pn:    pointer to array of pointers to the nodes on the right
# Pm:    pointer to array of max field widths of fields in cols to right
#        (filled in after object is retrieved from ACEDB)
# db:    database of origin (occurs at root node)
#        (filled in after object is retrieved from ACEDB)
A typical node might contain:

  {ty=>'ob',
   cl=>'Paper',
   va=>'jones-1995-aabxc',
   ti=>'Sequence of ADH-1'}
Nodes are connected to each other via pointers. This is similar to what is done internally in ACEDB (using RIGHT and DOWN) but with a slightly different interpretation. In particular, the pointers are arranged so that nodes at the same branching "level" of indentation on the tree are not directly connected; instead, a node only points to nodes at the next level. For example, given an object like the one below, where tags are numbers and text fields are letters (with 0 as the root node):

  0 1 a
      b 
    2 c d
the tree can be drawn as:

  0---1-a
   \   \
    \   b
     \
      2-c-d

To connect perl objects, the branches from a node are represented as a pointer to a list of pointers to the adjacent nodes, e.g., in perl shorthand:

  {ty=>ob, va=>0, Pn=>[{node 1},{node 2}]}        0 points to 1 and 2

  {ty=>tg, va=>2, Pn=>[{node c}]}                 2 points to c
Here a pointer to an object/associative array is represented by a pair of braces {} while a pointer to a list is represented by a pair of brackets [].

Thus ignoring other information, the object above could be represented with the following expression, which can be directly evaluated to produce a pointer to the root node:

{ty=>ob, va=>0, Pn=>[{ty=>tg, va=>1, Pn=>[{ty=>tx, va=>a
                                          },
                                          {ty=>tx, va=>b
                                          }
                                         ]
                     },
                     {ty=>tg, va=>2, Pn=>[{ty=>tx, va=>c, Pn=>[{ty=>tx, va=>d
                                                               }
                                                              ]
                                          }
                                         ]
                     }
                    ]
}
Note that nodes 1 and 2 are connected via node 0 in this scheme; i.e., node 2 is not "DOWN" from node 1 in the sense that node 1 points to node 2.
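To make the pointer arrangement concrete, the sketch below evaluates a fully quoted form of the same small object and lists each root-to-leaf path. The quoting of the values, the leading "+" (which tells perl to read the braces as an anonymous hash rather than a block), and the subroutine name paths are choices made for this illustration.

```perl
use strict;
use warnings;

# The example object with all values quoted so it evaluates under "use strict".
my $literal = q({ty=>'ob', va=>'0',
    Pn=>[{ty=>'tg', va=>'1', Pn=>[{ty=>'tx', va=>'a'},
                                  {ty=>'tx', va=>'b'}]},
         {ty=>'tg', va=>'2', Pn=>[{ty=>'tx', va=>'c',
                                   Pn=>[{ty=>'tx', va=>'d'}]}]}]});

# Evaluate the literal to recover a pointer to the root node.
my $root = eval('+' . $literal) or die "could not evaluate literal: $@";

# Return one string per leaf, listing the values from the root to that leaf.
sub paths {
    my ($node, @prefix) = @_;
    my @path = (@prefix, $node->{va});
    my @kids = @{ $node->{Pn} || [] };
    return join(' ', @path) unless @kids;
    return map { paths($_, @path) } @kids;
}

print "$_\n" for paths($root);
```

The three paths produced ("0 1 a", "0 1 b", "0 2 c d") correspond to the three rows of the original object once repeated leading values are blanked out.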

B. Evaluation

The perl object is delivered as a literal from tace. Data is delimited with an arbitrary string (set by environment variable ACEQM) which can include "unprintable" characters. Before the literal is evaluated, single quote marks in the data (e.g., as in the word "can't") are protected:

  [DELIMITER]....cats can't swim....[DELIMITER]

  [DELIMITER]....cats can' . "'" . 't swim....[DELIMITER]
The data delimiter is then replaced with single quote marks:

  '....cats can' . "'" . 't swim....'
The rationale for handling single quotes in this fashion is that it prevents execution of any perl commands embedded in the data. When the object is evaluated using perl's "eval", a pointer to the root node is recovered and any "native" single quotes are restored.
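The two substitutions and the final evaluation can be sketched as below. The delimiter value, the subroutine name protect_and_quote, and the sample data are assumptions; in practice the delimiter is whatever ACEQM specifies. The sketch covers only the single-quote handling described here and does not address backslashes, which perl also treats specially inside single-quoted strings.

```perl
use strict;
use warnings;

# Protect native single quotes first, then turn each delimiter into a quote.
sub protect_and_quote {
    my ($raw, $delim) = @_;
    $raw =~ s/'/' . "'" . '/g;
    $raw =~ s/\Q$delim\E/'/g;
    return $raw;
}

my $delim = "\001";    # assumed delimiter for this example
my $raw   = "{va=>${delim}cats can't swim${delim}}";

my $code = protect_and_quote($raw, $delim);
print "$code\n";

# The leading "+" makes perl read the braces as an anonymous hash.
my $node = eval('+' . $code) or die "evaluation failed: $@";
print $node->{va}, "\n";
```

Because every native quote mark is closed, concatenated, and reopened, text in the data that looks like perl code stays inside a string and is returned as data rather than executed.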

C. Example

Below is a simple Author object in perl style. Line breaks and indentation have been added to increase legibility. At this point the object has been delivered by tace and modified slightly to prepare it as an argument for evaluation by perl's eval. Any original single quote marks have been protected and the special delimiter character replaced with a single quote.

{ty=>ob,
 cl=>'Author',
 va=>'Adams, S.',
 db=>'foodb',
 Pn=>[{ty=>tg,
       va=>'Full_name',
       Pn=>[{ty=>ob,
             cl=>'Contact',
             va=>'Adams, Sam'}]},
      {ty=>tg,
       va=>'Paper',
       Pn=>[{ty=>ob,
             cl=>'Paper',
             va=>'adams-1992-aagad',
             ti=>'My very first publication'},
            {ty=>ob,
             cl=>'Paper',
             va=>'jones-1992-aahfp',
             mt=>1},
            {ty=>ob,
             cl=>'Paper',
             va=>'smith-1993-aahhz',
             ti=>'Cats can' . "'" . 't swim'}]}]
}

Appendix 2: Modules

A perl5 module is a library file which can be included in other perl5 scripts, with particular enhancements. One of these is the potential to create an object class; an object in perl 5 is simply a reference (pointer) that has been associated with a specific module with the 'bless' command. Ordinarily, functions from external packages need to be referenced by the package name, but functions which apply to an object class (referred to as 'methods') are globally available to any object in that class.

To use ACEDB objects in Perl, a module was created with several basic methods of handling objects. The module is included in a perl script with the command

  use Aceobj;
which looks for a file called Aceobj.pm and evaluates it before the execution of the rest of the program. Every module which defines an object class must contain the method (i.e., subroutine) 'new'; in Aceobj.pm this accepts as input a perl string (output from the modified tace) and calls the function eval to convert this string into a tree structure. The value returned from the method is a pointer to the tree structure; this is the object itself. Other methods available to objects of class Aceobj are:

  printtree
      print the object as a perl structure
  wwwprint
      print the object in html format
  internal_url
      add URLs to the object for any other ACEDB object references
  columnize
      arrange object into columns for wwwprint
  prune
      remove nodes (and sub-branches) matching a given criterion
  external_markup
      apply external markup rules to add URLs
  loose_node_match
      test a node to see if it matches another; only keys in the second node need to match, and certain special keys don't need to match (hence 'loose')
  find_branch
      find branches of the tree which match a list of nodes
  add_urls
      applies a single markup rule to a node
  find_nodes
      find nodes of the object which match a
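
The object-class mechanics described above ('new', eval, and bless) can be sketched with a small stand-in package. MiniAceobj and its as_text method are hypothetical and written for this illustration; they are not the contents of Aceobj.pm.

```perl
use strict;
use warnings;

package MiniAceobj;    # hypothetical stand-in, not the Aceobj module itself

# Evaluate the literal and associate the resulting tree with this class.
sub new {
    my ($class, $literal) = @_;
    my $tree = eval('+' . $literal);
    die "cannot evaluate object literal: $@" unless ref($tree) eq 'HASH';
    return bless $tree, $class;
}

# Return the object as text, one node per line, indented by depth.
sub as_text {
    my ($self, $depth) = @_;
    $depth ||= 0;
    my $out = ('  ' x $depth) . $self->{va} . "\n";
    $out .= as_text($_, $depth + 1) for @{ $self->{Pn} || [] };
    return $out;
}

package main;

my $obj = MiniAceobj->new(q({ty=>'ob', cl=>'Author', va=>'Adams, S.',
    Pn=>[{ty=>'tg', va=>'Full_name',
          Pn=>[{ty=>'ob', cl=>'Contact', va=>'Adams, Sam'}]}]}));
print $obj->as_text;
```

Only the root node is blessed in this sketch, so the method recurses with an ordinary function call on the child nodes.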

Appendix 3: External Links

A. Introduction

The minimum expected of any ACEDB-WWW interface is that interconnections between objects will be represented as hypertext links in the web environment. As discussed in the main body of this paper, the "internal" links take the form of ACEDB queries, directed via URL back to the originating database. The internal links are constructed using data supplied by the database, such as object and class names, the name of the database itself, and a few invariant rules.

However, the web offers the opportunity to add significant value to a database by supplying additional links to external data resources. Often these take the form of links between individual objects and data at a remote site, for example between a sequence object and NCBI's GenBank server. In this case the originating (ACEDB) database ordinarily does not provide sufficient information to generate the URL. In particular it does not know that the data is available at a particular host using certain conventions. Creating the URL may also involve computing a key from one or more items in the object, testing that certain conditions are met, and so on. Each link may vary considerably in what is required, and a single object may need to be linked to several external sources.

To supply the additional information we have added support for "external markup" to the ACEDB-WWW interface. The external markup routines exploit the fact that the object is a tree in which node properties have been preserved. Markup rules (described in detail below) determine if an object is eligible for external markup and how the markup is to be generated. The information is attached to the relevant node for later interpretation. The actual generation of HTML for both internal and external links occurs later.

Our markup rules have evolved considerably since the project began as we required greater and greater power from them. The current form is by no means stable. In addition, it is possible that their complexity should be hidden by another layer, i.e. a simplified language from which the rules are generated. We have explored several approaches to this but have reached no conclusions.

B. Rule Structure

Markup rules consist of several parts, some of which are optional. The rules are written in the same perl syntax that PTS uses for output (see Appendix 1). In the discussion which follows, the parts will be presented independently and then combined to make working rules.

First, a rule contains a description of the root node of the object. The root specification mainly serves to identify the class of object that is affected by the rule, but it is also possible to set a requirement on the object name (in general, any node characteristic can be stated as a criterion):

  'root'   => {cl=>'Species'}

  'root'   => {cl=>'Species',va=>'Arabidopsis thaliana'}
If the root specification is omitted then the rule can potentially apply to any object, not just objects from a single class.

Note that the database name, which is in fact part of the root node, need not be explicit. This is because rules for each database are isolated in their own files and used when appropriate.

Second is the description of a series of nodes that form a branch or part of one. These nodes must form a contiguous structure; i.e., there cannot be gaps. However, a node need not be described beyond the fact that it exists. The branch ultimately determines where the external markup will appear; typically, the URL will be associated with the last node in the list.

'branch' => [{ty=>tg,va=>'Taxonomic_information'}, {ty=>tx}]

'branch' => [{ty=>tg,va=>'GeneFamily'},{cl=>'GeneFamily'}]

'branch' => [{ty=>tg,va=>'Library'},{cl=>'Source',va=>'GenBank'},{},{ty=>tx}]
These branches correspond respectively to objects with the structure

  Taxonomic_information Text

  GeneFamily ?GeneFamily

  Library ?Source Text Text
In the final case, the ?Source field is required to contain the value "GenBank". The empty braces could represent any kind of node; in this case they serve as a placeholder for the first Text field.
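
Matching a branch specification against an object can be sketched as follows. The subroutine names (loose_match, match_branch) and the sample object are assumptions for illustration, and as a simplification the branch is anchored at the children of the node passed in rather than searched for throughout the tree.

```perl
use strict;
use warnings;

# A node matches a pattern when every key in the pattern has the same value
# in the node; an empty pattern {} therefore matches any node.
sub loose_match {
    my ($node, $pattern) = @_;
    for my $key (keys %$pattern) {
        return 0 unless defined $node->{$key} && $node->{$key} eq $pattern->{$key};
    }
    return 1;
}

# Return the nodes that complete a contiguous match of @patterns below $node.
sub match_branch {
    my ($node, @patterns) = @_;
    my ($first, @rest) = @patterns;
    my @hits;
    for my $child (@{ $node->{Pn} || [] }) {
        next unless loose_match($child, $first);
        push @hits, @rest ? match_branch($child, @rest) : $child;
    }
    return @hits;
}

# Hand-built object modelled on the "Library GenBank ATTS1273 Z25996" line.
my $seq = { ty => 'ob', cl => 'Sequence', va => 'ATTS1273', Pn => [
    { ty => 'tg', va => 'Library', Pn => [
        { ty => 'ob', cl => 'Source', va => 'GenBank', Pn => [
            { ty => 'tx', va => 'ATTS1273', Pn => [
                { ty => 'tx', va => 'Z25996' } ] } ] } ] } ] };

my @hits = match_branch($seq, { ty => 'tg', va => 'Library' },
                              { cl => 'Source', va => 'GenBank' },
                              {}, { ty => 'tx' });
print $_->{va}, "\n" for @hits;
```

The empty pattern consumes the first Text field, so the node returned is the second one, which is where the markup would be attached.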

A valid rule can omit the branch specification. This implies that any markup will be associated with the root node. It is an error to omit specifications for both the root and branch.

Third is the node specification. This identifies the node in the branch list with which the external link will be associated.

 'node'   => 1
If the node is not specified the value defaults to the last node in the branch list if there is one or the root node if there is not. Note that the first node is indexed as '0', not '1'.

Fourth is the procedure used to generate the key or keys in the URL. This is an anonymous subroutine that returns a key value or a list of values. By 'key' we mean any data-dependent elements required to complete the URL, not necessarily a single value like an accession number. Multiple values may be required to construct a complete URL. A null return value signals that markup is not to occur. The procedure can be quite complex if necessary. Information can be drawn from any node in the object or drawn from another source.

The example below starts at the root node and traverses the tree to test for the value "Arabidopsis thaliana"; if it is found, it returns it as the key.

   'keys' => sub {my ($node,$root) = @_;
		if (@{find_nodes $root ({va=>"Arabidopsis thaliana"})}) {
			$node->{va};
		} else { undef;}
		}
This example checks for an entry in another database which lists items that should not be marked up. If the value is found the key is null; otherwise, a key is generated.

   'keys' => sub {my ($node,$root) = @_;
              dbmopen(%grin,"/kaos/WWW/8200/cgi-bin/grin",undef);
              my $omit = $grin{$node->{va}};
              dbmclose(%grin);
              if ($omit) {undef} else {$node->{va}}
             }
Special variables ($node and $root) are provided to make it easy to refer to the "current" node (in the 'node' specification) and the root node.

Very often URLs are formed in a simple manner from a value in a field or from the name of the object itself. The default for keys takes this into account. If keys are not specified a key will be created using the "value" of the last node in the branch list and from the root node:

  $keys[0] = $node->{va} #usually, the value of a field
                         #from 'branch'; if no 'branch' is
                         #specified, then value from root
                         #(the object's name)

  $keys[1] = $root->{va} #the name of the object
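
The interplay of the 'node' and 'keys' defaults can be sketched as below. The subroutine name rule_keys, its calling convention, and the sample address are assumptions made for illustration; only the default behaviour is taken from the description above.

```perl
use strict;
use warnings;

# @matched holds the nodes matched by 'branch', in order (empty when the
# rule has no branch). Returns the key list for the rule.
sub rule_keys {
    my ($rule, $root, @matched) = @_;

    # 'node' defaults to the last node in the branch list, else the root.
    my $index = defined $rule->{node} ? $rule->{node} : $#matched;
    my $node  = @matched ? $matched[$index] : $root;

    # An explicit 'keys' procedure takes precedence over the defaults.
    return $rule->{keys}->($node, $root) if $rule->{keys};
    return ($node->{va}, $root->{va});
}

my $root    = { ty => 'ob', cl => 'Contact', va => 'Adams, Sam' };
my @matched = ({ ty => 'tg', va => 'E_mail' },
               { ty => 'tx', va => 'sam@example.org' });

my @keys = rule_keys({}, $root, @matched);
print join(' | ', @keys), "\n";
```

With no branch at all, both keys end up holding the object's name, which matches the default described in the comments above.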
The last part of the rule is the URL specification. The URL can be constructed in situ or referred to by name (in this case, the URL has been "registered" in a file). The latter is useful if a URL is built the same way again and again. An example of the in situ method is

'urls' => [{'name'=>'WeedLocus',
            'URL'=>'http://weed.org:/cgi-bin/dbrun/aatdb?find+Locus+%22$keys[0]%22'}]
where @keys is the key list generated by the keys procedure. The URL handling routines are designed so that the information in @keys is sufficient to complete the URL.

Alternatively, a URL "name" can be supplied if the corresponding information has been registered in another file, with @keys playing the same role:

 'urls' => ['MetabolicEC']
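
Completing a URL from @keys can be sketched as a simple template substitution. The subroutine name fill_url and the sample key 'Adh' are hypothetical, and the sketch does not escape characters in the keys; the URL template is the one used in Example 1 below.

```perl
use strict;
use warnings;

# Replace each literal "$keys[N]" in the template with the Nth key.
# Returns nothing if any key referred to by the template is undefined.
sub fill_url {
    my ($template, @keys) = @_;
    for my $i ($template =~ /\$keys\[(\d+)\]/g) {
        return unless defined $keys[$i];
    }
    (my $url = $template) =~ s/\$keys\[(\d+)\]/$keys[$1]/g;
    return $url;
}

my $url = fill_url(
    'http://origin.nalusda.gov:8200/cgi-bin/find/mendel37/GeneFamily/$keys[0]',
    'Adh');
print "$url\n";
```

Because the template is stored in single quotes, "$keys[0]" survives as literal text until the moment the rule is applied.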

C. Examples

What follows is a sampling of valid rules. Although rules for a particular database are ordinarily gathered into their own file, we have drawn examples from several sources in order to illustrate the range of possibilities.

Example 1
This rule is what might be thought of as a boilerplate: information is grabbed from a particular field and is used 'as is' to construct a URL. Here the rule is used to link a Sequence object (from AAtDB) to Mendel, a nomenclature database for sequenced plant genes.

{'root' => {cl=>'Sequence'},
 'branch' =>   [{ty=>tg,va=>'General'},
                {ty=>tg,va=>'Mendel_Gene_Family'},
                {cl=>'Text'}],
 'urls' => [{'name'=>'Mendel', 'URL'=>'http://origin.nalusda.gov:8200/cgi-bin/find/mendel37/GeneFamily/$keys[0]'}]
}
The root specification ensures that the rule applies only to objects from the Sequence class. The branch specification further requires the object to have used this part of the ?Sequence model:

   ?Sequence General Mendel_Gene_Family ?Text
The rule takes advantage of two useful defaults. First, 'node' defaults to the last node in the branch list (the Text field). Second, '@keys' contains the value extracted from that field, which will be a gene family name. The URL is constructed in situ simply by filling in the family name $keys[0] in the appropriate place.

Example 2
This rule omits a root specification entirely and applies to any object (from the Mendel database) containing a ?GenBank field. Matching objects are linked to three external data sources:

{'branch' => [{cl=>'GenBank'}],
 'urls' => ['GenBankAC','EMBLAC','GenoBaseAC']
}
External links of this sort are likely to be reused, so it is economical to refer to them by name and register them in a file:

GenBankAC:GenBank:http://ncbi.nlm.nih.gov:2555/htbin/birx_by_acc?genbank+$keys[0]
EMBLAC:EMBL:http://www.ebi.ac.uk/htbin/expasyfetch?$keys[0]
GenoBaseAC:GenoBase:http://genome.cornell.edu:8300/cgi-bin/partialgenobase.pl?$keys[0]
The default key is used and this time contains the contents of the ?GenBank field.
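
Reading such a registry can be sketched as follows, assuming one "name:label:URL" entry per line as in the listing above. The subroutine name parse_registry and the layout of the returned structure are assumptions for illustration.

```perl
use strict;
use warnings;

# Parse "name:label:URL" lines into a hash keyed by name. The split limit
# of 3 keeps the colons inside the URL intact.
sub parse_registry {
    my (@lines) = @_;
    my %registry;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ($name, $label, $url) = split /:/, $line, 3;
        $registry{$name} = { label => $label, URL => $url };
    }
    return \%registry;
}

my $registry = parse_registry(
    'GenBankAC:GenBank:http://ncbi.nlm.nih.gov:2555/htbin/birx_by_acc?genbank+$keys[0]',
    'EMBLAC:EMBL:http://www.ebi.ac.uk/htbin/expasyfetch?$keys[0]',
);
print $registry->{EMBLAC}{label}, ' -> ', $registry->{EMBLAC}{URL}, "\n";
```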

Example 3
"External" markup can also be used to provide services other than links to an external database. This example (from AAtDB) extracts an e-mail address and creates a link to a form that allows a user to send mail directly to that address. The required data is extracted from objects matching

  ?Contact Address E_mail Internet Text

{'root' => {cl=>'Contact'},
 'branch' => [{ty=>tg,va=>'Address'},
		{ty=>tg,va=>'E_mail'},
		{ty=>tg,va=>'Internet'},
		{ty=>tx}],
 'urls' => ['mailform']
},
The named URL is registered as:

mailform:mail form:http://origin.nalusda.gov:8300/cgi-bin/mailform.pl/$keys[1]/$keys[0]
The default for @keys provides mailform.pl with its two arguments: the name of the object in $keys[1] (the person to whom mail is being sent) and the e-mail address in $keys[0] (the contents of the Text field).

Example 4
Sometimes a value can be used to create an external link but there is no guarantee that the remote data source actually has the information being pointed to.

This rule (from Mendel) is used to create an external link from a species object to the Germplasm Resources Information Network (GRIN) database only if GRIN contains data about the species. The species name is extracted from the Text field in

  ?Species Taxonomic_information Text
and is checked against a dbm file containing a list of all species known to GRIN. Only if the check is successful is a key defined, in this case the species name.

{'root'   => {cl=>'Species'},
 'branch' => [{ty=>tg,va=>'Taxonomic_information'},
              {ty=>tx}],
 'keys'   => sub {my ($node,$root) = @_;
              dbmopen(%grin,"/kaos/WWW/8200/cgi-bin/grin",undef);
              my $omit = $grin{$node->{va}};
              dbmclose(%grin);	#close dbm file
              if ($omit) {undef} else {$node->{va}}
             },
 'urls' => ['GRINtax']
}
Example 5
The markup for an object may require information from different branches. In this rule (from Mendel) any object containing a ?GenBank field is linked to AAtDB if the object contains the exact words "Arabidopsis thaliana" as the value at any node. The subroutine "find_nodes" is provided for this purpose; its first argument is the starting node for the search and its second argument is what to find. A successful search creates a key containing the value from the ?GenBank field. An unsuccessful search yields an undefined key; in this case markup will not occur.

Note that this rule has exactly the same branch definition as the rule in Example 2. Any field can be the focus for more than one rule and both can contribute to the final markup.

{'branch' => [{cl=>'GenBank'}],
 'keys' => sub {my ($node,$root) = @_;
		if (@{find_nodes $root ({va=>"Arabidopsis thaliana"})}) {
			$node->{va};
		} else { undef;}
		},
 'urls' => ['AAtDB']
}
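
A search of the kind performed by "find_nodes" can be sketched as below. This is not the implementation in Aceobj.pm: the name find_nodes_sketch, the breadth-first order, and the sample object are assumptions, and the sketch returns a pointer to a list because the rule above dereferences the result with @{...}.

```perl
use strict;
use warnings;

# Return a pointer to the list of nodes at or below $start whose keys agree
# with every key in $pattern.
sub find_nodes_sketch {
    my ($start, $pattern) = @_;
    my @found;
    my @queue = ($start);
    while (my $node = shift @queue) {
        my $ok = 1;
        for my $key (keys %$pattern) {
            $ok = 0 unless defined $node->{$key} && $node->{$key} eq $pattern->{$key};
        }
        push @found, $node if $ok;
        push @queue, @{ $node->{Pn} || [] };
    }
    return \@found;
}

# Hand-built object for demonstration purposes only.
my $obj = { ty => 'ob', cl => 'Gene', va => 'Adh1', Pn => [
    { ty => 'tg', va => 'Species', Pn => [
        { ty => 'ob', cl => 'Species', va => 'Arabidopsis thaliana' } ] },
    { ty => 'tg', va => 'GenBank', Pn => [
        { ty => 'ob', cl => 'GenBank', va => 'Z25996' } ] } ] };

# A keys procedure in the style of Example 5, using the sketch above.
my $keys_sub = sub {
    my ($node, $root) = @_;
    return @{ find_nodes_sketch($root, { va => 'Arabidopsis thaliana' }) }
        ? $node->{va}
        : undef;
};

print $keys_sub->($obj->{Pn}[1]{Pn}[0], $obj), "\n";
```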