A Guide to Models and Ace Files

Mary O'Callaghan, October 1993




Objects, Classes and Models

In ACEDB an "object" is an item, such as a sequence, a paper or an author, together with its attached data.

Illustration 1 shows a typical paper object, [cgc12].


Illustration 1
---------------

[cgc12]

  Reference Title Critical oxygen tension of C. elegans
  --------- -----
            Journal Journal of Nematology
            -------      
            Year    1977
            ----    
            Volume  9
            ------
            Page    253  254
            ----
 Author     Anderson GL
 ------     
            Dusenbery DB
    
 Type       ARTICLE
 ----
 Keyword    ENVIR CONDITIONS 
 -------    OXYGEN

ACEDB groups similar objects into "classes". Paper objects and multi-point data objects, for instance, belong to the Paper class and the Multi_pt_data class respectively.

Each class has a "model", which provides the format for the contents of any object which belongs to that class.

There are two ways to see a class's model:


Illustration 2
--------------

?Paper

  Reference   Title UNIQUE ?Text
  ---------   -----
              Journal UNIQUE ?Journal XREF Paper
              -------
              Publisher UNIQUE Text
              ---------
              Contained_in ?Paper XREF Contains
              ------------
              Year UNIQUE Int
              ----
              Volume UNIQUE Int Text
              ------
              Page UNIQUE Int UNIQUE Int
              ----
  Author      ?Author XREF Paper
  ------
  Abstract    ?LongText
  --------
  Type        UNIQUE Text
  ----
  Contains    ?Paper XREF Contained_in
  --------
  Refers_to   Locus ?Locus XREF Reference
  ---------   -----
              Rearrangement ?Rearrangement XREF Reference
              -------------
              Sequence     ?Sequence XREF Reference
              --------
              Keyword      ?Keyword
              -------


Contents of the Models

Tags

Models state the "tags", or labels, that can be attached to the contents of objects in a particular class. The model for objects in the Paper class, for instance, includes Journal and Volume tags (see Illustration 2 above).

Tags are positioned to the left of the data entries they label. Tags are underlined in the Illustrations and other examples, so that readers can clearly distinguish between tags and their entries.

Some tags are subtags of other tags. Illustration 3 shows the ?Author model, in which the Address tag contains several subtags. Illustration 4 shows a typical Author object, "Jones A".


 Illustration 3
 --------------

 ?Author

    Full_name    Text
    ---------
    Laboratory   ?Laboratory  XREF  Staff
    ----------
    Address      Mail   Text
    -------      ----
                 E_mail Text
                 ------
                 Phone  Text
                 -----
                 Fax    Text
                 ---
    Paper        ?Paper
    -----

 Illustration 4
 --------------

 Jones A

    Full_name    Alan Jones
    ---------
    Laboratory   CB
    ----------
    Address      Mail   25 Green Road, Cambridge
    -------      ----
                 E_mail alan@mrc-lmb
                 ------
                 Phone  30535
                 -----
                 Fax    30536
                 ---
    Paper        [cgc12]
    -----
                 [cgc120]

Why have subtags?

An object is displayed in a clearer and more orderly fashion in ACEDB when the system groups together items which have something in common, such as parts of an address, as subtag entries within a main tag.

Subtags also make deletion simpler. If a group of tags are subtags of a main tag, we can delete the whole group (and hence its data) by deleting the main tag, rather than deleting each tag individually as we would have to if they were not subtags. For instance, rather than deleting Mail, E_mail, Phone and Fax each time we want to remove the address data from an Author object, we can simply delete Address. Deletion is explained in detail under "Deleting data" in the "Ace files" section.

Data fields

Models also indicate the nature of the data that can be given for each tag entry.

An "Int" data field to the right of a tag indicates that only an integer can be given for that data field. "Float" means that a floating point number must be used.

If "Text" or "?Text" is stated, any ASCII characters can be entered. "?Text" data, unlike "Text" data, can be searched with the "Text Search" option in the main ACEDB window.

A single tag entry can contain several Int and/or Text (or ?Text) fields. For instance, the ?Sequence model includes the following:

        promoter Int Int ?Text 
        --------
A possible entry would be:
        promoter 2520 2526 TATA 
        --------
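The type rules above can be illustrated with a small Python sketch. It is not ACEDB code: the functions `matches_type` and `check_entry` are hypothetical, and a model line is represented simply as a list of field type names. The sketch checks the parts of one tag entry, in order, against those types:

```python
def matches_type(value, field_type):
    """Return True if one value is acceptable for one model field type."""
    if field_type == "Int":
        try:
            int(value)
        except ValueError:
            return False
        return True
    if field_type == "Float":
        try:
            float(value)
        except ValueError:
            return False
        return True
    if field_type in ("Text", "?Text"):
        return value.isascii()  # any ASCII characters are allowed
    raise ValueError(f"unknown field type: {field_type}")


def check_entry(values, field_types):
    """Check the parts of one tag entry, in order, against the model's field types.

    This sketch lets trailing fields be left out (an assumption, not an ACEDB rule).
    """
    if len(values) > len(field_types):
        return False
    return all(matches_type(v, t) for v, t in zip(values, field_types))


# Field types taken from the model line:  promoter Int Int ?Text
promoter_fields = ["Int", "Int", "?Text"]
print(check_entry(["2520", "2526", "TATA"], promoter_fields))  # True
print(check_entry(["TATA", "2520", "2526"], promoter_fields))  # False: parts out of order
```

Because the parts are checked position by position, the same values in the wrong order are rejected, which is why the order of the parts matters when you write Ace files.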

UNIQUE entries

In many cases, there is no restriction on the number of entries for a tag in an object. Sometimes, however, there can only be one entry.

This restriction makes sense. In Paper objects, for instance, the Author tag must be able to hold several entries, but there should obviously be only one entry in the Journal or Volume tags.

If the word "UNIQUE" is used in a model for objects in a class, there can only be one item to the immediate right of the "UNIQUE" for every item to the left.

Often "UNIQUE" is positioned between a tag and a data field. This means that there can only be one entry in that data field for that tag. For instance, the ?Paper model includes the following:

        Volume UNIQUE Int Text
This indicates that each Paper object can contain only one entry in the Int field of the Volume tag, though there can be several entries in the Text field.

"UNIQUE" can also be placed between two data fields of a tag. For instance, the ?Sequence model contains:

         misc-feature Int UNIQUE Int
         ------------
In this case, a misc-feature entry in a Sequence object can have only one integer in its second field for each integer in its first.
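One way to picture the UNIQUE rule is as a check over the data lines of one tag. In this minimal Python sketch (a hypothetical helper, not part of ACEDB), each line is a tuple of values and `left_width` says how many data fields lie to the left of UNIQUE; the check fails as soon as the same left-hand values are paired with two different items on the right:

```python
def breaks_unique(rows, left_width):
    """Return True if the data lines of one tag break a UNIQUE rule.

    rows: the data lines for one tag in one object, as tuples of values.
    left_width: how many data fields lie to the left of UNIQUE
    (0 when UNIQUE sits directly after the tag, as in "Volume UNIQUE Int Text").
    """
    seen = {}
    for row in rows:
        left = tuple(row[:left_width])   # the fields to the left of UNIQUE
        right = row[left_width]          # the single item allowed to its right
        if seen.setdefault(left, right) != right:
            return True                  # a second, different item on the right
    return False


# misc-feature Int UNIQUE Int: one data field to the left of UNIQUE
print(breaks_unique([(10, 20), (30, 40)], 1))   # False
print(breaks_unique([(10, 20), (10, 25)], 1))   # True
# Volume UNIQUE Int Text: several Texts may share the one Int
print(breaks_unique([(9, "a"), (9, "b")], 0))   # False
print(breaks_unique([(9, "a"), (10, "b")], 0))  # True
```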

Some tags offer mutually exclusive options (a multiple-choice tag), each labelled with a subtag. The model places UNIQUE between the tag and its subtags to indicate that, in a given object, the tag can contain only one of the subtags (together with that subtag's entries). The ?Allele model, for instance, includes:

 Source UNIQUE Gene ?Locus
 ------        ----
               Gene_Class ?Gene_Class
               ----------
An Allele object might include the following:

       
 Source Gene_Class unc
 ------ ----------
                   let
or:

        
 Source Gene unc-33
 ------ ---- 
             let-45
However, it should not contain the following:

        
 Source Gene       unc-33
 ------ ----
        Gene_Class unc
        ----------
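The multiple-choice rule can be sketched in the same way. In this illustration, which is not ACEDB's own code, the Source tag of one Allele object is held as a Python dictionary mapping each subtag used to its entries, and the hypothetical function `chosen_subtag` rejects an object that uses more than one of the alternatives:

```python
def chosen_subtag(tag_data, choices):
    """Return the one subtag used under a multiple-choice tag, or None.

    tag_data: {subtag: [entries]} for the tag in one object.
    choices: the subtags listed after "Tag UNIQUE" in the model.
    """
    unknown = [name for name in tag_data if name not in choices]
    if unknown:
        raise ValueError(f"not a subtag of this tag: {unknown}")
    if len(tag_data) > 1:
        raise ValueError(
            f"only one of {sorted(choices)} is allowed, got {sorted(tag_data)}")
    return next(iter(tag_data), None)


# The alternatives listed under Source in the Allele model
source_choices = {"Gene", "Gene_Class"}
print(chosen_subtag({"Gene_Class": ["unc", "let"]}, source_choices))  # Gene_Class
print(chosen_subtag({"Gene": ["unc-33", "let-45"]}, source_choices))  # Gene
```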

Tag entries which are objects

Objects can have entries which are themselves objects, in the same or another class. For instance, Paper objects have an Author tag whose entries are objects in the Author class. In a model, a tag whose entries are objects is written as follows:

 Tag ?Class
 ---
(for instance, Author ?Author in the case of the Paper model)

This indicates that within an object with that model, any entry in the stated tag will be an object in the stated class.

What implications does the fact that a tag's entry is an object have for the user? An example will help to make this clear.

For example, suppose the Mapper tag of a Multi_pt_data object "ABC" contains the entry "Jones A". The Mapper line of the Multi_pt_data model states that Mapper entries are members of the Author class (i.e. Mapper ?Author).

As a result, when the "ABC" object is displayed in a text window, the entry "Jones A" in its Mapper tag appears in boldface to indicate that it is an object in its own right. Selecting "Jones A" with the mouse displays the "Jones A" object, together with its tags and data, in a new window.

Sometimes, when a model describes tag entries as objects in a particular class, the word XREF and a tag name appear to the right of the class. For example, the Sequence model contains:

 Clone ?Clone XREF Sequence
 -----
This establishes a cross-reference from the tag entry, which is an object in its own right, back to the object in which it is a tag entry.

In the example just given, the ?Sequence model describes Clone tag entries as "?Clone", that is, objects belonging to the Clone class. What exactly do the words "XREF Sequence" mean? They indicate that the model for Clone objects contains a Sequence tag whose entries are defined as "?Sequence", objects belonging to the Sequence class. If a Sequence object contains a Clone object in its Clone tag, that Clone object automatically gets the Sequence object as an entry in its own Sequence tag. For example, if the Sequence object "ABC" has "Clone35" as its Clone tag entry, the Clone object "Clone35" will automatically include "ABC" in its own Sequence tag.
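The effect of XREF can be modelled with a small in-memory store. In this sketch the store `db`, the table `XREFS` and the function `link` are all hypothetical; only the Sequence-to-Clone direction is declared, whereas a real database would also hold the matching XREF in the Clone model:

```python
from collections import defaultdict

# Hypothetical in-memory store: (class, object name) -> {tag: [entries]}
db = defaultdict(lambda: defaultdict(list))

# (class, tag) -> (class of the entry, tag to fill in on the entry).
# "Clone ?Clone XREF Sequence" in the Sequence model becomes:
XREFS = {("Sequence", "Clone"): ("Clone", "Sequence")}


def link(cls, name, tag, value):
    """Add value under tag in object (cls, name); follow the XREF if there is one."""
    entries = db[(cls, name)][tag]
    if value not in entries:
        entries.append(value)
    if (cls, tag) in XREFS:
        target_cls, target_tag = XREFS[(cls, tag)]
        back = db[(target_cls, value)][target_tag]
        if name not in back:
            back.append(name)  # the cross-reference back to the first object


link("Sequence", "ABC", "Clone", "Clone35")
print(db[("Clone", "Clone35")]["Sequence"])  # ['ABC']
```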

REPEAT entries

If the word REPEAT appears after a data field in a tag, each entry for that data field can contain several items of that kind on a single line.

The Clone_Grid model includes the following:

    
 Row Int UNIQUE ?Clone XREF Gridded REPEAT        
Looking at the second half of the line, we see that there is a REPEAT statement for ?Clone entries. Hence the ?Clone part of two row entries might contain the following:

          
 Clone1 Clone2 Clone3 Clone4
  
 Clone5 Clone6 Clone7 Clone8 Clone9
Why have a UNIQUE before ?Clone and REPEAT after it? We learnt above that UNIQUE means there can only be one of what is to the right of UNIQUE for each element on the left. Since the Row tag entry begins with an Int, there can only be one single-line group of clones for each row number given.

Hence you could have the following:

        Row  1   A1 A2 A3 A4
             2   A6 A7 A8 A9 A10
but not:

        Row  1    A1 A2 A3 A4
                  A6 A7 A8 A9 A10
             2    B1 B3 B6 A5
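The combined effect of UNIQUE and REPEAT on Row entries can be checked with a short Python sketch. It is an illustration only: the hypothetical function `check_rows` takes each line of the Row tag as a (row number, list of clones) pair:

```python
def check_rows(entries):
    """Check Row entries against:  Row Int UNIQUE ?Clone XREF Gridded REPEAT

    entries: (row number, [clones]) pairs, one pair per line of the tag.
    UNIQUE after Int: at most one line of clones per row number.
    REPEAT after ?Clone: that line may hold any number of clones.
    """
    grid = {}
    for row, clones in entries:
        if not isinstance(row, int):
            raise ValueError(f"row number must be an Int, got {row!r}")
        if row in grid:
            raise ValueError(f"row {row} has more than one line of clones")
        grid[row] = list(clones)
    return grid


allowed = [(1, ["A1", "A2", "A3", "A4"]), (2, ["A6", "A7", "A8", "A9", "A10"])]
print(check_rows(allowed))

not_allowed = [(1, ["A1", "A2", "A3", "A4"]), (1, ["A6", "A7", "A8", "A9", "A10"]),
               (2, ["B1", "B3", "B6", "A5"])]
# check_rows(not_allowed) raises ValueError: row 1 has more than one line of clones
```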

Tags without entries

Some subtags in objects take no data entries, because choosing a subtag (or several subtags; see the "Ace files" section) from the group within a tag is descriptive enough in itself. In a model, such subtags have no data field description to their right. The Allele model, for instance, includes the following:

 Description  Recessive
 -----------  ---------
              Dominant 
              --------
              Semi-dominant
              -------------
              Weak 
              ----

Editing the models

To change a class model, or to add a new one, edit the models.wrm file. If you add a new model to models.wrm, you must also enter the new class name in the classes.wrm file, using the syntax:

 #define _VClassname Class_number

If you add a new tag to models.wrm, you must also include that tag in tags.wrm.



Ace files

Adding/editing ACEDB data

There are two ways of editing data in ACEDB. First, objects can be edited manually using the Add/Delete/Rename option in the main ACEDB window menu and/or the Update option in the text window for any object. Second, data can be imported from files in "Ace" format.

What is an Ace file?

If a file is in Ace format, ACEDB can interpret the instructions it contains. Ace files can be used to add, delete and rename objects, to add data and comments to objects, and to delete data from them. The files, which must have the ".ace" extension, are read into ACEDB using the "Read Ace Files" option in the menu of the main ACEDB window.

Ace File Operations

Adding data

To add an object to the database, first state its class and name (e.g. Journal Cell) on a single line in an Ace file.

Data can be attached to this object on immediately subsequent lines. The data must be prefixed with suitable tags, such as Author, Journal, Volume and Page in the case of Paper objects. An object's potential tags are listed in the model for the class of objects to which that object belongs.

The steps are the same when the object is already in ACEDB and we simply want to add to its data. If, for instance, we wanted to add an author and a volume number to the paper "[wbg6]", we could put the following:

 Paper [wbg6]
 Author "Jones A"
 ------
 Volume  39
 ------
Note: I have underlined the tags in these examples only to make them clearer. There should not be any underlining in an Ace file.

If you want to add several entries for a particular tag in an object, each entry must be listed on a separate line, and each line must begin with the tag in question. The order of the lines doesn't matter. If, for instance, we needed to add several Gene tag entries to the Paper object [wbg6], we might put the following in an Ace file:

 Paper [wbg6]
 Gene let-31
 ----
 Gene let-32
 ----
 Gene let-45
 ----
As mentioned earlier, some tags have subtags. In that case, label the data with the subtags only; there is no need to mention the tag containing them, since ACEDB knows from the model which tag each subtag belongs to.

If a subtag has no description to its right in a model, there should be no data to the right of that subtag when it is appended to an object in an Ace file. Your choice of a subtag or subtags is adequately descriptive in itself.

When adding data in an Ace file, it is important to have the correct data types (e.g. integer, floating-point) for each part of an object's tag entry, and to have those parts in the correct order. The format is defined in the object's class model (e.g. see the definitions given to the right of the tag entries shown in the Paper class model in Illustration 2).

If the word "UNIQUE" is in the model, you can only have one of what is on the right of the "UNIQUE" for each element on the left. The Paper model contains:

 Publisher UNIQUE Text
 ---------
This means that there can only be one entry for Publisher in a Paper object.

Usually in ACEDB when a new entry is added for a particular tag in an object (e.g. "Jones A" for the Author tag in a Paper object), the new tag entry will be added to any previous entries. However, if a tag can only contain one entry, the new entry will overwrite the old one.
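The difference between ordinary and UNIQUE tags when data is added can be sketched in Python as follows. The object is held as a dictionary of tag entries, the name `add_to_tag` is hypothetical, and the publisher names are made up for the example; this illustrates the rule and is not ACEDB's implementation:

```python
def add_to_tag(obj, tag, value, unique_tags):
    """Add one entry to a tag in an object held as {tag: [entries]}.

    UNIQUE tags keep only the newest entry; other tags keep all entries.
    Identical entries are not stored twice (an assumption of this sketch).
    """
    if tag in unique_tags:
        obj[tag] = [value]              # the new entry overwrites the old one
    elif value not in obj.setdefault(tag, []):
        obj[tag].append(value)          # the new entry joins the earlier ones


paper_unique = {"Publisher"}            # from the model line:  Publisher UNIQUE Text
paper = {}
for tag, value in [("Author", "Jones A"), ("Author", "Smith T"),
                   ("Publisher", "Old Press"), ("Publisher", "New Press")]:
    add_to_tag(paper, tag, value, paper_unique)
print(paper)  # {'Author': ['Jones A', 'Smith T'], 'Publisher': ['New Press']}
```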

For a full discussion of tags, data types, their order, "UNIQUE" etc., see the "Contents of the Models" section above.

If an Ace file contains mistakes, ACEDB will point them out when you try to read the file in.

Renaming data

If you want to rename an object, the Ace file syntax is as follows:

 -R Classname Oldobject Newobject
e.g. -R Author "Jones A" "Jones AB"

Deleting data

The Ace file syntax for deleting an object is as follows:

 -D Classname Objectname 
e.g. -D Author "Jones A"

To delete data attached to an object, first give the object's class and name on a single line. Then, on subsequent lines, write "-D Tagname" for each tag whose entries you want to delete. You only need to write "-D Tagname" once per tag, no matter how many entries that tag has, and each "-D Tagname" must be on a separate line. For example, the following statements will delete all the Paper and Laboratory entries from the Author object "Jones A":

 Author "Jones A"
 -D Paper
    -----
 -D Laboratory
    ----------
If you want to delete entries for a particular tag in an object and add in new entries for that tag, the order of the Ace file statements is important. The deletion statement should come before you add in the new data, since if the deletion comes after the new data is added, both the new and old data will be deleted. For instance, if, in addition to wanting to delete the current papers for the Author object "Jones A", you wanted to add in some new Paper tag entries, you could say:

 Author "Jones A"
 -D Paper
    -----
 Paper "Worm tales"
 -----
 Paper "Worm mysteries" 
 -----
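You can see why the order matters by applying the lines of an object block one at a time. This Python sketch handles only "-D Tag" lines and single-value tag lines; the function `apply_block` is hypothetical, and "[cgc12]" stands in for a paper already attached to the object:

```python
def apply_block(obj, lines):
    """Apply the lines of one Ace file object block to {tag: [entries]}, in order.

    Handles only '-D Tag' and single-value 'Tag value' lines (a simplification).
    """
    for line in lines:
        if line.startswith("-D "):
            obj.pop(line[3:].strip(), None)      # delete every entry for the tag
        else:
            tag, value = line.split(None, 1)
            obj.setdefault(tag, []).append(value.strip().strip('"'))
    return obj


new_papers = ['Paper "Worm tales"', 'Paper "Worm mysteries"']
right = apply_block({"Paper": ["[cgc12]"]}, ["-D Paper"] + new_papers)
wrong = apply_block({"Paper": ["[cgc12]"]}, new_papers + ["-D Paper"])
print(right)  # {'Paper': ['Worm tales', 'Worm mysteries']}
print(wrong)  # {}
```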
If a model states that there can only be one entry for a particular tag (e.g. "Title UNIQUE ?Text" in the Paper model; see the discussion of "UNIQUE" in the "Contents of the Models" section above), and you want to replace an object's entry for that tag, there is no need to write "-D tagname", since the new entry will automatically overwrite the old one.

Sometimes a tag has subtags. How is data deleted when this is the case? If you only want to delete the data in some of a tag's subtags then the Ace file syntax is:

 Classname Objectname
 -D subtag1
    -------
 -D subtag3
    -------
There is no need to mention the tag which contains the subtags.

If, however, you want to remove the contents of all the subtags in a tag, you can use the simpler equivalent "-D tag" instead of a "-D subtag" line for each subtag. For example, for Author objects you can write "-D Address" rather than:

 -D Mail
    ----
 -D E_mail
    ------
 -D Phone
    -----
 -D Fax
    ---
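Deleting a main tag works because an object is stored as a tree, with subtags beneath their tag. In the following Python sketch, the nested dictionary and the function `delete_tag` are illustrative assumptions; it holds the "Jones A" object from Illustration 4 and searches the whole tree for the named tag, so a subtag can be deleted without naming the tag that contains it:

```python
import copy

author = {
    "Full_name": ["Alan Jones"],
    "Address": {
        "Mail": ["25 Green Road, Cambridge"],
        "E_mail": ["alan@mrc-lmb"],
        "Phone": ["30535"],
        "Fax": ["30536"],
    },
}


def delete_tag(tree, tag):
    """Remove a tag and everything beneath it, searching subtags too.

    Returns True if the tag was found.
    """
    if tag in tree:
        del tree[tag]
        return True
    return any(isinstance(child, dict) and delete_tag(child, tag)
               for child in tree.values())


one = copy.deepcopy(author)
delete_tag(one, "Phone")              # like "-D Phone"
print(sorted(one["Address"]))         # ['E_mail', 'Fax', 'Mail']

whole = copy.deepcopy(author)
delete_tag(whole, "Address")          # like "-D Address"
print(sorted(whole))                  # ['Full_name']
```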

Adding comments

Comments can be attached to any tag entry in an object, and are stored in ACEDB along with it.

In an Ace file the syntax is:

 Tag TagEntry -C Comment
For example, we could have the following for the Paper object [cgc12]:

 Paper [cgc12]
 Author "Jones A" -C "and the rest of the gang"
 ------
 Page 456 467 -C "C. elegans growth patterns" 
 ----
In ACEDB comments are displayed with a dark background.

Ace file data that ACEDB does not see

If you want to put something in an Ace file which you don't want ACEDB to try to interpret, such as a description of the contents of the Ace file, prefix your note with "//". Anything following this to the end of the line will be ignored.
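The line-level rules for quotes, "-C" comments and "//" notes can be pulled together in a small tokenizer. This is a simplified Python sketch, not the parser ACEDB uses; the function name `tokenize` is hypothetical, and it assumes that "//" outside quotes starts a note and that everything after an unquoted "-C" is the comment:

```python
def tokenize(line):
    """Split one Ace file line into (tokens, comment).

    Double quotes group words containing spaces, '//' outside quotes starts
    an ignored note, and the words after an unquoted '-C' form the comment.
    """
    tokens = []                      # (text, was_quoted) pairs
    current, quoted, in_quotes = "", False, False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_quotes = not in_quotes
            quoted = True
        elif not in_quotes and line.startswith("//", i):
            break                    # the rest of the line is ignored
        elif not in_quotes and ch.isspace():
            if current or quoted:
                tokens.append((current, quoted))
            current, quoted = "", False
        else:
            current += ch
        i += 1
    if current or quoted:
        tokens.append((current, quoted))
    words = [text for text, _ in tokens]
    for pos, (text, was_quoted) in enumerate(tokens):
        if text == "-C" and not was_quoted:
            return words[:pos], " ".join(words[pos + 1:]) or None
    return words, None


print(tokenize('Author "Jones A" -C "and the rest of the gang"'))
# (['Author', 'Jones A'], 'and the rest of the gang')
print(tokenize('Journal Nature  // note for the reader only'))
# (['Journal', 'Nature'], None)
```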

General Syntax Rules

Objects listed in the Ace files should be separated by a blank line, as in the case of the Paper objects below.

 Paper [cgc12]
 Journal Nature
 -------
 Page 10 16
 ----
 
 Paper [cgc13]
 Journal Nature
 -------
 Page 17 24
 ----
A data item should be enclosed in quotes if it contains a space. Otherwise the data after the space will be lost. The following Author object is an example:

 Author "Jones A"
 Mail "6 Blackheath Park"
 -----
 Phone "044 656565" 
 -----
If a tag entry has several parts, put quotes only around individual parts, never around the entry as a whole. For example, the Sequence model includes:

 promoter Int Int ?Text
 --------
In this case you might have the following:

 promoter 2520 2526 "TATA signal"
 --------
but you should not have:

 promoter "2520 2526 TATA signal" 
 --------
A quoted sentence can be long enough to wrap across several lines on screen, as long as it contains no carriage returns.
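When Ace files are generated by a program, the quoting rule is easy to apply automatically. This Python sketch (the function names are hypothetical) quotes each part of an entry separately, and only when it contains a space; it does not handle data that itself contains double quotes:

```python
def ace_value(value):
    """Quote one data item if it contains a space."""
    text = str(value)
    return f'"{text}"' if " " in text else text


def ace_object(cls, name, entries):
    """Format one object block: a 'Class Name' line, then one line per tag entry.

    entries: list of (tag, [parts]) pairs; each part is quoted on its own.
    """
    lines = [f"{cls} {ace_value(name)}"]
    for tag, parts in entries:
        lines.append(" ".join([tag] + [ace_value(p) for p in parts]))
    return "\n".join(lines) + "\n"


print(ace_object("Author", "Jones A",
                 [("Mail", ["6 Blackheath Park"]), ("Phone", ["044 656565"])]))
# Author "Jones A"
# Mail "6 Blackheath Park"
# Phone "044 656565"
```

Joining several blocks returned by `ace_object` with "\n" leaves the blank line that must separate objects.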

LongText and DNA

Some ACEDB classes have an array structure rather than the standard tree structure. Objects in these classes need special treatment in an Ace file.

Objects in the LongText array class are long pieces of text (usually abstracts). The syntax is as follows:

LongText [wbg11.1p68]
  This is an intricate worm family saga spanning several generations. 

  It can contain blank lines, because there is a special symbol for the 
end of the entry.
***LongTextEnd***
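A reader for LongText objects only needs to watch for the end line, since blank lines belong to the text. The following Python sketch (the function `read_longtext` is hypothetical) treats a line as the end marker only if it matches "***LongTextEnd***" exactly:

```python
END_LINE = "***LongTextEnd***"


def read_longtext(lines):
    """Return (name, text) for one LongText object given as a list of lines.

    Blank lines belong to the text; only an exact end line closes the object.
    """
    kind, name = lines[0].split(None, 1)
    if kind != "LongText":
        raise ValueError(f"expected a LongText header, got {lines[0]!r}")
    body = []
    for line in lines[1:]:
        line = line.rstrip("\n")
        if line == END_LINE:          # no spaces allowed after the stars
            return name.strip(), "\n".join(body)
        body.append(line)
    raise ValueError("no ***LongTextEnd*** line found")


name, text = read_longtext(["LongText [wbg11.1p68]",
                            "  This is an intricate worm family saga.",
                            "",
                            "  It can contain blank lines.",
                            "***LongTextEnd***"])
print(name)  # [wbg11.1p68]
```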
DNA objects hold pieces of DNA sequence. The syntax is as follows:

DNA CEMSA02.f
TCGTTAAGAATTGGAAGTTCCGATGTTAGTGAAAATGAGA
AGAAGGAGCTGAAGAAGAGAAAGCTTATCAGTGAAGTAAA
CATCAAAGCATTGGTGGTTTCCAAGGGAACATCTTTCACC
ACTAGTCTTGCAAAGCAGGAAGCTGATTTGACTCCGGAAA
TGATTGCTTCTGGTTCATGGAAAGACATGCAATTCAAAAA
GTATAATTTCGATTCACTCGGAGTTGTTCCGTCATCTGGG
CATCTGCATCCATTAATGAAAGTGCGGTCTGAATTCCGAC
AAATCTTCTTCTCAATGGGATTTTCTGAAATGGCGACAAA
TCGATACGTGGAGTCGTCTTTCTGGAACTTTGATGCCCTT
TTCCAACCTCAACAGCATCCTGCAAGAGATGCTCATGATA
CTTTCTTCGGTTCTGATCCCGCGATTAGCACGAGTTCCCTG
Everything up to the next blank line is taken as the sequence. Letters can be either upper or lower case, and should be IUPAC codes for nucleotides, e.g. A, C, G, T, N for anything, R for purine...
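These DNA rules (sequence up to the next blank line, either case, IUPAC letters) can be sketched in Python as below. The function `read_dna` is hypothetical, and the accepted set is assumed to be the standard IUPAC nucleotide codes A, C, G, T, R, Y, S, W, K, M, B, D, H, V and N, without U or gap characters:

```python
# Standard IUPAC nucleotide codes (U and gap characters left out here)
IUPAC = set("ACGTRYSWKMBDHVN")


def read_dna(lines):
    """Read a DNA object: a 'DNA name' line, then sequence lines up to a blank line."""
    header = lines[0].split()
    if len(header) != 2 or header[0] != "DNA":
        raise ValueError(f"expected 'DNA name', got {lines[0]!r}")
    seq = []
    for line in lines[1:]:
        if not line.strip():
            break                     # a blank line ends the DNA object
        seq.append(line.strip())
    dna = "".join(seq)
    bad = set(dna.upper()) - IUPAC   # upper and lower case are both accepted
    if bad:
        raise ValueError(f"non-IUPAC characters: {sorted(bad)}")
    return header[1], dna


print(read_dna(["DNA CEMSA02.f", "TCGTTAAGAATTGGAAGTTC", "ccgatgttag", "",
                "Sequence CEMSA02.f"]))
# ('CEMSA02.f', 'TCGTTAAGAATTGGAAGTTCccgatgttag')
```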

DNA and LongText objects should be attached to other objects. The model for Paper objects includes the following:

Abstract ?LongText
A Paper and its related LongText abstract might be listed in an Ace file as follows:

Paper [wbg11.1p68]
Abstract [wbg11.1p68]
Author "Jones J"
Author "Smith T"

LongText [wbg11.1p68]
This is the full text of a fascinating article by Leon Avery on
nuclear protein extracts.  The only important thing about it is that
the final line is as follows, exactly, without any spaces after the
stars at the end.
***LongTextEnd***
The Paper and LongText objects are usually given the same name.

Sequence objects include the following:

DNA ?DNA
A sequence and its DNA might be listed in an Ace file as follows:

Sequence CEMSA02.f
From_Author "Kerlavage AR"
Library Genbank ? M79466
Reference [cgc1567]
DNA CEMSA02.f
Related_sequence CEMSA02.r
Brief_Identification "Phenylalanyl-tRNA synthetase beta"

DNA CEMSA02.f
TCGTTAAGAATTGGAAGTTCCGATGTTAGTGAAAATGAGA
AGAAGGAGCTGAAGAAGAGAAAGCTTATCAGTGAAGTAAA
CATCAAAGCATTGGTGGTTTCCAAGGGAACATCTTTCACC
ACTAGTCTTGCAAAGCAGGAAGCTGATTTGACTCCGGAAA
TGATTGCTTCTGGTTCATGGAAAGACATGCAATTCAAAAA
GTATAATTTCGATTCACTCGGAGTTGTTCCGTCATCTGGG
CATCTGCATCCATTAATGAAAGTGCGGTCTGAATTCCGAC
AAATCTTCTTCTCAATGGGATTTTCTGAAATGGCGACAAA
TCGATACGTGGAGTCGTCTTTCTGGAACTTTGATGCCCTT
TTCCAACCTCAACAGCATCCTGCAAGAGATGCTCATGATA
CTTTCTTCGGTTCTGATCCCGCGATTAGCACGAGTTCCCTG
The Sequence and DNA objects are usually given the same name.

KeySets

The other main array class is the KeySet class. The syntax for adding a KeySet via an Ace file is as follows:

KeySet nameofkeyset
Classname Objectname    // repeated up to next blank line
The following Ace file example defines two KeySets:

KeySet GenesfromBill
Locus unc-32
Locus lin-19
Locus dpy-40
Locus cll-3

KeySet DatafromBen
Locus unc-31
Clone C05G2
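A KeySet block can be read back with a few lines of Python. In this sketch the function `read_keysets` is hypothetical; it follows the syntax above, closing each KeySet at the next blank line:

```python
def read_keysets(text):
    """Parse KeySet blocks into {keyset name: [(class, object name), ...]}.

    Each block is a 'KeySet name' line followed by 'Classname Objectname'
    lines up to the next blank line. '//' notes are dropped; names that
    themselves contain '//' are not handled in this sketch.
    """
    keysets, current = {}, None
    for raw in text.splitlines():
        if not raw.strip():
            current = None                    # a blank line ends the KeySet
            continue
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue                          # a line holding only a note
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"expected two words: {raw!r}")
        first, rest = parts[0], parts[1].strip().strip('"')
        if current is None:
            if first != "KeySet":
                raise ValueError(f"expected a KeySet line: {raw!r}")
            current = keysets.setdefault(rest, [])
        else:
            current.append((first, rest))
    return keysets


example = """KeySet GenesfromBill
Locus unc-32
Locus lin-19
Locus dpy-40
Locus cll-3

KeySet DatafromBen
Locus unc-31
Clone C05G2
"""
keysets = read_keysets(example)
print(keysets["DatafromBen"])  # [('Locus', 'unc-31'), ('Clone', 'C05G2')]
```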