acedb --- A C. elegans Database
I. Users' guide

Richard Durbin
MRC Laboratory for Molecular Biology
Hills Road, Cambridge, CB2 2QH, UK
Email: rd@lmba.cam.ac.uk

Jean Thierry-Mieg
CNRS--CRBM
Route de Mende, BP 5051, 34033, Montpellier, France
Email: mieg@crbm1.cnusc.fr

Introduction

Acedb is the database system that we are writing to meet the needs of the nematode genome project. It is graphic, flexible and portable. As of February 1992, it has been tested on various Unix workstations (SUN, DEC, NEXT, SGI ...), on all sorts of X terminals and under several different windowing systems (X11, Sunview, Mac).

The system contains its own portable graphic library, an original object oriented database manager, a series of applications and a set of configuration files corresponding to the nematode data.

The present users' guide outlines our approach to the data, their organisation into classes and objects, the way to search the database and the various tools provided so far to manipulate the genetic and physical maps and the DNA. We hope that it is self-contained and sufficiently clear and complete to prove useful to everybody.

The installation guide explains how to set up the system and grant read and write access to the users. It should be read by the person installing Acedb on a new machine, preferably someone who knows Unix.

The configuration guide is necessary if you wish to parse your own data into Acedb, either to complement the official C. elegans data or for a different organism or a different subject. It is self-contained.

Finally, the programmers' guide describes our graphic and database libraries and is intended to help you program new applications, maybe new tools for classical and quantum genetics, maybe code to deal with a completely different field. Note that part of our tool box is useful even for writing C programs that do not need a database capability.

Although we provide the source to the whole system, we strongly recommend that you do not touch the kernel without prior discussion with us because we do not want the system to fork into a collection of mutually incompatible versions. However, any comments on anything, documentation, program, data, would be very welcome. In particular we would like a wish list of program features (no promises we will satisfy your wishes, but unless you tell us we can't). The best way to send us comments is by email -- our addresses are on the title page.

Basic Concepts

The first principle of the program is that any piece of data stored in acedb can very easily be exported to flat ASCII files to be used by other programs. We provide several standard formats: the FASTA format, specific to DNA sequences, and two general purpose formats, namely our own edition language (ace files) and export in the ASN.1 format recommended by the NCBI (Bethesda). As of release 1_7, we use our own ASN.1 definition files but hope in the near future to be able to export those data that are common to them and us using the NCBI definition file.

The second principle of the program is that write access to the database is organised in macro transactions that we call Sessions. Until you explicitly make a global save of a session from the main menu, and irrespective of what you have been doing, if you either crash for any reason or explicitly abort and leave the program, you will revert to the status quo ante and not corrupt your data.

All the information is stored in objects, which fall into a number of classes. Each object belongs to just one class, and has a unique name in that class. The classes are standard units such as genes, alleles, strains, clones, papers, authors, journals, etc., and the names are in most cases the standard names. What can be stored in an object and how it is displayed and used is governed by the class. The idea is to make all the information about an object reachable from that object. Much of it is in other objects, so objects often contain pointers to other objects. They can also contain basic data, such as numbers (e.g. map distances) and text (e.g. titles of papers).

The information in each object is stored hierarchically, in a structure called a tree. The branches of the tree contain labels, or tags, describing the information, which is at the leaves of the tree (in general, though things can get a bit more complicated). The advantage of a tree is that it is very flexible: it can be extended arbitrarily far in any direction as more information is gathered about some particular aspect of the object. In general when you look at an object, such as a gene, a window pops up showing you the tree of data, with the tags. An example is the following incomplete tree for ced-4.

ced-4  Reference_Allele n1162
       Clone MT#JAL1
       Map Genetic gMap III -2.7
                   Mapping_data 2Point ced-4/unc-32
       Allele n1162
       Strain ced-4(n1162)III
       Reference The genetic control of cell lineage etc.
                 Genetic control of programmed cell death etc.

In the typeset version of this manual the tags are shown in a sans serif font, and the pointers to other objects in italics. You can follow a pointer by double clicking on it, in which case its tree of information will be displayed. The tree structure allows for more alleles, strains, mapping data etc. to be added, in the same way that there are two references. Also, other types of information can be added, such as pointers to the sequence, related genes etc.

Attached to each class is a list of possible information that can be added, together with the relevant tags and where to put the information in the hierarchy. This information is called the model. For further details see the configuration guide. The model itself is structured as a tree and stored as an object with name ``?Classname'' (so ?Gene is the model for the gene class), so you can display it like any other object. An important feature of our design is that the model for an object can be expanded throughout the lifetime of the system, to allow for information about new features to be added to the database without having to start again. It is also possible to add a comment anywhere in a tree.
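
One rough way to picture such a tree is as nested mappings, in which each tag leads either to further tags or to a list of leaf values. The Python sketch below is purely illustrative and says nothing about how acedb actually stores objects: the dictionary layout and the helper name `leaves` are assumptions made for this example, and only the ced-4 values are taken from the tree shown above.

```python
# Illustrative only: the incomplete ced-4 tree as nested dictionaries.
# Tags are dictionary keys; leaves are lists of values (names of other
# objects or plain data). This layout is an assumption of the sketch.
ced4 = {
    "Reference_Allele": ["n1162"],
    "Clone": ["MT#JAL1"],
    "Map": {
        "Genetic": {
            "gMap": [("III", -2.7)],
            "Mapping_data": {"2Point": ["ced-4/unc-32"]},
        },
    },
    "Allele": ["n1162"],
    "Strain": ["ced-4(n1162)III"],
    "Reference": [
        "The genetic control of cell lineage etc.",
        "Genetic control of programmed cell death etc.",
    ],
}


def leaves(tree, path=()):
    """Yield (tag_path, value) pairs for every leaf in the tree."""
    for tag, sub in tree.items():
        if isinstance(sub, dict):
            # A nested dictionary is a deeper branch of tags.
            yield from leaves(sub, path + (tag,))
        else:
            # A list holds the leaf values found under this tag.
            for value in sub:
                yield path + (tag,), value


for tag_path, value in leaves(ced4):
    print(" ".join(tag_path), "->", value)
```

In this picture, adding a third reference or a new kind of information only extends one branch, which is the flexibility the text describes.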

Apart from the tree objects, such as genes and clones, we also store efficiently in our database tables of records of variable length, i.e. tuples. This allows us to store more extended things, such as genetic or physical maps as lists of positions of, and pointers to, the standard discrete objects. The user normally accesses them indirectly via scrollable graphical displays, and so does not have to know about their internal structure, but they greatly simplify the writing of new applications.

Is acedb object-oriented?

A major current vogue in computer languages and database design is for ``object-oriented'' systems. It's also a source of lots of argument. We are just trying to build a good system, and don't want to get caught in the crossfire, but... we do talk about organising our data into objects and classes. We have undoubtedly been influenced by many of the ideas going around, but it isn't likely our system would be regarded as kosher by the object-oriented community. In particular there is no class hierarchy, nor inheritance, and it is written in a modular but non-ideological way in straight C. However display and disk storage methods are class dependent.

In some ways the class hierarchy is replaced by our system of models and trees, which seems to be rather unusual. We think it is very natural for the representation of biological information, where for some members of a class a lot might be known about some aspect, but for most only a little is known.

The advantages of our system over a relational database, such as Oracle or Sybase, are our ability to refine our descriptions without rebuilding the database and the possibility of organising the storage of data on disk according to their class, i.e. we store the tree objects and the long stretches of DNA sequence in very different ways.

First contact

This chapter explains the way you can interact with this program and what you can expect of it. It is relevant to every user of our database. Installation, configuration and recompilation of the software are treated in later more technical chapters.

First comes a brief section saying how to start and stop the program. Then there is a series of subsections organised around the various different types of windows that come up. It is probably best to read this chapter with a running version of the program in front of you, so you can try things out. Alternatively, you can just rush in clicking all over the place, then come back to this guide for detailed assistance.

Acedb is available in several versions. It runs under Sunview, the now obsolete Sun proprietary windowing system; under straight X11, where it is compatible with any X window manager (twm, motif, openwindows etc.); and in line mode. The executables are called sunace, xace and tace respectively.

Prerequisites

This section specifies what you need set up before the program will run. It is a little technical, but may not be necessary if someone else has already set you up correctly.

To run Acedb, the program must be correctly installed on your machine. In particular, you need the environment variable ACEDB to be set to the correct directory (the parent directory of the database) and the executables (xace, tace, sunace) to be in your path. As explained in the installation manual, these things can be set up automatically by editing your .login file or by setting a correct link in /usr/local/bin.
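
The two conditions above can be checked before starting the program. The following Python sketch is not part of acedb; the function name `check_acedb_setup` and its messages are made up for illustration, and it assumes that finding any one of the three executables in the path is enough.

```python
import os
import shutil


def check_acedb_setup(environ=None, which=shutil.which):
    """Return a list of problems found; an empty list means none were seen."""
    environ = os.environ if environ is None else environ
    problems = []

    # ACEDB must name the parent directory of the database.
    acedb_dir = environ.get("ACEDB")
    if not acedb_dir:
        problems.append("environment variable ACEDB is not set")
    elif not os.path.isdir(acedb_dir):
        problems.append("ACEDB points to a missing directory: " + acedb_dir)

    # Assumption: one reachable executable (xace, tace or sunace) suffices.
    if not any(which(name) for name in ("xace", "tace", "sunace")):
        problems.append("none of xace, tace, sunace found in PATH")

    return problems
```

Calling `check_acedb_setup()` with no arguments inspects the current environment; the optional arguments exist only so that the check can be tried out without a real installation.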

The first time you run Acedb, you must load the data using the Update entry of the main menu.

If you are running on a monochrome machine then also set the environment variable MONO (by "setenv MONO"). The monochrome version uses stippling for coloured backgrounds, which is not entirely satisfactory, but is better than nothing. This manual assumes you are using a colour workstation, which is strongly recommended.

Starting up, getting the menu and exiting

To start up, type acedb (or perhaps bin/xace & [ bin/sunace &]) from within a shell (terminal) window. The system will come up in read only mode and you should get a pair of windows entitled ``Main Window'' and ``Selection List'' appearing in the top left corner of the screen [or where you click depending on your window manager], with the visible classes listed in the Main Window and the Chromosomes listed in the second one.

Pressing the right mouse button inside a window brings down a menu from which you can select options in the usual way: you keep the button pressed and move the mouse until the desired option is highlighted, and then release.

All the acedb menus have a ``Quit'' option at the top of them. This kills the window, except in the case of the main window, where it kills the program.

Note that, if you have been updating the database, and you kill the process or exit brutally, the database will automatically revert to the status quo ante, i.e. you will lose all the updates done after the last Save but no more. If you select Quit after changing data but before Save then you will be asked whether you want to keep the changes you have made.

On line Help: F1 or F10

If you point the mouse to an acedb window and press the Help key, or the F1 or the F10 function key (depending on your machine and window manager), a new window will pop up with the relevant section of the online help manual. Alternatively, nearly every window has a ``Help'' option as the second entry in its menu, and some windows have an explicit ``Help'' button. These 3 ways to invoke the help are equivalent. [Note that some X window managers appropriate the F1 and F2 buttons, which is why we also map Help and Print functions to F10 and F9].

The help window itself has a menu starting with ``Quit'', to destroy the help window, ``Help'' to gain help on the help, ``Index'' to get the table of contents of the help file, ``Page up'' ``Page down'' to browse through the help file sequentially, ``Push'' to mark a page as interesting, ``Pop'' to circle through the marked pages. The item active when you invoke the Help is automatically pushed in as interesting. The ring of marked pages is only destroyed when you ``Quit'' the help window.

As a complement to the present manual, you can print out the on-line help file. It is called wmake/help.wrm. However this is a pure ASCII file (not TeX like this manual) meant to be read one page at a time rather than as a whole.

Print out: F2 or F9

If you point the mouse to an acedb window and press the F2 or F9 function key, you will generate a print out of the window. Alternatively, nearly every other window offers the option ``Print'' as one of the entries of the menu, and some have an explicit ``Print'' button. Each time you select ``Print'', the program makes a PostScript file in the subdirectory $ACEDB/PS. It will then try to print the file using the command ``lpr''. If the environment variable ACEDB_LPR is set (for example: setenv ACEDB_LPR 'lpr -Plw3'), the system will use this command instead of plain ``lpr''. In particular, if you set it to an empty string then the file will not be printed, but will instead remain in the directory. In any case you will be told the file name and how many pages were output.
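
The rule for choosing the print command can be summarised in a few lines. This Python sketch only restates the behaviour described above and is not the code acedb uses; the function name `print_command` is hypothetical, and passing the PostScript file as the last argument of the command is an assumption.

```python
import os
import shlex


def print_command(ps_file, environ=None):
    """Return the argument list for printing ps_file, or None to keep the file only."""
    environ = os.environ if environ is None else environ

    # Plain lpr is the default when ACEDB_LPR is not set at all.
    command = environ.get("ACEDB_LPR", "lpr")

    # An empty string means: write the PostScript file but do not print it.
    if not command.strip():
        return None

    # Assumption: the file name is appended as the final argument.
    return shlex.split(command) + [ps_file]
```

With ACEDB_LPR set to 'lpr -Plw3' as in the example above, the sketch yields the command lpr -Plw3 followed by the file name.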

We have to use PostScript because our outputs frequently contain graphics and drawings. The actual PostScript files only contain standard ASCII characters corresponding to the data and the drawing commands. It is therefore possible to copy them or e-mail them without difficulty. However, PostScript is a little involved, so it is unwise to edit them by hand. Remember that the print out corresponds to the full window, not just that part visible on your screen (important for the physical map and symbolic sequence displays).

NB: We have had some trouble with laserprinters accepting acedb .PS files. It had to do with the UNIX /etc/printcap file. In case of problems try to directly cat the files to the proper port (something like cat postscript_file > /dev/ttya). Another problem that has occurred is that the printer rejected the first line of the file. This was cured by removing the ``!PS-Adobe-1.0'' from the first line (the same command is needed on some printers!).

Window control

Window control under X11

Acedb generates lots of windows on the screen that you can move, resize, iconize etc. The ways that you do these things are determined by the window system, not acedb. It is therefore not really practicable to give detailed instructions in all cases. We recommend X11. This system lets you run on the network, and is available on all sorts of platforms. Because it is customizable in a wide variety of ways, you should ask a local expert about the system that (s)he set up for you. Sun sparc stations come equipped with openwindows, Sun's X-server. It is worth noting that the MIT X-server, which is free, runs at least 5 times faster than openwindows release 1.

Acedb runs equally well under twm, motif, dec-window manager and several other X11 window managers that we have tested. Except under Sunview, the menus are available as usual, by pressing the right mouse button while the cursor is in an acedb window, dragging to the chosen option and releasing.

In the next section, we give basic instructions for Sunview, the older Sun proprietary window manager. That section includes some acedb specific information, so it's worth looking at even if you know how to use your windowing system.

Window control in Sunview

Because of a lot of SunView mess we can't sort out, we can't make the official arrow keys work (R8, R10, R12, R14). So we use R1 for left, R2 for up, R3 for right and R5 for down. Furthermore, under Sunview, to access the acedb menus, you must press the right mouse button on the title bar, rather than inside the acedb windows. However, the original frame menu is still available by walking right from ``Frame =>'' at the bottom of the list, but most of what it allowed you to do is available by mouse clicks and dragging as described below.

Sunview windows are dragged, iconized, resized etc. by pressing the mouse buttons while the cursor is on the title bar or thin outer frame of the window. All the references to mouse clicks in this section assume that the cursor is on the title bar or frame. Alternatively, you can use the left hand keypad (L1 through L10).

SunView windows can be moved by pressing the middle mouse button (on the title bar or frame) and dragging (press near corners for 2D dragging). Holding the ``Control'' key down and dragging the middle button resizes the window. Acedb windows resize properly (even usefully).

Clicking with the left button brings a window to the top of the heap, as does the ``Front'' key (L5) whenever the mouse is inside the window. If the window is already at the front, the ``Front'' key puts it at the back. ``Control'' plus the left button extends the window to full screen in the vertical direction, with a second application restoring it to its original size and position. This is particularly nice for the genetic map.

Scroll bars

Text and physical map windows have standard scrollbars (the genetic map uses its own non-standard ones; see below). It is simplest just to experiment to find out how they work. Under sunview, the middle button takes the slider to the cursor position. The left and right buttons move you right and left (down and up). How much they move you depends on where the cursor is in the scrollbar when you press them. If it is full right/down a whole page is moved, if left/up then not very much. The little arrow boxes at the ends of the scrollbar are meant to move you one unit. They move one line in text windows, and do nothing in the physical map window. Unfortunately, scrolling sometimes screws up our sunview displays. To clean up the mess, push the window behind another one and pop it back. As of release 1_7, a similar problem occurs on Suns under openwindow-2. Sorry.

Fonts

Acedb uses the standard mit-fonts 8x13, 9x15 etc. These fonts are not directly available on some machines. This is so in particular on the Dec-stations. Normally, acedb complains if it cannot find some fonts, but on the Dec, the complaint message has sometimes been missing. To cure this problem, you must make these fonts available to the window manager. See the wdoc/NOTES file for a hint.

If you have a font problem, most often it will show when you run on the console of your machine but not if you run from an X terminal. So you will know how to cure it by carefully studying the X-terminal configuration files.

System interface

Mouse, messages and text entries.

The Acedb Graphic User Interface is controlled by the mouse. The left button is used to pick windows or objects. Usually, picking once changes the color of the box, and picking a second time initiates an action. The mid button is used in some graphic windows to recenter (single click) or to zoom and recenter (by dragging in horizontal and vertical directions). On the Next, press both buttons to emulate the mid button. The right button is used to access menus.

Just picking objects lets you browse in a natural way through the database, but many more facilities are available from the menus. Each acedb window has its own menu. Under X windows (Openwin) you get it by pressing the right mouse button when the cursor is in the window. Under SunView you must press the right mouse button when the cursor is in the title bar of the window.

In several windows, you will see yellow and green boxes. These are micro text editors. Only one of them is active (yellow) in a given window. To activate one, pick it with the left mouse button; this will turn it yellow and position the cursor. You can then type in some text, ending it with the Return key, or use a number of traditional short cuts. The recognised commands are: Return key, Insert key (a toggle), Delete and BackSpace keys, Home and End key, the left and right arrows, a set of usual control keys to move (ctrl-f: forward, ctrl-b: backward, ctrl-a: to beginning, ctrl-e: to end of line) and delete (ctrl-d: a char, ctrl-w: a word, ctrl-k: to end of line, ctrl-u: whole line) and Tab which autocompletes the names when the class is known.

Sometimes acedb puts up blocking windows that require you to respond before you can do anything else, e.g. for questions, confirmations and messages.

Within windows we often make use of arrow keys. [The correct arrow keys work in the X version, R1-5 under Sunview.] If you want to use keys for a window the cursor must be inside the borders of the window [true for X as well].

Main window

This window lets you select objects by name or text content. You can also issue, from the menu, global directives such as quitting the whole program, performing a general dump or activating an application.

The user interface in every other window is modelled on this one. As explained earlier, the menu is obtained by pressing the right mouse button on the title bar of the window [right mouse button inside the window for X]. The menu options for the main window are described below. However, the standard interaction with the system is by clicking with the left mouse button on the items displayed in the window. As you will soon realize, in this system the majority of the displayed items can be picked for further exploration. Since this is the standard, pickable items are not highlighted in a particular way, but they change color on the first mouse click (left mouse button) and respond on the second click. There is no time limit on the interval between the two clicks, so you do not have to ``double click'' all in one action.

The main window contains a list of available classes and two special input-fields: template and text search. If you type some text while the mouse pointer is inside the main window, the input will be caught by the active input-field. You select the input-field you wish to write to by picking it once with the mouse. It then turns green. If you pick it again or hit the return key, you initiate an action.

Pick a class by double clicking on its name. The ``Selection window'' will show the members of this class. You can scroll the selection window up and down with the page buttons, and pick objects by double clicking on them. Once an object is picked you can move to neighbouring objects in the list by using, with the mouse pointer inside the selection window, the arrow keys (remember these are R1,2,3,5 in SunView). The right and left arrows wrap correctly. The menu of the selection window offers many options described below.

If the selection list is too long, you can restrict to a shorter list by typing something into the template input-field (the mouse must be in the Main Window) and finishing with return or clicking the template box. Allowed wildcards in the template string are `*', for anything of any length, `?' for a single character, and `#' for a number (series of digits). For example it is often useful to type the first character or two of a name, then '*', then return to bring the desired name near the head of the list. The name matching is not case sensitive. Also note that it does no harm to leave the initial `*' that the system originally puts there. If only one object name is found matching your template, this object will be displayed automatically, without waiting for you to click on it. So if you know the name of the object you want you can just select its class and type its name, followed by Return.
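
The wildcard rules can be made concrete by translating a template into a regular expression. The Python sketch below is an illustration of the rules as stated, not the matcher acedb uses; the function names are hypothetical, it assumes that `#` stands for one or more digits and that a template must match the whole name, and the gene names are just sample input.

```python
import re


def template_to_regex(template):
    """Translate a template with *, ? and # into a compiled regular expression."""
    parts = []
    for ch in template:
        if ch == "*":
            parts.append(".*")      # anything of any length
        elif ch == "?":
            parts.append(".")       # exactly one character
        elif ch == "#":
            parts.append("[0-9]+")  # a number (assumed: one or more digits)
        else:
            parts.append(re.escape(ch))
    # Name matching is not case sensitive.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def select(names, template):
    """Return the names that match the whole template, in their original order."""
    pattern = template_to_regex(template)
    return [name for name in names if pattern.fullmatch(name)]


print(select(["ced-4", "ced-3", "unc-32", "CED-9"], "ced*"))
print(select(["unc-32", "unc-5", "unc-x"], "unc-#"))
```

A template such as ced-? would keep ced-4 but not ced-10, since `?` stands for a single character.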

Objects are normally shown in their default display mode. For most this is in their basic tree form. Clones are displayed on the physical map, genes and chromosomes are displayed as a genetic map.

Often when you display objects an old window is reused. Currently the rules for this are that tree windows are only reused if you haven't done anything in them (e.g. selected something), while map windows are always reused unless you preserve them with the ``Preserve'' option from their menu.

If you pick the Text search box and type in at least 3 characters, you will obtain all the objects that contain text matching what you type. (In fact this is a little more complicated: only text in the ?Text, ?UserComment or ?Keyword entries is searched, but this is (we think) what is wanted.) In particular, by typing ``actin'' you get all papers with the word ``actin'' in their title, all the sequences that refer to actin in their remarks, and so on.

Main window menu

Quit
Quits the whole program. It prompts you if you have altered the data and not yet saved your changes permanently.
Help
activates the first page of the on line Help.
Clean Up
Destroys every acedb window except the ``Main Window''. It is a useful option when the screen is clogged up. If a window has been updated and not yet saved, you will be prompted during the clean up cycle. However, remember that these new data are just considered in the ongoing session and will be stored to disk permanently only if you perform a ``Save'' from the main menu.
Program status
gives information on the status of the data base. Select its own on line help to have an up to date (we hope) detailed explanation. It monitors disk and memory usage.
Query
starts a window controlling the query facility, which is a more complicated way to search for items in the database. It has its own chapter below.
DNA and Deficiencies
start application windows that allow various sorts of processing and analysis. Currently they are both in embryonic form.
Add Update File
starts a window that controls adding official updates to the database.
Read models
is needed when you wish to augment the number of classes or modify the description of the classes. You need to own the database to be allowed to select this option and you should be very careful, see chapter Configuration.
WCS annotate
(only if corunning with the Arizona group's Worm Community System) allows you to initiate an annotation in WCS for an acedb object. This is one way to provide feedback to us, though any form of direct communication will be faster.
WCS show
(also only if corunning with WCS) asks WCS to show anything it can about an object in acedb (e.g. a paper or gene).
Frame
(in the SunView version) hides the standard Sun View window manager menu. It is not very useful since it is much easier to use the left key pad and the mouse to move and resize Sun View windows.

Selection List, and KeySets in general

The basic operations for the Selection List window have been described above in the Main Window section. The remainder of this section describes more advanced features that can be skipped on a first reading.

The Selection List window is in fact a special example of a general KeySet display window, that lets you look at and manipulate a list of objects. The keysets can be combined in various ways (union, intersection etc.) and are the basis for the general Query mechanism described in its own section below.

Each keyset says how many items are contained in it on the top line. If you want an item further down in the list, then either use the page-up page-down buttons at the two ends of the window or restrict the size of the keyset with the Template option of the main window, or with a Query command.

At any time one (or no) keyset is the selected keyset, indicated by the highlighted box in its top line. You can select a keyset by clicking the left button anywhere in the window. The selected keyset is the one that Query acts on, and also the one that is combined with the current keyset when the AND, OR and XOR selections are chosen from a keyset window.

menu

Quit, Help
as usual.
Add key
lets you pick keys one by one and add them to the active keyset.
Copy
creates a new keyset window with a copy of the current keyset in it. This is very useful before the AND, OR, XOR operations or to save some intermediate result.
Save
saves the keyset permanently in the database. You must have Write Access to do this. It prompts for a name, which can then be used for retrieving the keyset via the KeySet class option in the Main Window.
AND, OR, XOR
combines the current keyset with the selected keyset using the chosen operation. AND is intersection, OR is union, XOR is symmetric difference.
Ace Dump
prompts for a file name, then writes into it all the objects in the keyset in .ace format. Useful for transferring information to other systems.
Name Dump
prompts for a file name, then writes into it all the names of the objects in the keyset in .ace format, not their content.
FastA Dump
prompts for a file name, then writes into it all the sequences attached to the objects in the keyset, in FastA format. This will only work for sequence objects that actually have DNA sequence in acedb (in the canonical database only worm sequences). There are also sequence objects representing foreign sequences in the standard databases (EMBL, GenBank etc.) which only store accession numbers and identifiers.
Tree
displays the currently selected item in tree form if possible. This is sometimes useful to override default display as a map (e.g. for a clone).
Biblio
creates a Biblio window (see below), listing all the references referred to from items in the current keyset.
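
The AND, OR and XOR entries of this menu correspond to the usual set operations. As a minimal sketch, assuming a keyset can be modelled as a plain Python set of object names (acedb's internal representation is not described here), the three combinations look like this; the function name `combine` and the sample names are illustrative only.

```python
def combine(current, selected, operation):
    """Combine two keysets, modelled here as Python sets of object names."""
    if operation == "AND":
        return current & selected   # intersection
    if operation == "OR":
        return current | selected   # union
    if operation == "XOR":
        return current ^ selected   # symmetric difference
    raise ValueError("unknown operation: " + operation)


current = {"ced-3", "ced-4", "unc-32"}
selected = {"ced-4", "unc-32", "lin-12"}

for operation in ("AND", "OR", "XOR"):
    print(operation, sorted(combine(current, selected, operation)))
```

Because the combination replaces the contents of a keyset, taking a ``Copy'' first keeps the original list available.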

Basic tree objects

You can scroll these vertically. If information is missing off the right hand side then resize the window to widen it. If you double click on any object name in the tree, it will be selected and displayed. The first click selects (highlights) and the second click displays. Again, the arrow keys move you around the tree rapidly (especially useful for going down a list, e.g. of papers).

Menu

Quit, Help, Print
as usual.
Update
if you have Write Access, enables you to enter comments anywhere in the object, and add or alter the data. How this works is only described in the second manual for database curators and programmers, since the standard user can't get write access.
Biblio
Gives a list of all the papers referred to in this object, with their titles and authors in standard bibliographic format.
Graph
may display the object in map form, if this can be done.

Update window

This window has only one functional button, ``Next Update'', which adds the next official update. The update file should be placed in ACEDB_DATA, or if that is not defined, in ACEDB/rawdata. In order to be allowed to use this button your username must appear in the file wspec/passwd.wrm, and you must of course have write access to the database directory.
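
The place where update files are looked for can be expressed as a small lookup. This Python sketch only restates the rule above; the function name `update_directory` is hypothetical, and it assumes that ACEDB_DATA and ACEDB are environment variables, that ``ACEDB/rawdata'' means the rawdata subdirectory of the directory named by ACEDB, and that an empty ACEDB_DATA counts as not defined.

```python
import os


def update_directory(environ=None):
    """Return the directory in which the next official update is expected."""
    environ = os.environ if environ is None else environ

    # ACEDB_DATA takes precedence when it is defined (and non-empty).
    data_dir = environ.get("ACEDB_DATA")
    if data_dir:
        return data_dir

    # Otherwise fall back to the rawdata subdirectory of ACEDB.
    # Raises KeyError if ACEDB itself is not set.
    return os.path.join(environ["ACEDB"], "rawdata")
```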

A record of what is going on will appear in the window. If the system runs out of space in its disk file it will ask you for more. You can add several updates in a row if you have them. If there are any problems then you will be thrown out of the whole program, with an error message.

Query language

Query window

You create the Query window by selecting the ``Query'' option from either the Main window or a keyset window (see above under Selection window). It is a powerful tool to let you perform combinatorial requests on the database. For example you can get all the genes referred to in all the papers written by Bob Waterston, intersected with those from John Sulston, etc.

The basic operation is to apply a search command to each item in the current selected keyset (the section above on the Selection window describes what this is). The search commands are specified on lines in the ``Commands:'' region at the bottom of the window. The command lines are in text entry boxes. You can activate a box from grey to green by single clicking. Clicking on it again will issue its command. Before that you can type into the box using a simple line editor. Hitting the Return key also issues the command, like a second mouse click. The syntax of the commands is very general, and hence unavoidably a little complicated. It is described in detail below.

As well as typing in commands, it is possible to read a set of commands from a file. These will come up in a set of grey command boxes, and can either be issued directly as they are, or edited first. To load a file of commands, choose the Load button. This creates a new ``File chooser'' graph. The name of the directory is in the top box. The files ending with .qry are listed underneath. Just double click one of them to load it, or type some text into the green directory box to change directory. The directory ACEDB/wquery contains a number of files of example commands.

Note: It still may be possible to come up with commands that crash the query command parser. It would be useful for us to have a detailed list of any of these you find.

Menu

The menu is repeated in a set of buttons directly visible on the window.

Quit, Help
as normal.
Undo
This returns the selected keyset to its state prior to the last command. You can only undo once.
Load
loads the named file of commands from the named directory.
Save
saves the current set of commands in the named file in the named directory. ``.qry'' is appended to the name automatically.
New
creates a new empty keyset window and selects it.

Query command syntax

There are three allowed types of command:

>?ClassName xxx
The expression xxx is applied to all the objects of the Class.
>Tag xxx
The expression xxx is applied to all the objects following tag in the objects of the active key set.
xxx
The command xxx is applied directly to all the objects in the active key set.
When an expression is applied to an object it either returns TRUE, in which case the object is added to the new KeySet, or FALSE, in which case it is discarded. An expression is built up by logical operations on basic expressions.

There are two forms of basic expression: the first is just a string, which is matched to the names of the objects. The string can contain the wildcard characters allowed in the Main window Template: (* for any text, ? for a single character, and # for a number). If you want to include spaces or any of the special characters &, |, ^ , -, ., <, >, =, ( or ) in a string then you must enclose it in quotes. For example

>?Author s*
will give all authors whose names start with `s', while

>Author s*
will give all the authors with names starting with `s' who were authors of some item in the current key set.
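
How such a name template might be matched can be sketched in Python. This is an illustration only, not the matching code used by the program. The function names are hypothetical, and two points are assumptions: that `#` stands for a run of digits, and that matching ignores case (the example `s*` above is taken to match names written with a capital S).

```python
import re

def template_to_regex(template):
    """Translate a name template into a compiled regular expression."""
    parts = []
    for ch in template:
        if ch == "*":
            parts.append(".*")       # any text, possibly empty
        elif ch == "?":
            parts.append(".")        # exactly one character
        elif ch == "#":
            parts.append(r"\d+")     # assumption: a run of digits
        else:
            parts.append(re.escape(ch))
    # Assumption: matching is case-insensitive.
    return re.compile("".join(parts), re.IGNORECASE)

def matches(template, name):
    """True if the whole name matches the template."""
    return template_to_regex(template).fullmatch(name) is not None

# Toy author list, for illustration only.
authors = ["Sulston JE", "Smith A", "Waterston RH"]
starting_with_s = [a for a in authors if matches("s*", a)]
```

Here starting_with_s keeps the first two names and drops the third.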

The other basic expression is just to give a tag name. This would select out only those objects containing that tag. However most tags are followed by data, so we also allow tests to be performed on the data. The syntax for this is ``Tag op value'' where for strings op must be `=' and value is a string, possibly containing wildcards, while for numbers value is a number and op can be any of `<', `<=',`=', `>=' or `>'. Thus

>?Paper Author = s*
would give all papers with authors whose names begin with s, while

>?Paper Year > 1980
selects all papers published after 1980.
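
The ``Tag op value'' test can be pictured with a small Python sketch. It is not the program's implementation: objects are modelled as plain dictionaries from tag name to a list of values, tag_test is a hypothetical name, the paper records are invented, and the standard fnmatch module stands in for the wildcard matching (it handles * and ? but not #).

```python
import operator
from fnmatch import fnmatchcase

NUMERIC_OPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    ">=": operator.ge, ">": operator.gt,
}

def tag_test(obj, tag, op, value):
    """True if some value stored after `tag` satisfies `op value`."""
    for stored in obj.get(tag, []):
        if isinstance(value, str):
            if op != "=":
                raise ValueError("strings can only be tested with '='")
            # Assumption: string matching ignores case.
            if fnmatchcase(str(stored).lower(), value.lower()):
                return True
        elif isinstance(stored, (int, float)) and NUMERIC_OPS[op](stored, value):
            return True
    return False

# Invented records, for illustration only.
papers = {
    "paper_1": {"Author": ["Sulston JE", "Author B"], "Year": [1977]},
    "paper_2": {"Author": ["Author C"], "Year": [1974]},
    "paper_3": {"Author": ["Waterston RH"], "Year": [1988]},
}
by_s_authors = [n for n, o in papers.items() if tag_test(o, "Author", "=", "s*")]
after_1980 = [n for n, o in papers.items() if tag_test(o, "Year", ">", 1980)]
```

An object without the tag simply fails the test, which agrees with the rule that a bare tag name selects only the objects containing that tag.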

The basic expressions can be combined using the logical operators AND, OR, XOR and NOT. These can be abbreviated by '&' for AND, '|' for OR, '^' for XOR and '!' for NOT (the symbols used in the programming language C). They take the normal order of precedence (OR, XOR, AND, NOT from lowest to highest), but you can use parentheses around any subexpression to change this. Examples of compound expressions are

>?Author s* | a*       All authors whose name begins with s or a
>?Paper  Journal = N* AND Year > 1987
                       All papers published in N(ature) after 1987
>?Gen* myo* & Clone    All the cloned myosin genes
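
The effect of the precedence order (OR, XOR, AND, NOT from lowest to highest) can be checked with a small recursive-descent evaluator, sketched here in Python. It is not the query parser of the program: basic expressions are reduced to single names whose truth value is supplied in a dictionary, and the names tokenize and evaluate are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(\(|\)|\||\^|&|!|[A-Za-z_]\w*)")
WORD_OPS = {"OR": "|", "XOR": "^", "AND": "&", "NOT": "!"}

def tokenize(text):
    """Split an expression into operator symbols, parentheses and names."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(WORD_OPS.get(m.group(1), m.group(1)))
        pos = m.end()
    return tokens

def evaluate(text, truth):
    """Evaluate `text`; `truth` maps each basic expression name to a bool."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def binary(symbol, operand, combine):
        # One precedence level: operand (symbol operand)*
        value = operand()
        while peek() == symbol:
            take()
            value = combine(value, operand())
        return value

    def parse_or():
        return binary("|", parse_xor, lambda a, b: a or b)

    def parse_xor():
        return binary("^", parse_and, lambda a, b: a != b)

    def parse_and():
        return binary("&", parse_not, lambda a, b: a and b)

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("unmatched parenthesis")
            return value
        return truth[tok]

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

With a true and b, c false, `a | b & c` is true because AND binds tighter than OR, while `(a | b) & c` is false.
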
There is one additional complication: often there are values of interest not directly next to the tag, such as the genetic map position. To test on the value of these subfields you must first locate yourself on the appropriate tag then move right from it using the pseudotag NEXT. Similarly, you can test the same value more than once by using the pseudotag HERE. For example the genes to the end of chromosome X are found with

>?Gene gMap = X & NEXT > 12 & HERE < 19
This is unfortunately a little clumsy. Also it will be evaluated correctly from left to right only if the precedence of the operators allows it. This can be forced using parentheses. Be careful.
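
One way to picture NEXT and HERE is as a cursor moving along the row of values that follows a tag. The Python sketch below is only an interpretation of the description above, not the program's code. The function name, the row layout (chromosome name followed by position) and the gene records are all hypothetical.

```python
def in_map_interval(obj, tag, chromosome, low, high):
    """Mimic `tag = chromosome & NEXT > low & HERE < high` on one object."""
    row = obj.get(tag)
    if not row:
        return False                    # tag absent: the test fails
    cursor = 0
    if row[cursor] != chromosome:       # gMap = X
        return False
    cursor += 1                         # NEXT: move one step to the right
    if cursor >= len(row) or not row[cursor] > low:
        return False
    return row[cursor] < high           # HERE: test the same value again

# Invented objects, for illustration only.
genes = {
    "gene_a": {"gMap": ["X", 15.2]},
    "gene_b": {"gMap": ["X", 3.0]},
    "gene_c": {"gMap": ["IV", 15.0]},
    "gene_d": {},
}
selected = [name for name, obj in genes.items()
            if in_map_interval(obj, "gMap", "X", 12, 19)]
```

Only gene_a is on X with a position between 12 and 19, so it is the only one selected.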

N.B. There is currently a bug in the parser that means that NOT does not work. We hope to fix it soon.


Loads and executes external programs. The idea is to prepare, in a directory ./wquery, a set of files named xxx.qry containing sets of useful commands.

The interface in this window is a little complex but it offers great processing power. You can use the menu or act directly in the display.

The menu has the following options :

Quit:
Quits this window with an option to save your current file of commands, provided you have given a file name.
Help:
Invokes this section of the help file.
Load:
Loads the file Directory/FileName.qry if it exists. The ending .qry is added automatically.
Save:
Saves the present set of commands into the file Directory/FileName.qry if you have write access to Directory. The ending .qry is added automatically.
New:
Opens a new keyset window.
Undo:
Cancels the last command and redisplays the previous keyset.
Commands:
You activate a command line by picking it. It turns yellow and starts to catch the keyboard. You execute it by pressing the left mouse button again or the Return key. You edit it with the keyboard. If you edit the last command line a new empty one is created.
Scope:
The commands act on the active key set. To select another key set, pick it once or use the New button.

Syntax of the commands

There are 3 possibilities

>?ClassName xxx
The command xxx is applied to all the keys of the Class.
>Tag xxx
The command xxx is applied to all the keys following Tag in the objects of the active key set.
xxx
The command xxx is applied directly to the active key set.
A command is a logical expression evaluating to True or False. It is applied to a key, which is then either retained in the resulting list or discarded.

A composite query can be formed by chaining a series of queries separated by semicolons, in which case the active keyset obtained so far is passed into the next query.

Recognised operators are, in order of increasing precedence:

        |    OR
        ^    XOR
        &    AND
        !    NOT
        <    <=    >    >=    to compare numbers
        =    to compare numbers or match the left hand side to a
             template, i.e. a word with wild chars *, ?.
Parentheses of all sorts, [, { and (, can be used freely, but must be matched.

Words are matched to a tag or treated just as text. They must be put in "double quotes" if they include spaces or any operator &, |, ^, <, >, =, (, ), [, ], {, }.

Wild chars can be used in words, but then they are not at present matched to tags except in a redirection. Numbers are parsed as floats.

Examples (as given in $ACEDB/wquery/examples.qry)

 >?Chrom*               will list all chromosomes
 >?Author s* | a*       all authors whose name begins with s or a
 >?Au*  s?s* OR b*s*    all authors whose name matches s?s* or b*s*
 >Pap*  Journal = N* OR Year > 1987

                        Checks the previous list: redirects to papers and lists
                        all their papers published in N(ature) or after 1987

 Year = 1988            Restricts to the papers of exactly 1988
 >?Gen* myo* & Clone    All the cloned myosin genes
 >?Author "Sulston JE" ; >Paper ; >Author

                        Finds all the coauthors of papers by Sulston.
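
The last example chains three queries, each one starting from the keyset produced by the one before. The Python sketch below illustrates that pipeline on invented data. The function follow is a hypothetical stand-in for the redirection `>Tag`, and objects are modelled as dictionaries from tag name to a list of object names.

```python
def follow(keyset, objects, tag):
    """Redirect: collect the objects found after `tag` in each key, without repeats."""
    result = []
    for key in keyset:
        for target in objects[key].get(tag, []):
            if target not in result:
                result.append(target)
    return result

# Invented records, for illustration only.
authors = {
    "Sulston JE": {"Paper": ["paper_1", "paper_2"]},
    "Author B": {"Paper": ["paper_1"]},
    "Author C": {"Paper": ["paper_2", "paper_3"]},
    "Author D": {"Paper": ["paper_3"]},
}
papers = {
    "paper_1": {"Author": ["Sulston JE", "Author B"]},
    "paper_2": {"Author": ["Sulston JE", "Author C"]},
    "paper_3": {"Author": ["Author C", "Author D"]},
}

keyset = ["Sulston JE"]                       # >?Author "Sulston JE"
keyset = follow(keyset, authors, "Paper")     # ; >Paper
coauthors = follow(keyset, papers, "Author")  # ; >Author
```

Note that in this sketch the starting author appears in the final keyset too, since he is an author of each of his own papers.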

Subfields

To find the value of subfields you must first locate yourself on the tag then move right from it using NEXT (move right) or HERE (move here) for multiple checks. For example the genes to the end of chromosome X are found with

>?Gene gMap = X AND NEXT > 12 AND HERE < 19 ;
Warning: This is clearly hard to read. Also, it will be evaluated from left to right, as it should be, only if the precedence of the operators allows it. Be careful.

Subtypes

To locate a tag in a subtype, you must indicate the path using the operator #, for example

>?Gene Lab AND NEXT #freezer = 4
finds all the genes in your freezer number 4.

Biological applications

So far, we have described the interface to the database kernel, irrespective of the type of data that is stored. We now come to an overview of the various applications written more specifically to handle classical and quantum genetical data.

Physical map

This displays a contig in a horizontally scrollable window. Its design is based on the VAX PMAP display, with some changes (improvements?).

The YAC's are now all together above the fingerprinted clones. As in PMAP, clones that are canonical for others that have been ``buried'' have an asterisk after their names. Gridded clones (or clones canonical for gridded clones) are written with a thicker line. For cosmids these are the grids probed with the YAC's that determine the YAC positions on the map. By convention YAC endpoints are placed in the centre of the last clone that they hybridise to; you should be able to see what the possible range of the true endpoint is by inspection of the thick-line cosmids. The thickened YAC's are those on the ``polytene'' filters used for mapping new clones by hybridisation.

At the very top is a line containing clones assigned by hybridisation to the YAC filters. These assignments are not definitive: in some cases a clone will hybridise at several places in the genome (presumably because of sequence similarities). You can see this together with any comments by double clicking on the clone.

Where there are YAC bridges across gaps in the fingerprinted clone contiguity we have introduced a double line ``break'' symbol to register that the intervening distance is not known (or at least, is nominal in the display). Most of the YAC's now have length entries, under Gel_length in their tree representation.

Below the clones is a line for sequence (almost all empty just now), then three lines containing genes, and finally the remarks below that. Some remarks thought to be of almost no use to anyone except Alan are suppressed from the standard display (this was done crudely - there are plenty left that fall into the same category). They can all be shown by choosing the ``Show All Remarks'' command from the menu. The remarks (and gene-names) to do with one clone are left-aligned with the clone's name (unless you are out of space and they are ``bumped'').

You normally get to the physical map by picking a clone, in which case you come up with that clone highlighted. If you pick a clone from within a physical map it comes up as a tree, showing all its information. If you pick a remark in a physical map you get the relevant clone displayed as a tree. In particular this lets you see any further remarks. If you pick a gene you get the full gene as a tree. One extra subtlety: if you are looking at a clone in tree form, and it is canonical for another clone, whose name you pick, you get that clone also as a tree. There has to be something like this to allow inspection of non-canonical clones.

When an object is selected, it highlights in blue, and all other objects attached to the same clone are subhighlighted with a pale blue background, e.g. remarks, genes, or clones if the selected object is a remark or gene. If the picked object is a clone assigned by hybridisation then the YAC's that it hybridised to are subhighlighted. Again the left and right arrow keys move along the display. In this case if a clone is currently selected the left arrow will move on to the next clone, if a gene then on to the next gene, etc. Unfortunately the map does not scroll automatically, so the arrow keys can take the selection point off the visible section of the map. However you can always centre on the currently selected object using the menu ``Redraw'' command.

Resizing a physical map window keeps it fitting the full vertical extent, so if you make it long and skinny the text becomes smaller but you see more. We might change this -- it doesn't seem very useful, and the resulting automatic font size variation is annoying for the programmer.

Menu

Quit, Help
as usual.
Print
Note that this contains all the material in the whole scrollable display, not just what you see. For big contigs this is generally too much, and it is better to use the Left cut and Right cut items to restrict what is printed.
Left cut, Right cut
These allow you to set left and right limits to what can be scrolled over. They are basically useful for printing, but a shorter display also speeds up redrawing and scrolling. When you have selected a cut operation from the menu you must click somewhere in the display with the middle mouse button to specify where the limit will be. You can use the scroll bar before doing this.
Preserve
Marks this window so that the next physical map requested will come up in a new window, not overwrite this one.
Redraw
Redraws the window, centred on the currently selected object. Useful in three situations: (i) when after window manipulations, such as resizing and dragging, some of the window looks wrong (we try to deal with this automatically, but it can get confused), (ii) when the selected object is off screen, (iii) to redisplay the whole contig after Cut operations.
Recalculate
In order to speed up map drawing, an array of relevant information is calculated from all the clones in a contig. This is done automatically when a contig is first displayed, and then stored for future redisplay if the user has write access. If subsequently new information is added, and the display gets out of date, this option will recalculate the array. If you add data only via the Update command from the main menu then this is all taken care of.
Show All Remarks
As mentioned above, this shows the extra remarks on YAC hybridisation and PCR used in constructing the map. Applying it again removes them.

Genetic map

When you pick a chromosome name you get its genetic map. These windows behave well under resizing, which is necessary to get at information off the right hand side. Point genes are represented on the right, rearrangements such as duplications and deficiencies on the left. Genes that are also located on the physical map are highlighted by a yellow background.

Highlighting a rearrangement pops up its name in red near the top of the screen. Double clicking on a gene or rearrangement gives its Tree representation, with information on alleles (in the Stock Centre or MRC only, so far), papers, map data etc. The up and down arrows work.

The positioning of genes crudely reflects the accuracy with which their genetic map position is known; those closer to the line have smaller error values. In principle the sharp distinction of ``on the line'' versus ``off the line'' can now be abandoned, so everything has a mean position and an error (or range) estimate. It should be noted that gene names are bumped rightwards to prevent overlap, so the interpretation of accuracy from left/right position should be taken with a pinch of salt, especially at low magnification when things get crowded. When they are very crowded then they also get bumped down, so even the mean map position can be badly distorted. To see positions accurately you must zoom in. The philosophy is that if you are really interested, you should look at something closer to the mapping data (see about the ``Map Data'' button below).

You can change magnification with the ``Zoom'' buttons. ``Zoom out'' lets you see a larger part of the map, while ``Zoom In'' enlarges the centre of the display. Resizing the window also has interesting effects.

To move along the chromosome you have a choice between a coarse and a fine control. You can drag the slider on the bar at extreme left, which represents the current view with respect to the whole chromosome. Or you can drag the small dark rectangle on the scale line, which will recentre the display wherever you take it. This is much more useful at high magnification, and is also useful for centering on a gene at low magnification before zooming in on it.

The yellow bars with small bands to the left of the scale bar represent contigs, and provide a link between the genetic and physical maps. The long yellow bars are contigs. Currently their endpoints are drawn at the last mapped clone that they contain. This is very conservative but we are still discussing the best way to extrapolate. The bands represent the clones to which genes have been assigned. If you click on one of them you get a physical map centred at the appropriate point. If you click between bands then the system will interpolate for you.
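
The interpolation between bands is not specified further here. A simple linear scheme would look like the Python sketch below, which is an assumption for illustration and not necessarily what the program does. The function name, the anchor list of (genetic position, physical position) pairs and the numbers are all hypothetical.

```python
from bisect import bisect_right

def physical_position(anchors, genetic_pos):
    """Linearly interpolate a physical map position for a genetic map position.

    `anchors` is a list of (genetic, physical) pairs sorted by genetic position,
    one pair per band. Outside the anchored range the nearest band is used.
    """
    genetic = [g for g, _ in anchors]
    if genetic_pos <= genetic[0]:
        return anchors[0][1]
    if genetic_pos >= genetic[-1]:
        return anchors[-1][1]
    i = bisect_right(genetic, genetic_pos)      # first band strictly to the right
    (g0, p0), (g1, p1) = anchors[i - 1], anchors[i]
    return p0 + (p1 - p0) * (genetic_pos - g0) / (g1 - g0)

# Invented bands, for illustration only.
bands = [(0.0, 100), (2.0, 300), (3.0, 350)]
midway = physical_position(bands, 1.0)
```

A click halfway between the first two bands lands halfway between their physical positions.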

The ``Map Data'' button gives mapping data for the currently selected object. This is most useful just now for Df/Dup map data. Selecting a rearrangement and pressing this button will show all the genes mapping inside the rearrangement boundaries (i.e. fail to complement for a Df) in green, all those outside in blue. Similarly, when applied to a gene it shows all the rearrangements that the gene maps inside in green, all those that it is outside in blue. Also for a gene the other genes that it has been mapped with respect to using 2-factor and 3-factor data are highlighted.
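
The colouring rule for Df/Dup data can be summarised in a few lines of Python. This is a sketch of the rule as described above, not the display code. The record layout (gene, rearrangement, relation), the function name and the object names are hypothetical.

```python
def map_data_colours(selected, records):
    """Return {object name: colour} for the objects mapped against `selected`.

    Each record is (gene, rearrangement, relation), where relation is
    "inside" or "outside".
    """
    colours = {}
    for gene, rearrangement, relation in records:
        if selected == rearrangement:
            other = gene              # a rearrangement is selected: colour genes
        elif selected == gene:
            other = rearrangement     # a gene is selected: colour rearrangements
        else:
            continue
        colours[other] = "green" if relation == "inside" else "blue"
    return colours

# Invented mapping data, for illustration only.
records = [
    ("gene_a", "Df_1", "inside"),
    ("gene_b", "Df_1", "outside"),
    ("gene_a", "Dp_2", "outside"),
]
for_deficiency = map_data_colours("Df_1", records)
for_gene = map_data_colours("gene_a", records)
```

The same records serve both directions: selecting Df_1 colours the genes, selecting gene_a colours the rearrangements.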

All the menu options are equivalent to the ones of the same name in the physical map.

Biblio window

This is generated by the Biblio option in some other window. It shows a list of all the references related to the contents of the window from which it was called. The references are displayed in bibliography style with the title, authors, journal, volume, pages and year.

The object names are also displayed at the left for each reference, and these can be picked by double clicking to display them in a standard tree window. The text entry ``Topic'' box at the top does nothing but enables you to give a title to your print out.

The menu just contains the standard Quit, Help and Print options.

Clone Grid windows

This display is designed to display hybridisation data from the YAC polytene filters (and other gridded filters), and provide a tool for people to go between such data and the map, even when they don't have write access to the database.

You see a set of squares laid out in the same pattern as the clones on the filter. To help orient you, the rectangle at left containing ``LABEL'' corresponds to where the numerical label is written on the filter.

The current active box (after you have clicked once in a box) has a cross in it and is outlined in red. The name of the corresponding YAC is shown in the ``Gridded Clone'' text entry box at the top. If you click on this to make it yellow, and type in the name of a YAC on the grid, that YAC will be marked with a cross and outlined in red. You can also click on the ``Gridded Clone'' button, in which case you can pick on YAC's elsewhere, such as in a physical map window. This is useful for finding where neighbouring YAC's on the map are on the grid.

There are two modes: Edit Mode and Map Mode. The mode with the coloured box is current. You can switch by picking the other box.

Map Mode: when you click twice on a box you go to the physical map. If it is an empty box, then you will go to the corresponding YAC. If it is filled then it, and all the other filled boxes representing YACs that overlap, are highlighted in red, and you go to the average position on the map that they define. i.e. if a number of boxes are filled (positive) then map mode splits them into groups, each of which corresponds to one potential map locus.

The grouping is done by taking all YAC's within some range into a cluster. This range is 20 by default. It can be changed with the menu entry ``Set Cluster Range''.
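
A minimal Python sketch of such a grouping is given below. It is one simple reading of the rule (a greedy sweep over sorted YAC centres), offered as an assumption and not as the algorithm the program actually uses. The function names and the positions are hypothetical.

```python
def cluster_positions(centres, cluster_range=20):
    """Group YAC centre positions so that each cluster spans at most `cluster_range`."""
    clusters = []
    for centre in sorted(centres):
        # Stay in the current cluster while within range of its leftmost member.
        if clusters and centre - clusters[-1][0] <= cluster_range:
            clusters[-1].append(centre)
        else:
            clusters.append([centre])
    return clusters

def cluster_loci(clusters):
    """Average position of each cluster, i.e. the point the map would centre on."""
    return [sum(c) / len(c) for c in clusters]

# Invented centre positions, in physical map bands.
positive = [300, 105, 118, 100, 310]
groups = cluster_positions(positive)
loci = cluster_loci(groups)
```

With the default range of 20 the five positive YACs fall into two groups. A smaller range splits them further, and a larger one allows more spread out clusters.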

Edit mode: this lets you define a pattern. Click with the left mouse. One click in a box gives a dark blue fill, a second makes it pale blue (signifying a weak signal), and a third will clear it.
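
The three-state cycle of a box in Edit mode can be written down as a tiny state table. The Python sketch below is illustrative only; the state names "empty", "strong" and "weak" and the function click are hypothetical labels for the clear, dark blue and pale blue fills.

```python
# clear -> dark blue (strong) -> pale blue (weak signal) -> clear
CYCLE = {"empty": "strong", "strong": "weak", "weak": "empty"}

def click(grid, box):
    """Advance the fill state of `box` by one left click and return the new state."""
    grid[box] = CYCLE[grid.get(box, "empty")]
    return grid[box]
```

Three clicks on the same box therefore bring it back to the cleared state.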

Probes: we now recognize two sorts of probes: simple clones, and pools. Pools can contain collections of clone probes and subpools. The hybridisation pattern for a pool is the union of the patterns for all its clones and subpools, together with any data attached specifically to itself. You can type the name of a probe into the text entry box (after activating it to yellow by clicking on it), or you can press the Probe button and pick your probe from another window (e.g. a physical map window).
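
The rule for pools is a recursive union, which can be sketched in Python as follows. The two dictionaries (data attached directly to each probe, and the members of each pool), the function name and all probe and YAC names are hypothetical; the sketch also assumes that pools do not contain themselves.

```python
def pool_pattern(name, own_hits, members):
    """Return the set of gridded clones hit by probe `name`.

    `own_hits` maps a probe to the clones recorded directly against it;
    `members` maps a pool to the clone probes and subpools it contains.
    """
    pattern = set(own_hits.get(name, ()))
    for member in members.get(name, ()):
        # A member may be a simple clone probe or itself a pool.
        pattern |= pool_pattern(member, own_hits, members)
    return pattern

# Invented data, for illustration only.
own_hits = {
    "clone_1": {"yac_a", "yac_b"},
    "clone_2": {"yac_b", "yac_c"},
    "clone_3": {"yac_d"},
    "pool_top": {"yac_e"},
}
members = {
    "pool_top": ["clone_1", "pool_sub"],
    "pool_sub": ["clone_2", "clone_3"],
}
top_pattern = pool_pattern("pool_top", own_hits, members)
```

A simple clone is just the degenerate case: it has no members, so its pattern is its own data.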

The Clear button clears any current pattern, deletes a clone name if one is present, puts you in edit mode, and prepares the ``Pattern for clone'' entry box for you to type in a clone name. This does not affect any pattern stored as a ``surround'' for comparison (see below).

The arrow keys also work to let you move the active box (outlined in red) around the grid. Hitting the return key corresponds to clicking on the current active box.

Menu

Quit, Help, Print
as normal.
Preserve
as in the physical map display. If you select POLY1 again after choosing this option then you get a new window, rather than reusing this one.
Center<->Surround
this places the current pattern in a surround to each highlighted box. You can then load a new pattern and compare it to the one stored in the surround. The clear function only affects the central pattern, not the surround, so to clear the surround select Center<->Surround then Clear.
Names
shows YAC names at all locations, rather than square boxes. To see the ones at the right you will have to stretch the box very wide (sorry no scrolling yet). If you print when stretched out you get an index sheet for the filter like the one sent out with it.
Display probe as tree
shows the current probe in a separate window in tree (plain text) form.
Save data with probe
only valid if you have write access and are entering hybridisation data for probes. This lets you save the current pattern with the probe whose name is in the Probe text entry box. Data that is set as weak (light blue) will be saved with "Weak" following the YAC name. When reading in data, if any text is found following the clone name the box will be displayed in light blue, so you could edit this text in a tree update window to be more informative if you wanted.
Set cluster range
This value is used to cluster individual hybridizing YAC's into groups that are supposed to correspond to individual loci. The current algorithm essentially tries to find loci such that the centres of all the positive YAC's are within the defined range. This is not strictly correct. The units are physical map bands. Larger values allow more spread out clusters.
Stats on KeySet
Applies to current keyset, which should contain probes with hybridisation data. It returns stats on how many clusters of various sorts there are, as a line on the terminal.
Change Gridded Clone
doesn't do anything yet.

Super-user status

There is an extra set of facilities available if you set the environment variable ACEDB_SU. In order to make use of them you must also have write access to the database, via wspec/passwd.wrm.
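
The two conditions can be combined as in the Python sketch below. It is illustrative only: the function name is hypothetical, and the layout of wspec/passwd.wrm is not described in this guide, so the sketch assumes one username per line and takes the lines as an argument instead of reading the file.

```python
import os

def has_super_user_status(username, passwd_lines, env=None):
    """True if ACEDB_SU is set and `username` is registered for write access."""
    env = os.environ if env is None else env
    # Assumption: passwd.wrm lists one username per line.
    registered = {line.strip() for line in passwd_lines if line.strip()}
    return "ACEDB_SU" in env and username in registered
```

Both conditions are needed: setting ACEDB_SU alone does not help a user who is not listed, and a listed user sees only the standard menus until the variable is set.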

Additions to the Main Window menu

Quit
Before quitting the program when you have changed data but not done a global save this will prompt you to ask if you want to save your changes.
Help
activates the first page of the on line Help.
Write access
Select this option when you need to update the database. If you are a registered user, authorization will be granted silently and the ``Write access'' option of the menu will change into ``Save''. Otherwise, the request fails and you get an explanatory message.
Save
If you have write access, you can save your recent changes and make them permanent. If you ``Quit'' before saving, you are warned and given an extra chance. If you decline it, the recent updates are lost and you revert to the status quo ante. The ``Session Number'' is increased each time you actually ``Save'' some new data; just gaining write access and saving immediately does not change the session number. A journal of these updates is kept in the file logbook.wrm. Note that everybody who has write access to the database must also have write access to this file.
Read ace files
is used to parse new data into the system. The format of these data files is explained in section 3.4. These are the basic material that gets turned into update files, but they are more flexible in the form of ace files (e.g. you can read them in to check them and then abort without permanently changing the database).
Read models
is needed when you wish to augment the number of classes or modify the description of the classes. You need to own the database to be allowed to select this option and you should be very careful, see chapter Configuration.
Dump
dumps the whole database in a format compatible with ``Read ace files'' and easy to adapt to any alternative database system. It creates a large ASCII file in the main directory. Therefore, you need write access in that directory to perform this dump, i.e. you must essentially own the database. Do this after major updates and copy the dump file to tape.
Test subroutine
corresponds to the file acedbtest.c. It is a convenient way for us developers to link in test subroutines. Probably absent from your version, you may want to reincorporate it if you are writing code.
Chronometer
is a tiny integrated profiler. It is useful only if you wish to optimize the program. Ditto above.

Update mode of the tree display

This is reached by choosing Update from the Tree menu. It is only relevant if you have Write access, and want to add to or change the data in the database.

In update mode, the tree itself is extended with entries from the corresponding model. These are indicated by a light blue background, as also are current entries in the object that are on a ``unique'' branch, i.e. which are customarily replaced if you specify a new value (see section on models.wrm in the Installation chapter). You can add to the tree by double clicking on any light blue node. If this is a tag, then the tree is simply extended to include the tag. If it is a data type node (Int, Float, Text etc.) then a light green text entry box is put up, into which you must type the value. If it is a pointer node (indicated by ?{class}) then an entry box is put up that you can type the name of an object into. However you can also specify the object by double clicking on it in any other displayed window in the database (the operation that normally displays an object). A non-blocking message pops up to remind you of this option.

You can cancel any text entry operation with the Cancel option from the menu. If you want to cancel operations that have been accepted you have to Quit from the window, choosing the option not to save your changes when prompted.

Update Menu

Quit
quits the window, prompting you first to ask if you want to keep your changes.
Save
restores the original Tree menu and appearance, making any changes you have made available as part of the database during this run time session. The data are not in fact finally saved for future recall until you select the Save option of the Main Window menu.
Add Comment
Adds a comment after the currently highlighted node. A node is highlighted by picking it once with the mouse. You can only add comments to nodes of the object tree, not of the model. A light green entry box is put up into which you can type your comment, which is finished by hitting Return. You can cancel the operation at any time prior to that with the Cancel option in the menu.
Delete
deletes the currently highlighted node and anything to the right of it. You must have previously highlighted a node of the original tree (part in black or on a unique branch).
Cancel
Cancels the current Add Comment or Add operation.