Loading data into ACEDB

Dave Matthews and Suzanna Lewis, May 1995
Last update March 1996


What does "loading" do?

To "Read .ace files" into an ACEDB database is to read in the plain text version of your data into the structured binary file database/blocks.wrm, which is the form of the data that is directly accessed by the ACEDB software.

"Add Update File" does the same thing except that a particular kind of .ace file called an Update file is read, and "Read models" is executed first.


"Read .ace files"

Before you can use the command "Read .ace files" in the ACEDB main menu, you must first explicitly request "Write Access" from the same menu.

Figure 1. Obtaining write access.

Once this initial step has been performed, you can load as many files as you want, one at a time. Until you choose "Save" from the main menu, no permanent changes are written to blocks.wrm, and any other user who starts up the database will not see any of your changes. There is no "Revert" command, but you can quit without saving.


Getting write access

A too-common problem is that the main menu is mysteriously blank where the command "Write Access" is supposed to be. This command only appears if the username you are logged in under is listed in wspec/passwd.wrm. This should be all that's necessary. If you still don't see "Write Access", check this bug work-around. (In older versions of ACEDB it was also necessary to set the environment variable ACEDB_SU, but this is not required in version 3-0.)

A second, smaller hurdle is that ACEDB can't modify blocks.wrm unless the Unix file permissions for the directory $ACEDB/database and its contents include write-permission for the username you are logged in under.

Yet another Unix-related gotcha: if your default umask does not allow anyone else to write and you're the first one to create the database/lock.wrm file, then even if everything else is set up correctly other people still can't get write access. [DEM 3/96: This no longer seems to be true. Ace4_1 removes lock.wrm when the database is Saved or Quit. There is still a problem if ace crashes while lock.wrm exists; in this case it must be removed manually.]
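The checks described in this section can be gathered into one small diagnostic. The following Python sketch is only an illustration: the function name write_access_problems is hypothetical, and it assumes that wspec/passwd.wrm lists one username per line with "//" starting a comment, which you should verify against your own file.

```python
import os

def write_access_problems(acedb_dir, username):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    passwd = os.path.join(acedb_dir, "wspec", "passwd.wrm")
    database = os.path.join(acedb_dir, "database")
    lock = os.path.join(database, "lock.wrm")

    # Assumption: one username per line, "//" starts a comment.
    try:
        with open(passwd) as handle:
            users = {line.split("//")[0].strip() for line in handle}
    except OSError:
        users = set()
    if username not in users:
        problems.append(f"{username} is not listed in {passwd}")

    # The database directory and its contents must be writable
    # by the user running this script.
    if not os.path.isdir(database) or not os.access(database, os.W_OK):
        problems.append(f"{database} is missing or not writable")
    else:
        for name in sorted(os.listdir(database)):
            path = os.path.join(database, name)
            if not os.access(path, os.W_OK):
                problems.append(f"{path} is not writable")

    # A lock file means someone holds write access, or ace crashed
    # and left the file behind.
    if os.path.exists(lock):
        problems.append(f"{lock} exists; remove it manually if ace crashed")
    return problems
```

Run it as the user who cannot get write access, for example write_access_problems(os.environ["ACEDB"], "yourname"), and read the returned list.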

New! See ftp://ncbi.nlm.nih.gov/repository/acedb/doc/ for possible updates.


Loading .ace files via tace

You can also load .ace files using tace, the command-line analogue of the X-windows ACEDB. The tace command corresponding to "Read .ace files" is "parse". This allows loading files via scripts instead of the usual interactive one-file-at-a-time interface. For example the following shell script loads all the .ace files in the rawdata directory:
#!/bin/csh
foreach acefile (`ls $ACEDB/rawdata/*.ace`)
  $ACEDB/bin/tace <<quit
  parse $acefile
quit
end
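If you prefer a scripting language with error handling, the same loop can be sketched in Python. This is an illustration rather than a supported tool: the names tace_commands and load_rawdata are hypothetical, and the sketch assumes, as the csh script above does, that tace reads its commands from standard input and finds the database through the ACEDB environment variable.

```python
import glob
import os
import subprocess

def tace_commands(acefile):
    """Text fed to tace on standard input to load one .ace file."""
    return f"parse {acefile}\n"

def load_rawdata(acedb_dir, run=subprocess.run):
    """Run tace once per .ace file in rawdata, in sorted filename order."""
    tace = os.path.join(acedb_dir, "bin", "tace")
    pattern = os.path.join(acedb_dir, "rawdata", "*.ace")
    loaded = []
    for acefile in sorted(glob.glob(pattern)):
        # check=True raises CalledProcessError if tace exits with a
        # non-zero status; end of input plays the role of the
        # here-document terminator in the csh version.
        run([tace], input=tace_commands(acefile), text=True, check=True)
        loaded.append(acefile)
    return loaded
```

A typical call would be load_rawdata(os.environ["ACEDB"]). The run parameter exists only so the loop can be tried out with a stand-in function before pointing it at a real database.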


Loading .ace files via tclace

Yet another alternative is loading .ace files using tclace, the Tcl/Tk analogue of the X-windows ACEDB. The tclace command corresponding to "Read .ace files" is "parse". This allows loading files via Tcl scripts instead of the usual interactive one-file-at-a-time interface. For example, the following Tcl script loads all the .ace files in all the acedata/load* subdirectories:
set ACEDB $env(ACEDB)

#
# ----------Subprocedures----------
#

proc send_message {messg} {
    puts stderr $messg
}

set loadDirectories [glob -nocomplain $ACEDB/acedata/load*]
foreach loadDir $loadDirectories {
    send_message "Loading $loadDir\n"
    set aceFiles [glob -nocomplain $loadDir/*.ace]
    set sortedFiles [lsort $aceFiles]
    foreach filename $sortedFiles {
        acedb Parse $filename
    }
}

Something to be aware of when doing this is that each Parse command, in both tace and tclace, creates a new session. It automatically grabs write access and then, after parsing, saves the new data. If you compare the sizes of two databases made from identical data sets, one created using a single save from the graphical interface and the other created using such scripts, they can be considerably different. The latter will be larger because of the overhead of saving each session.
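To see how large the difference is for your own data, you can total the file sizes under each database directory. A minimal Python sketch follows; the function name database_size is hypothetical, and it simply sums every regular file it finds rather than looking at blocks.wrm alone.

```python
import os

def database_size(acedb_dir):
    """Total size in bytes of the regular files under <acedb_dir>/database."""
    total = 0
    for root, _dirs, files in os.walk(os.path.join(acedb_dir, "database")):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

Calling it on the two database roots and comparing the results gives a rough measure of the per-session overhead.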


Updating previously loaded data with acediff

One strategy for maintaining and updating a database is to keep each "batch" of data (on a particular subject, from a particular data source, etc.), in its own separate .ace file in $ACEDB/rawdata. When a new version of the data becomes available, it is necessary not just to load the new .ace file, with its new values for the various fields, but also to remove the old values. For example if old.ace contains
Colleague : "Matthews, Dave"
Phone "123-4567"
and new.ace says
Colleague : "Matthews, Dave"
Phone "987-6543"
then simply loading new.ace into the existing database would produce
Colleague : "Matthews, Dave"
Phone "123-4567"
Phone "987-6543"
Instead, use the program acediff, which you should find included in your $ACEDB/bin directory. Saying "acediff old.ace new.ace > DIFF.ace" produces a file DIFF.ace containing:
Colleague "Matthews, Dave"
-D Phone "123-4567"
Phone "987-6543"
Now load DIFF.ace into the database, and replace rawdata/old.ace with new.ace.
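To make the idea behind the diff concrete, here is a small Python sketch of the comparison. It is not acediff's actual algorithm or source, only an illustration under simplifying assumptions: objects are separated by blank lines, the first line of each object is its header, and objects present only in the old file are ignored. The names parse_ace and ace_diff are hypothetical.

```python
import re

def parse_ace(text):
    """Map each object header line to the list of its tag lines."""
    objects = {}
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        if lines:
            objects[lines[0]] = lines[1:]
    return objects

def ace_diff(old_text, new_text):
    """Emit -D lines for values that disappeared, plain lines for new ones."""
    old, new = parse_ace(old_text), parse_ace(new_text)
    out = []
    for header, new_lines in new.items():
        old_lines = old.get(header, [])
        removed = [line for line in old_lines if line not in new_lines]
        added = [line for line in new_lines if line not in old_lines]
        if removed or added:
            out.append(header)
            out.extend(f"-D {line}" for line in removed)
            out.extend(added)
            out.append("")  # blank line terminates the object
    return "\n".join(out)
```

Applied to the old.ace and new.ace contents shown above, ace_diff returns the header line followed by -D Phone "123-4567" and Phone "987-6543". The header is copied exactly as written in new.ace, colon included.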


Validating .ace files

A few ideas for ways of validating .ace data:

1. Build a set of saved queries for particular invalid conditions, e.g. "find germplasm COUNT species > 1". Load the new .ace file and run these validation queries before saying "Save".

2. To examine all the values of a tag present in a particular .ace file:

% grep <tagname> <acefile> | sort | uniq 
For example:
% grep Species germplasm.ace | sort | uniq
Species "Aegilops elongatum"
Species "Triticum aestivum"
Species "Triticum tauschii"
Species "Triticum umbellulata"
This kind of "overview" examination can also be done conveniently using the TableMaker.
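Both ideas can also be tried on the .ace file itself before loading. The Python sketch below is an illustration with hypothetical function names. Unlike grep, it only matches the tag as the first word of a line, and it assumes objects are separated by blank lines. It sees only the file, not data already in the database, so it complements the saved queries rather than replacing them.

```python
import re
from collections import Counter

def tag_value_counts(ace_text, tag):
    """Count each distinct value of `tag`, much like grep | sort | uniq -c."""
    counts = Counter()
    for line in ace_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == tag:
            counts[parts[1]] += 1
    return counts

def objects_with_repeated_tag(ace_text, tag):
    """Return header lines of objects in which `tag` occurs more than once."""
    flagged = []
    for paragraph in re.split(r"\n\s*\n", ace_text.strip()):
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        hits = [line for line in lines[1:] if line.split(None, 1)[0] == tag]
        if len(hits) > 1:
            flagged.append(lines[0])
    return flagged
```

For example, objects_with_repeated_tag(open("germplasm.ace").read(), "Species") lists the objects in that file that carry more than one Species line.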


Importing data from Sybase tables, FileMaker DBs, Excel, GenBank, etc.

The gist of this problem is transforming these types of data sets into .ace files. Mappings must be created from the schemas and data records of these databases to the classes and tags defined in the models. While some work has been done on writing code to solve this problem, there are not yet general solutions.

line2ace is one effort, but many essential features are missing. However, if the data records are simple it is a very useful tool. By simple I mean that one line corresponds strictly to one ace object and that each field in the line corresponds to one object tag. This easily becomes complicated in the case of Mac Excel data, because visually empty cells may be exported as either a tab character or nothing at all.
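For the simple case described here, the conversion is short enough to sketch directly. The Python below is not line2ace itself, only an illustration under stated assumptions: the input is tab-delimited, the first column is the object name, the remaining columns map in order onto a list of tags you supply, and values containing double quotes are not handled. The function name rows_to_ace is hypothetical.

```python
def rows_to_ace(tsv_text, class_name, tags):
    """Convert tab-delimited rows into .ace text, one object per row."""
    objects = []
    for row in tsv_text.splitlines():
        if not row.strip():
            continue  # skip blank rows
        cells = row.split("\t")
        name, values = cells[0].strip(), cells[1:]
        lines = [f'{class_name} : "{name}"']
        # zip stops at the shorter sequence, so cells missing from the
        # end of a row are skipped just like empty ones.
        for tag, value in zip(tags, values):
            value = value.strip()
            if value:  # an empty cell exported as a bare tab yields ""
                lines.append(f'{tag} "{value}"')
        objects.append("\n".join(lines))
    return "\n\n".join(objects) + "\n" if objects else ""
```

Both forms of empty cell mentioned above come out the same way: an empty string between two tabs is skipped, and a row that simply ends early produces no lines for the missing tags.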

Another available tool is TextConvert, by Joachim Baumann. The conversion rules are described in an AWK-like language.

Some of the Sybase translators that have been created are as follows:

Once the .ace files are correctly created there comes a secondary problem, of course, which is validating the data, as discussed (cursorily) elsewhere in this document.


Related documents and software

  • There is a set of file conversion utilities in the ACEDB Developers' Archive at MGH, for creating .ace files from GenBank, Mapmaker, etc.

  • TextConvert, by Joachim Baumann, is a more general tool allowing conversion rules to be described in an AWK-like language.

  • Detlef Wolf has compiled a comprehensive list of file conversion tools in his report of the "Query, Tablemaker and Tools" working group of the ACE95 conference.

  • A Guide to Models and Ace Files, by Mary O'Callaghan.

  • A C. elegans Database: III. Configuration guide. The classic document on models and .ace files.

  • Search the Bionet ACEDB newsgroup. (E.g. try searching for "access" to find out about write-access problems.)