Bulk data entry

ACEDB allows data to be entered either interactively or by a bulk loading process. Interactive entry involves filling out a form on-screen and is not described here.

Bulk loading is accomplished by creating a text file containing data and then reading it into the database.

Text editors

Data files can be created with any editor capable of saving information as text (ASCII). Many curators recommend that you take the time to learn Emacs, an extremely capable editor that runs on both UNIX and the Macintosh (a complete Macintosh port has just become available). Emacs can be used as a simple, even friendly, editor, but its real advantage is its ability to execute very complex manipulations via "macros", which becomes valuable once you begin reformatting large files for input into your database. Emacs is free and is available from prep.ai.mit.edu and many other sites. However, for the exercises in this tutorial any text editor will suffice.

As you gain experience, it will be advantageous to learn a scripting language like Perl. The up-front effort required to learn Perl is more than amply rewarded by the increase in productivity. In comparison with other programming languages, Perl is easy to learn. The Perl interpreter is free and can be downloaded from a variety of sites.

Where to put data files

One of the environment variables you can set for ACEDB is ACEDB_DATA. Set this variable to point to the location of your raw data files; by default, it is the "rawdata" subdirectory of $ACEDB. ACEDB reads ACEDB_DATA when it starts, and this is where it looks first when you ask it to load data (see below).
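The lookup just described can be sketched in a few lines of Python, for example to check where a loading script should put its files. This is an illustration only, assuming exactly the rule stated above ($ACEDB_DATA if set, otherwise $ACEDB/rawdata); the function name resolve_data_dir is hypothetical and not part of ACEDB.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the directory ACEDB is assumed to look in first for data files.

    Assumed rule: use $ACEDB_DATA if it is set (and non-empty),
    otherwise fall back to the "rawdata" subdirectory of $ACEDB.
    """
    env = os.environ if env is None else env
    if env.get("ACEDB_DATA"):
        return Path(env["ACEDB_DATA"])
    if env.get("ACEDB"):
        return Path(env["ACEDB"]) / "rawdata"
    return None  # neither variable is set
```

For instance, with ACEDB set to /rusty/dbs/testdb and ACEDB_DATA unset, the sketch returns /rusty/dbs/testdb/rawdata.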

What to call your data files

By all means give your data files descriptive names. But remember to include the ".ace" file extension on the end:

chromosome1.ace                 Image.captions.ace
papers_from_1993.ace            2-point-data.ace
new_people.ace                  MY_TSHIRTS.ace
When ACEDB lists the contents of $ACEDB_DATA (i.e., "rawdata"), by default it only shows files with a ".ace" suffix.
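If you keep many files in that directory, the same filter is easy to reproduce in a script, for example to check which files ACEDB will offer you. A minimal Python sketch, assuming a plain, case-sensitive match on the ".ace" ending; the helper names are hypothetical:

```python
import os
from typing import Iterable, List


def ace_files(names: Iterable[str]) -> List[str]:
    """Keep only names ending in ".ace" (sorted here just for readability)."""
    return sorted(n for n in names if n.endswith(".ace"))


def list_data_dir(data_dir: str) -> List[str]:
    """List the .ace files among the regular files in a directory such as $ACEDB_DATA."""
    return ace_files(
        n for n in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, n))
    )
```

Note that a backup such as old.ace.bak is not listed, because its name does not end in ".ace".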

Contents of a data file

Whether data files are generated by hand, by a computer program, or by a combination of methods, the data must be presented in a particular way. Two factors govern this: first, a few general rules define the legal format; second, the structure of the models determines what data can be entered.

The model for any class specifies the class name plus fields and field labels. This structure is "cloned" to create real independent objects. The organization of the data file reflects these principles.

For example, consider this data, which describes two t-shirts:

TShirt sam1
Price 15.95
Damage "bleach marks" "5 March 1992"

TShirt sam2
Borrowed_by "Sarah Edmunds"
Source "Dead Concert"

//end of my t-shirt file (last modified 2 March 1994)
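Data files are often written by a program rather than by hand. The following Python sketch (hypothetical helper names, not part of ACEDB) produces the t-shirt file above, minus the comment, from Python data while following the rules listed below: class and object name first, a blank line between objects, text values in double quotes, and embedded double quotes replaced with single quotes. It assumes that numbers can be written bare and that object names only need quoting when they contain whitespace.

```python
import re
from typing import Iterable, Sequence, Tuple, Union

Value = Union[str, int, float]
# One object: (class, name, [(label, [field values, left to right]), ...])
AceObject = Tuple[str, str, Sequence[Tuple[str, Sequence[Value]]]]


def ace_value(v: Value) -> str:
    """Numbers are written bare; text is wrapped in double quotes."""
    if isinstance(v, (int, float)):
        return str(v)
    # Double quotes may not appear inside data, so swap them for single quotes.
    return '"' + v.replace('"', "'") + '"'


def ace_name(name: str) -> str:
    """Quote an object name only if it contains whitespace."""
    return ace_value(name) if re.search(r"\s", name) else name


def ace_object(obj: AceObject) -> str:
    cls, name, rows = obj
    lines = [f"{cls} {ace_name(name)}"]  # class and object name come first
    for label, values in rows:
        # Label and fields are separated by single spaces.
        lines.append(" ".join([label] + [ace_value(v) for v in values]))
    return "\n".join(lines)


def ace_file(objects: Iterable[AceObject]) -> str:
    # A blank line between objects marks where each new object begins.
    return "\n\n".join(ace_object(o) for o in objects) + "\n"
```

Calling ace_file on the two t-shirts, given as ("TShirt", "sam1", [("Price", [15.95]), ("Damage", ["bleach marks", "5 March 1992"])]) and a similar tuple for sam2, yields the same two objects as the file above.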
separate objects with one or more blank lines
As ACEDB reads the file, whenever it encounters a blank line followed by an occupied line, it interprets the occupied line as the beginning of an object description. The description continues until the next blank line. Therefore, separate your objects with at least one blank line.
class and object names must be first
The first line for each object begins with its class and object name. This information is absolutely required for all objects. The names are separated by at least one space.
use genuine labels
The labels you use must correspond to real labels in the model. If a label is not in the model, ACEDB will not allow you to read the data associated with it from your file.
separate labels and fields
When a label is followed by data, separate them with at least one space.
be careful of field type and order
The fields you fill in must correspond to real fields in the model and they must be of the correct type and in the correct order. If the model is

?TShirt Price Float
        Damage Text Text // description and date
You cannot say

Price 15.95 "a bargain"
because there is no place for "a bargain" in the model. You cannot say

Price "fifteen dollars"
because "fifteen dollars" is not a decimal (floating point) number. It would be permitted but incorrect to say

Damage "5 March 1992" "bleach marks"
if you were intending to use the first field for the damage description and the second for the date.
data can contain "whitespace"
Some data naturally contains "whitespace", for example "Sarah Edmunds". To keep ACEDB from interpreting that as two items, it is surrounded with double quote marks. However, you cannot enter "pure" whitespace (spaces, tabs, a combination of these, or "") into Float, Int, DateType or class (?Class) fields. Whitespace will work for Text fields, but see this note before using them.
separate data items on the same line
When you want to have two distinct items on the same line, separate them with one or more spaces.
multiple fields must be loaded from left to right
In cases where a label is followed by multiple fields, you can't fill in a field unless the one to its left is already occupied. For example, this is possible:

Damage      Text   Text     //the model
              |      |
              |      |
              |      |
Damage      "Hi!" "Bye!"    //the data
Unfortunately this is not:


Damage      Text  Text     //the model
                   /
                  /
                 /
Damage     "Bye!"          //the data
Null (blank) values will work for Text fields, but see this note before using them.
ACEDB won't see comments
ACEDB ignores "//" and everything to the right of it, which allows you to insert comments into your data files. If a line begins with "//", ACEDB ignores the entire line. Such lines are not interpreted as empty lines, so they do not separate objects.
don't use double quote marks
If your data contains double quote marks (ASCII 34 in decimal), replace them with single quotes or a pair of single quotes. In other words,

Remarks "He said "I have two cats!""
should be converted to one of the following:
Remarks "He said 'I have two cats!'"
Remarks "He said ''I have two cats!''"
Hopefully the need for this will disappear in future versions of the software.
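Several of these rules can be checked before you try to read a file in. The Python sketch below is a simplified pre-flight checker, not ACEDB's own parser: it assumes a flat model in which each label maps to a left-to-right list of field types (only Float and Int are type-checked; every other type is treated like Text), and it assumes that "//" inside a quoted item is data rather than a comment. The names check_ace and tokenize are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, flattened model: class -> label -> field types, left to right.
# Real ACEDB models are trees and have more types (DateType, ?Class, ...).
Model = Dict[str, Dict[str, List[str]]]

# A quoted item, a comment running to end of line, or a bare word.
# Assumption: "//" inside double quotes is data, not a comment.
TOKEN = re.compile(r'"([^"]*)"|(//.*)|((?:(?!//)\S)+)')
PATTERNS = {
    "Float": re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"),
    "Int": re.compile(r"[+-]?\d+"),
}


def tokenize(line: str) -> List[str]:
    """Split a line into items, honouring double quotes and "//" comments."""
    items: List[str] = []
    for m in TOKEN.finditer(line):
        if m.group(2) is not None:  # comment: ignore the rest of the line
            break
        items.append(m.group(1) if m.group(1) is not None else m.group(3))
    return items


def type_ok(kind: str, value: str) -> bool:
    """Float and Int must look numeric (so "" fails); other types accept anything."""
    pattern = PATTERNS.get(kind)
    return pattern is None or pattern.fullmatch(value) is not None


def check_ace(text: str, models: Model) -> List[str]:
    """Return a list of problems found in .ace text (empty means none found)."""
    errors: List[str] = []
    in_object = False
    labels: Optional[Dict[str, List[str]]] = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():  # a truly blank line ends the current object
            in_object = False
            continue
        items = tokenize(line)
        if not items:  # comment-only line: skipped, does not end the object
            continue
        if not in_object:  # first occupied line: class and object name
            in_object = True
            labels = models.get(items[0]) if len(items) >= 2 else None
            if len(items) < 2:
                errors.append(f"line {lineno}: expected class and object name")
            elif labels is None:
                errors.append(f"line {lineno}: unknown class {items[0]!r}")
            continue
        if labels is None:  # bad header: skip the rest of this object
            continue
        label, values = items[0], items[1:]
        if label not in labels:
            errors.append(f"line {lineno}: {label!r} is not a label in the model")
            continue
        kinds = labels[label]
        if len(values) > len(kinds):  # e.g. Price 15.95 "a bargain"
            errors.append(f"line {lineno}: no place in the model for {values[len(kinds):]}")
        for kind, value in zip(kinds, values):  # fields fill left to right
            if not type_ok(kind, value):
                errors.append(f"line {lineno}: {value!r} is not a valid {kind}")
    return errors
```

The checker cannot catch fields that are merely in the wrong order, such as Damage "5 March 1992" "bleach marks", because both fields are Text; only you know which one is meant to be the date.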

Reading in data via xace

To read in data, you must first obtain write access by requesting it via the main window menu. If this option does not appear, your username is missing from wspec/passwd.wrm; add it to this file and start the database again. The "Write Access" option should then appear at the bottom of the menu.

Next, select "Read .ace files" on the main window menu, which opens the ".ace file parser" window. Clicking its "Open File" button opens a second window titled "Which ace file do you wish to parse?". Select a file by double-clicking its name (or by clicking the name once and then the "OK" button); the "Which ace file..." window will close. ACEDB is now ready to read the file, and will attempt to read it in its entirety when you click the "Read all" button in the ".ace file parser" window. (The other buttons let you step through a data file and are useful for troubleshooting.)

As reading proceeds, the Item (i.e., object) and Line numbers show ACEDB's progress through the file. Errors of the sort noted earlier stop the process with an informational message. For example, a "Note" label in a t-shirt object produces such a message, because the ?TShirt model does not have a "Note" label.

Once you click "Continue" you will have the opportunity to continue reading (ACEDB will skip over the error) or abort the read.

When ACEDB reaches the end of the file, the new objects are immediately available for inspection via the selection list. However, they are not permanent until you save them via the main window menu.

It is perfectly legal to read in the same data file repeatedly. ACEDB will ignore whatever it already "knows" about and detect any changes you have made. However, ACEDB will not delete information simply because you have removed it from the file; for this task you need to become familiar with another .ace file option and the "acediff" tool.

Reading in data via tace

Tace can also be used to load data and is in many respects more convenient than xace. Start tace at the command line, then once the tace prompt appears, use the "parse" or "pparse" commands:

acedb> parse rawdata/test.ace
// 4 objects read with 0 errors
// 4 Active Objects
acedb>
The difference between parse and pparse is that parse halts the load as soon as an error is encountered, whereas pparse reports the error, skips the affected object, and continues. The errors are written to the database log file for later reference.
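The difference can be pictured with a toy model, under the assumption that objects read before an error are kept. This is a Python illustration of the behavior described above, not tace's implementation; load_objects and its reader callback are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A reader returns None if the object was read cleanly, else an error message.
Reader = Callable[[str], Optional[str]]


def load_objects(objects: Sequence[str], read_one: Reader,
                 keep_going: bool) -> Tuple[List[str], List[str]]:
    """Toy model: keep_going=False acts like parse, keep_going=True like pparse."""
    loaded: List[str] = []
    errors: List[str] = []
    for obj in objects:
        err = read_one(obj)
        if err is None:
            loaded.append(obj)
            continue
        errors.append(f"{obj}: {err}")  # recorded for later review
        if not keep_going:
            break  # parse: stop at the first error
        # pparse: skip the faulty object and carry on with the next one
    return loaded, errors
```

With three objects where only the second is faulty, the parse-like mode keeps just the first, while the pparse-like mode keeps the first and third; both report the one error.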

Other loading methods

If you are interested in loading data automatically, you can do so using tace or the aceserver. For example, one can use a script like this:

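#!/bin/csh
# Suppress tace's start-up banner in the output
setenv ACEDB_NO_BANNER
# Tell tace which database to open
setenv ACEDB /rusty/dbs/testdb
# Send the lines up to END to tace as commands:
# read the file (continuing past errors), list what was read, and exit
/rusty/dbs/acedb/bin/tace << END
pparse rawdata/test.ace
list
quit
END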
and then load the database by executing the script. Many variations on this theme can be implemented to support complex database management tasks or interface to processes generating data. The aceserver, along with an appropriate client, is even more versatile because it permits a two-way conversation with the database.
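The same session can also be driven from a general-purpose language. The Python sketch below mirrors the csh script: it sets ACEDB and ACEDB_NO_BANNER in tace's environment and sends the same commands on standard input. The paths are copied from the example and will differ on your system; the function names are hypothetical, and the sketch only captures tace's output without interpreting it.

```python
import os
import subprocess
from typing import Iterable

# Paths copied from the csh example; adjust them for your installation.
TACE = "/rusty/dbs/acedb/bin/tace"
DB_DIR = "/rusty/dbs/testdb"


def tace_commands(ace_files: Iterable[str]) -> str:
    """Build the command text that the csh here-document feeds to tace."""
    lines = [f"pparse {path}" for path in ace_files]
    lines += ["list", "quit"]
    return "\n".join(lines) + "\n"


def load_with_tace(ace_files: Iterable[str]) -> str:
    """Run tace once with the given files and return whatever it printed."""
    env = dict(os.environ, ACEDB=DB_DIR, ACEDB_NO_BANNER="")
    result = subprocess.run([TACE], input=tace_commands(ace_files),
                            env=env, capture_output=True, text=True)
    return result.stdout
```

For example, print(load_with_tace(["rawdata/test.ace"])) runs the same pparse, list and quit sequence as the script above.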

