Newsgroups: bionet.software.acedb
Subject: ACEDB Genome Database Software FAQ
Followup-To: bionet.software.acedb
Reply-To: matthews@greengenes.cit.cornell.edu
Distribution: world
Organization: USDA-ARS, Dept. Plant Breeding, Cornell University
Summary: Frequently Asked Questions about the genome database software ACEDB.

URL: http://ars-genome.cornell.edu/acedocs/acedbfaq.html
Archive-name: acedb-faq
Last-modified: 27 May 2001
Version: 1.45

ACEDB FAQ


Curated by: Dave Matthews

Frequently Asked Questions about ACEDB

Questions marked with '!' have substantially changed answers since the last update of the FAQ.

Q0: What is ACEDB?

A0:

ACEDB is an acronym for "A Caenorhabditis elegans Database". It can refer to a database and data concerning the nematode C. elegans, or to the database software alone. This document is concerned primarily with the latter meaning. ACEDB is being adapted by many groups to organize molecular biology data about the genomes of diverse species.

ACEDB allows for automatic cross-referencing of items during loading and allows for hypertextual navigation of the links using a graphical user interface and mouse. Certain special purpose graphical displays have been integrated into the software. These reflect the needs of molecular biologists in constructing genetic and physical maps of genomes.

ACEDB was written and developed by Richard Durbin (MRC LMB Cambridge, England) and Jean Thierry-Mieg (CNRS, Montpellier, France), beginning in 1989. It is written in the C programming language and uses the X11 windowing system to provide a platform-independent graphical user interface. The source code is publicly available. Durbin & Thierry-Mieg continue to develop the system, with contributions from other groups.

A description by Durbin & Thierry-Mieg:
ACEDB does not use an underlying relational database schema, but a system we wrote ourselves in which data are stored in objects that belong in classes. This is nevertheless a general database management system using caches, session control, and a powerful query language. Typical objects are clones, genes, alleles, papers, sequences, etc. Each object is stored as a tree, following a hierarchical structure for the class (called the "model"). Maps are derived from data stored in tree objects, but precomputed and stored as tables for efficiency. The system of models allows flexibility and efficiency of storage --missing data are not stored. A major advantage is that the models can be extended and refined without invalidating an existing database. Comments can be added to any node of an object.
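
The tree-per-object idea described above can be illustrated with a small sketch. This is not ACEDB code or its storage format: the model, the tag names (Map, Position, Allele, Reference), the object contents, and the helper functions are all hypothetical, chosen only to show that missing data occupy no space and that extending the model leaves existing objects valid.

```python
# Hypothetical, simplified "model" for a Gene class: each tag maps to the
# tags allowed beneath it. Real ACEDB models are richer than this.
GENE_MODEL = {"Map": {"Position": {}}, "Allele": {}}


def add_value(obj, model, path, value):
    """Store value under a tag path, creating only the nodes that are used."""
    allowed = model
    for tag in path:  # check the whole path against the model first
        if tag not in allowed:
            raise KeyError(f"tag {tag!r} is not in the model")
        allowed = allowed[tag]
    node = obj
    for tag in path:
        node = node.setdefault(tag, {})
    node.setdefault("_values", []).append(value)


def conforms(node, allowed):
    """Return True if every tag present in the object is permitted by the model."""
    return all(
        tag == "_values" or (tag in allowed and conforms(child, allowed[tag]))
        for tag, child in node.items()
    )


# An illustrative object: only the branches that hold data exist.
gene = {}
add_value(gene, GENE_MODEL, ["Allele"], "allele_1")
add_value(gene, GENE_MODEL, ["Map", "Position"], "chromosome_1")

# Extending the model with a new tag does not invalidate the stored object.
EXTENDED_MODEL = {**GENE_MODEL, "Reference": {}}
still_valid = conforms(gene, EXTENDED_MODEL)
```

Because an object holds only the branches that were filled in, adding a tag to the model costs nothing for objects that never use it.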


Q1: What is the current version of the ACEDB software?

A1:

New! The current version for Unix and Windows is 4_9a, 26 Apr 01.
Updates are released ca. monthly at http://www.acedb.org/Software/Downloads/supported.shtml.
These updates don't usually have different version numbers so please note the dates.

The current version for Macintosh is 4.1b1, August 1995.

WWW interfaces: See "Can ACEDB be networked?".


Q2: Where can I get ACEDB?

A2:

Source code and Unix and Windows binaries are available at:

MacAce, from Frank Eeckman, Cyrus Harmon and Richard Durbin:
(Note: The authors are not currently able to support MacAce. Latest version was 4.1b1.)

Q3: What hardware/software do I need to run ACEDB?

A3:

The software is available in binary (pre-compiled) format for a variety of machines. The software is also available as source code, so you may be able to get it working on any machine.

Memory requirements (from Richard Durbin, aug 97)

The amount of memory you require for ACEDB depends very much on how big the database is (i.e. the disk space used by the database/ subdirectory). Our rule of thumb is that one typically uses 5-10Mb plus up to 10% of the disk space size of the database. So with a 200Mb database perhaps 25Mb memory, and with a 500Mb database (e.g. the C. elegans one) up to 50-60Mb. In fact for short sessions less memory is used -- it is only when all classes are explored, or for example when parsing big files that these amounts of memory get used.
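
The rule of thumb above can be written out as a small calculation. This is only a sketch of the arithmetic in the quoted estimate; the function name and parameters are made up for illustration, and the result is the upper end of the range (the full 10% share), which short sessions will not reach.

```python
def estimate_memory_mb(database_mb, base_mb=(5, 10), percent_of_disk=10):
    """Upper-end memory estimate in Mb: a 5-10 Mb base plus 10% of the database size."""
    share = database_mb * percent_of_disk / 100
    return (base_mb[0] + share, base_mb[1] + share)


small = estimate_memory_mb(200)   # (25.0, 30.0)
large = estimate_memory_mb(500)   # (55.0, 60.0)
```

For a 200Mb database this gives 25-30Mb, and for a 500Mb database 55-60Mb, in line with the figures quoted above.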


Q4: Can ACEDB be networked?

A4:

ACEDB Client / Server Computing (from Doug Bigwood, aug97)

There are several client/server models for ACEDB computing and several more are in development. The ACEDB client/server age began with the inclusion of aceclient and aceserver in version 4.0. These are C-based and use the RPC protocol for communication. These executables can be made from the standard ACEDB distributions.

Starting in version 4.5 an xaceclient is also included with ACEDB. Xaceclient provides remote read/write access to an aceserver while providing the user with the same X displays that are found in xace. To use it, you create an empty database with the appropriate models and start xaceclient. It will automatically retrieve data from the server declared in wspec/server.wrm (the Montpellier server in the distribution server.wrm). The data will be saved locally and can then be viewed with a normal xace.

A Perl extension which provides aceclient functionality to Perl 5.x was developed at ACE95. The files necessary for this Perl extension are now (ACEDB 4.5 and later) included in the wrpc directory of the ACEDB directory hierarchy. Documentation about how to extend Perl is found at http://ars-genome.cornell.edu/acedocs/ace97/perlace/perlacecl.html.

WWWAce and its successor webace were developed to provide a World Wide Web interface for ACEDB. Webace instructions can be found at http://ars-genome.cornell.edu/acedocs/webace.html and http://ars-genome.cornell.edu/acedocs/ace97/webace.html, and the program itself at ftp://ars-genome.cornell.edu/pub/tools/webace.tar.gz.

A Java-based client called Jade allows communication via sockets to an aceserver. Jade installation instructions and information on downloading can be found at http://stein.cshl.org/jade/.

There are now development efforts underway to provide additional client/server functionality to ACEDB including a CORBA server and socket-based communications. These will likely be included in future versions of ACEDB. A new C library interface to ACEDB internals will greatly ease the development of new clients and servers that will support additional protocols.

Subsequent developments (from Dave Matthews, jul00)

A new version of webace, sometimes called webace2, was developed at the Sanger Centre. It uses the new gifaceserver instead of aceserver to improve the interactive response of the graphical displays, and it also makes use of Javascript, Java, and a new Aceclient.pm module which can be installed into Perl without recompiling. It also supports the ACEDB ?URL class. The home page is at http://webace.sanger.ac.uk/. Its authors currently consider it "deprecated", preferring AceBrowser.

AcePerl, from Lincoln Stein, is an object-oriented Perl interface to ACEDB. It can connect to remote ACEDB databases, perform queries, fetch ACE objects, and update databases. The programmer's API is compatible with the Jade Java API. Home page at http://stein.cshl.org/AcePerl/.

AceBrowser, from Lincoln Stein, is a ready-to-use WWW gateway to ACEDB databases built on AcePerl. It has most of the functionality of webace. http://stein.cshl.org/AcePerl/AceBrowser/.

webace2K is an enhancement of webace2, from Maria Nemchuk. http://ars-genome.cornell.edu/webace/webace_install.html

CITA is a CORBA Interface To ACEDB, from UK CropNet. http://jic-bioinfo.bbsrc.ac.uk/BrassicaDB/CITA/


Q5: What documentation exists for ACEDB?

A5:

At the Sanger Centre, www.acedb.org

In the ACEDB Documentation Library, http://ars-genome.cornell.edu/acedocs/

Other


Q6: Can I subscribe to the ACEDB newsgroup by mail?

A6:

Yes! Just send the message "subscribe acedb" to biosci-server@net.bio.net.

You can also post to the newsgroup by mail: write to acedb@net.bio.net.

Or you can access it with a standard newsreader like rn or tin at bionet.software.acedb, or with a WWW browser at news:bionet.software.acedb.

The articles are archived by BIOSCI at http://www.bio.net/archives.html and by Mike Cherry at http://genome-www.stanford.edu/cgi-bin/biosci_acedb. Both archives are indexed for searching. This is the place to find the Questions that really are Frequently Asked!


Q7: Is there a repository of software tools for ACEDB curators?

A7:

Not really, but there are several partial ones. The main tools available are for converting data from other formats to .ace format.

The USDA-ARS Center for Bioinformatics and Comparative Genomics has some useful tools at http://ars-genome.cornell.edu/acedocs/conversion.html. Some additional ones were contributed at the ACE97 Workshop and can be found in the Proceedings, http://ars-genome.cornell.edu/acedocs/ace97/tools/.

Mike Cherry maintains an archive of tools at ftp://genome-ftp.stanford.edu/pub/acedb_dev/utilities/

For a general tool for converting data to ACEDB format input files, Joachim Baumann (joachim.baumann@informatik.uni-stuttgart.de) has written the Perl program TextConvert, available at ftp.informatic.uni.stuttgart.de/pub/DART/.
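
As a rough illustration of what such a conversion involves, here is a minimal sketch that turns tab-delimited text into .ace-style paragraphs. It is not TextConvert or any of the tools listed above. It assumes the common .ace layout of a `Class : "name"` line followed by one `Tag "value"` line per datum, with a blank line between objects, and it assumes a backslash escapes a double quote inside a value. The class name, tags, and data are hypothetical. Every value is written as quoted text; a real model may expect numbers or object references in a different form.

```python
import csv
import io


def quote(value):
    """Wrap a value in double quotes, backslash-escaping embedded quotes (assumed convention)."""
    return '"' + value.replace('"', '\\"') + '"'


def rows_to_ace(tsv_text, class_name):
    """Convert tab-delimited text to .ace-style paragraphs.

    The first column holds the object name; each remaining column header
    is used as a tag name. Empty cells produce no line at all.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    name_column, *tag_columns = reader.fieldnames
    paragraphs = []
    for row in reader:
        lines = [f"{class_name} : {quote(row[name_column])}"]
        for tag in tag_columns:
            if row[tag]:  # skip empty or missing cells
                lines.append(f"{tag} {quote(row[tag])}")
        paragraphs.append("\n".join(lines))
    return "\n\n".join(paragraphs) + "\n"


# Hypothetical input: two loci, the second with no data beyond its name.
example = "Locus\tChromosome\tRemark\nloc_1\t1A\tfirst example\nloc_2\t\t\n"
ace_text = rows_to_ace(example, "Locus")
```

The resulting text would be saved to a file and read into the database; the tags must of course match those defined in the database's own models.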


Q8: When and where is the next ACEDB Workshop?

A8:

New! The next ACEDB Workshop will be 19-22 June 2001 at Cambridge UK, organized by Sylvia Martinelli. For information see http://www.hgmp.mrc.ac.uk/About/Courses/2001/comp.acedb.course.html.

The ACEDB2000 Workshop was held June 10-16 at Simon Fraser University, B.C., Canada. The Proceedings are at http://www.acedb.org/winfo/Conferences/acedb2000/.

The ACE97 Conference and Workshop was held July 27 - August 9 at Cornell University, Ithaca, New York, USA. See the ACE97 Proceedings Page, http://ars-genome.cornell.edu/acedocs/ace97/proceedings.html for the results.

The Proceedings from the May 1995 ACEDB Conference are available at http://ars-genome.cornell.edu/acedocs/ace95/. A final summary report is available at http://ars-genome.cornell.edu/acedocs/ace95/ace95.final.html. Also available online are collections of snapshots taken during the conference by Frank Eeckman and by Dave Matthews.

For pictures of the ACEDB '94 Workshop in St. Matthieu de Treviers, see the online collections:


Q9: How does ACEDB compare to commercial relational DBMS's?

A9:

From Jean Thierry-Mieg, 4/97:

Obviously, I have a biased opinion, but I would say that acedb is to be recommended if the following criteria are met:

1) A very complex schema, that cannot be developed at once, but will need continuous refinement in parallel with the accumulation of the data

2) The type of questions that will be asked are rather complex, with rather fuzzy answers, that one tries to refine progressively. The acedb browsing capacities are useful in this case and have no equivalent in a relational dbms

______________

I would rather recommend sybase in the following case:

1) Simple schema, that can be designed from the start and does not contain too many n.n relations and does not need recursivity

2) The type of questions that will be asked is: succession of de-correlated simple questions with simple answers

____________________

Within this context, I would then list the following goodies of acedb:

1) The ace file format, which is a powerful system to prepare and exchange data between data curators.

2) The existence of an easy graphic browsing interface

3) The availability of a biology-layer, if the application is about genetics

4) Portability (any unix machine, mac with some limitations, windows in development) and price (ace is freeware). This implies that you can actually redistribute the complete system, say on a CD, something impossible with sybase.

5) Ease of use. I seriously believe that ace is much easier to configure and use than sybase.

_____________________

Finally one should consider the following question: concurrency.

Sybase has a well designed transaction system, which will allow roll backs and refined lockings. This is essential for an application like a booking agency, with many users in simultaneous write access.

Ace is much simpler minded. The graphic acedb creates a global lock allowing a single user with write access at a time, and the modifications are not echoed to the other "read access" users in real time.

The non graphic client server system allows parallel downloading of data by many users; it is intended, for example, for a collection of robots sending their independent data in parallel. This is now well tested.

A graphic client system is being developed and now runs in our hands, but is not yet released.

--

Therefore, if you do need real time simultaneous write access with partial locks and roll backs, use sybase/oracle.

________________

Last issue is speed and quantities of data. In principle, sybase/oracle is unlimited, whereas acedb needs to keep around 5-10% of the data in ram. But this apparent difference is misleading.

On a 32 Meg machine, you can run ace with around 300,000 objects with a complex schema at high speed. With say 1M objects, you will need more memory or the performance would totally degrade because of swapping. However, this is really a lot of data.

On a similar machine, sybase/oracle will work with that amount or more data only if you do not perform too many joins. This implies that you are asking simple questions from a simple schema, which was indeed our first criterion to choose sybase. If you start asking complex questions and make joins, acedb is actually much more powerful.

During tests run on a big dec alpha server by Otto Ritter in December 1995 on several million biological objects with a complex schema, acedb was about 10 times faster than sybase, both to load the data and to answer queries.

I would therefore conclude that the quantity of data is not a criterion pushing one way or the other, it is the complexity of the schema that matters.


Q10: How should ACEDB be cited?

A10:

From the distribution:
We realize that we have not yet published any "real" paper on ACEDB. We consider however that anonymous ftp servers are a form of publication. We would appreciate if users of ACEDB could quote:
Richard Durbin and Jean Thierry-Mieg (1991-). A C. elegans Database. Documentation, code and data available from anonymous FTP servers at lirmm.lirmm.fr, cele.mrc-lmb.cam.ac.uk and ncbi.nlm.nih.gov.

Papers involved in database development could quote more precisely:
I. Users' Guide. Included as part of the ACEDB distribution kit,
II. Installation Guide. Included as part of the ACEDB distribution
III. Configuration Guide. Included as part of the ACEDB distribution
and the preprintkit available via anonymous ftp. Jean Thierry-Mieg and Richard Durbin (1992). Syntactic Definitions for the ACEDB Data Base Manager. Included as part of the ACEDB distribution.

--Jean and Richard.


Q11: What ACEDB databases exist?

A11:

Too many to maintain an up-to-date listing. A list as of mid-1998 is available at http://ars-genome.cornell.edu/acedocs/acedbfaq.dbs.html

A repository of many of these databases is maintained by CBCG, both for anonymous ftp at ftp://ars-genome.cornell.edu/pub and for WWW access via Webace at http://ars-genome.cornell.edu/.


Q12: Who prepared this document & where is the current version?

A12:

This document is posted monthly to the BIOSCI newsgroup bionet.software.acedb.

The WWW version is at http://ars-genome.cornell.edu/acedocs/acedbfaq.html.

This FAQ was created and maintained from 1993 - 1996 by Bradley K. Sherman. Major contributions in getting it off the ground were made by Mike Cherry, John McCarthy, and Doug Bigwood. Other contributors include:

It is currently maintained by Dave Matthews.

Please cite as:
Matthews, D.E., and B.K. Sherman, ACEDB Genome Database Software FAQ, http://ars-genome.cornell.edu/acedocs/acedbfaq.html, 1993-2000, approx. 30K bytes.

To add or modify information in this document, please send mail to: matthews@greengenes.cit.cornell.edu

The GrainGenes Project is funded by the USDA ARS Plant Genome Research Program.