Newsgroups: bionet.software.acedb
Subject: ACEDB Genome Database Software FAQ
Followup-To: bionet.software.acedb
Reply-To: firstname.lastname@example.org
Distribution: world
Organization: USDA-ARS, Dept. Plant Breeding, Cornell University
Summary: Frequently Asked Questions about the genome database software ACEDB.
URL: http://ars-genome.cornell.edu/acedocs/acedbfaq.html
Archive-name: acedb-faq
Last-modified: 27 May 2001
Version: 1.45
ACEDB allows for automatic cross-referencing of items during loading and allows for hypertextual navigation of the links using a graphical user interface and mouse. Certain special purpose graphical displays have been integrated into the software. These reflect the needs of molecular biologists in constructing genetic and physical maps of genomes.
ACEDB was written and developed by Richard Durbin (MRC LMB Cambridge, England) and Jean Thierry-Mieg (CNRS, Montpellier, France), beginning in 1989. It is written in the C programming language and uses the X11 windowing system to provide a platform-independent graphical user interface. The source code is publicly available. Durbin & Thierry-Mieg continue to develop the system, with contributions from other groups.
A description by Durbin & Thierry-Mieg:
ACEDB does not use an underlying relational database schema, but a system we wrote ourselves in which data are stored in objects that belong in classes. This is nevertheless a general database management system using caches, session control, and a powerful query language. Typical objects are clones, genes, alleles, papers, sequences, etc. Each object is stored as a tree, following a hierarchical structure for the class (called the "model"). Maps are derived from data stored in tree objects, but precomputed and stored as tables for efficiency. The system of models allows flexibility and efficiency of storage: missing data are not stored. A major advantage is that the models can be extended and refined without invalidating an existing database. Comments can be added to any node of an object.
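The idea of tree objects that follow a class model can be illustrated with a small Python sketch. This is only an analogy, not ACEDB's actual storage code or model syntax: the class name Gene, the tags (Map, Chromosome, Position, Allele, Reference, Expression, Tissue) and the function is_valid are all hypothetical names chosen for the example. It shows two points from the description above: tags with no data are simply absent from the object, and an object stays valid when the model is extended.

```python
# Hypothetical model for a "Gene" class: nested dicts list the tags allowed
# at each level of the tree. This is NOT real ACEDB model syntax.
GENE_MODEL = {
    "Map": {"Chromosome": {}, "Position": {}},
    "Allele": {},
    "Reference": {},
}


def is_valid(obj, model):
    """Return True if every tag path used in obj is allowed by the model."""
    for tag, value in obj.items():
        if tag not in model:
            return False
        # Descend into sub-trees; leaf values (strings, numbers) need no check.
        if isinstance(value, dict) and not is_valid(value, model[tag]):
            return False
    return True


# Only tags that carry data are present: nothing is stored for Allele
# or Reference, which this (made-up) gene does not have.
gene = {"Map": {"Chromosome": "IV", "Position": 3.2}}

# Extending the model with a new branch leaves the existing object valid.
EXTENDED_MODEL = {**GENE_MODEL, "Expression": {"Tissue": {}}}

if __name__ == "__main__":
    print(is_valid(gene, GENE_MODEL))
    print(is_valid(gene, EXTENDED_MODEL))
```

An object using a tag the model does not define, such as {"Unknown": 1}, is rejected by is_valid.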
Q1: What is the current version of the ACEDB software?
The current version for Macintosh is 4.1b1, August 1995. WWW interfaces: (See "Can ACEDB be networked?".)
Q2: Where can I get ACEDB?
Memory requirements (from Richard Durbin, Aug 97)
The amount of memory you require for ACEDB depends very much on how
big the database is (i.e. the disk space used by the database/
subdirectory). Our rule of thumb is that one typically uses 5-10Mb
plus up to 10% of the disk space size of the database. So with a
200Mb database perhaps 25Mb memory, and with a 500Mb database
(e.g. the C. elegans one) up to 50-60Mb. In fact for short sessions
less memory is used -- it is only when all classes are explored or,
for example, when big files are parsed that these amounts of
memory get used.
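The rule of thumb above can be written out as a small calculation. This is a minimal sketch, assuming the rule is read as a base of 5-10Mb plus the full 10% of the database size, i.e. the upper end reached in long sessions. The function name estimate_memory_mb and its parameter names are made up for the example and are not part of ACEDB.

```python
def estimate_memory_mb(db_size_mb, base_low_mb=5, base_high_mb=10, max_percent=10):
    """Return a (low, high) memory estimate in Mb for a long ACEDB session.

    Assumes the rule of thumb "5-10Mb plus up to 10% of the database size",
    taking the full 10% share, so short sessions may need less.
    """
    if db_size_mb < 0:
        raise ValueError("database size must be non-negative")
    share = db_size_mb * max_percent / 100
    return (base_low_mb + share, base_high_mb + share)


if __name__ == "__main__":
    for size in (200, 500):
        low, high = estimate_memory_mb(size)
        print(f"{size}Mb database: roughly {low:.0f}-{high:.0f}Mb of memory")
```

For the two database sizes mentioned above this gives 25-30Mb and 55-60Mb, in line with the figures quoted.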
Q4: Can ACEDB be networked?
ACEDB Client / Server Computing (from Doug Bigwood, Aug 97)
There are several client/server models for ACEDB computing and several more are in development. The ACEDB client/server age began with the inclusion of aceclient and aceserver in version 4.0. These are C-based and use the RPC protocol for communication. These executables can be made from the standard ACEDB distributions.
Starting in version 4.5 an xaceclient is also included with ACEDB. Xaceclient provides remote read/write access to an aceserver while providing the user with the same X displays that are found in xace. To use it, you create an empty database with the appropriate models and start xaceclient. It will automatically retrieve data from the server declared in wspec/server.wrm (the Montpellier server in the distribution server.wrm). The data will be saved locally and can then be viewed with a normal xace.
A perl extension which provides aceclient functionality to Perl 5.x was developed at ACE95. The files necessary for this perl extension are now (ACEDB 4.5 and later) included in the wrpc directory of the ACEDB directory hierarchy. Documentation about how to extend perl is found at http://ars-genome.cornell.edu/acedocs/ace97/perlace/perlacecl.html.
WWWAce and its successor webace were developed to provide a World Wide Web interface for ACEDB. Webace instructions can be found at http://ars-genome.cornell.edu/acedocs/webace.html, and http://ars-genome.cornell.edu/acedocs/ace97/webace.html and the program itself at ftp://ars-genome.cornell.edu/pub/tools/webace.tar.gz.
A Java-based client called Jade allows communication via sockets to an aceserver. Jade installation instructions and information on downloading can be found at http://stein.cshl.org/jade/.
There are now development efforts underway to provide additional client/server functionality to ACEDB including a CORBA server and socket-based communications. These will likely be included in future versions of ACEDB. A new C library interface to ACEDB internals will greatly ease the development of new clients and servers that will support additional protocols.
Subsequent developments (from Dave Matthews, Jul 00)
AcePerl, from Lincoln Stein, is an object-oriented Perl interface to ACEDB. It can connect to remote ACEDB databases, perform queries, fetch ACE objects, and update databases. The programmer's API is compatible with the Jade Java API. Home page at http://stein.cshl.org/AcePerl/.
AceBrowser, from Lincoln Stein, is a ready-to-use WWW gateway to ACEDB databases built on AcePerl. It has most of the functionality of webace. http://stein.cshl.org/AcePerl/AceBrowser/.
webace2K is an enhancement of webace2, from Maria Nemchuk. http://ars-genome.cornell.edu/webace/webace_install.html
CITA is a CORBA Interface To ACEDB, from UK CropNet.
Q5: What documentation exists for ACEDB?
You can also post to the newsgroup by mail: write to email@example.com.
Or you can access it with a standard newsreader like rn or tin at bionet.software.acedb, or with a WWW browser at news:bionet.software.acedb.
The articles are archived by BIOSCI at
and by Mike Cherry at
http://genome-www.stanford.edu/cgi-bin/biosci_acedb. Both archives are indexed
for searching. This is the place to find the Questions that really are Frequently Asked.
Q7: Is there a repository of software tools for ACEDB curators?
The USDA-ARS Center for Bioinformatics and Comparative Genomics has some useful tools at http://ars-genome.cornell.edu/acedocs/conversion.html. Some additional ones were contributed at the ACE97 Workshop and can be found in the Proceedings, http://ars-genome.cornell.edu/acedocs/ace97/tools/.
Mike Cherry maintains an archive of tools at ftp://genome-ftp.stanford.edu/pub/acedb_dev/utilities/
For a general tool for converting data to ACEDB format input files, Joachim Baumann (firstname.lastname@example.org) has written the Perl program TextConvert, available at ftp.informatic.uni.stuttgart.de/pub/DART/.
Q8: When and where is the next ACEDB Workshop?
The ACEDB2000 Workshop was held June 10-16 at Simon Fraser University, B.C., Canada. The Proceedings are at http://www.acedb.org/winfo/Conferences/acedb2000/.
The ACE97 Conference and Workshop was held July 27 - August 9 at Cornell University, Ithaca, New York, USA. See the ACE97 Proceedings Page, http://ars-genome.cornell.edu/acedocs/ace97/proceedings.html for the results.
The Proceedings from the May 1995 ACEDB Conference are available at http://ars-genome.cornell.edu/acedocs/ace95/. A final summary report is available at http://ars-genome.cornell.edu/acedocs/ace95/ace95.final.html. Also available online are collections of snapshots taken during the conference by Frank Eeckman and by Dave Matthews.
For pictures of the ACEDB '94 Workshop in St. Matthieu de Treviers, see the online collections:
Obviously, I have a biased opinion, but I would say that acedb is to be recommended if the following criteria are met:
1) A very complex schema, that cannot be developed at once, but will need continuous refinement in parallel with the accumulation of the data
2) The type of questions that will be asked are rather complex, with rather fuzzy answers, that one tries to refine progressively. The acedb browsing capacities are useful in this case and have no equivalent in a relational dbms
I would rather recommend sybase in the following cases:
1) Simple schema, that can be designed from the start and does not contain too many n.n relations and does not need recursion
2) The type of questions that will be asked is a succession of uncorrelated simple questions with simple answers
Within this context, I would then list the following goodies of acedb:
1) The ace file format, which is a powerful system to prepare and exchange data between data curators.
2) The existence of an easy graphic browsing interface
3) The availability of a biology-layer, if the application is about genetics
4) Portability (any Unix machine, Mac with some limitations, Windows in development) and price (ace is freeware). This implies that you can actually redistribute the complete system, say on a CD, something impossible with sybase.
5) Ease of use: I seriously believe that ace is much easier to configure and use than sybase.
Finally one should consider the following question: concurrency.
Sybase has a well designed transaction system, which will allow roll backs and refined lockings. This is essential for an application like a booking agency, with many users in simultaneous write access.
Ace is much simpler minded. The graphic acedb creates a global lock allowing a single user with write access at a time, and the modifications are not echoed to the other "read access" users in real time.
The non-graphic client/server system allows parallel downloading of data by many users; it is intended, for example, for a collection of robots sending their independent data in parallel. This is now well tested.
A graphic client system is being developed and now runs in our hands, but is not yet released.
Therefore, if you do need real-time simultaneous write access with partial locks and roll backs, use sybase/oracle.
The last issue is speed and quantity of data. In principle, sybase/oracle is unlimited, whereas acedb needs to keep around 5-10% of the data in RAM. But this apparent difference is misleading.
On a 32 Meg machine, you can run ace with around 300,000 objects with a complex schema at high speed. With, say, 1M objects, you will need more memory or performance will degrade badly because of swapping. However, this is really a lot of data.
On a similar machine, sybase or oracle will work with that amount of data or more only if you do not perform too many joins. This implies that you are asking simple questions of a simple schema, which was indeed our first criterion for choosing sybase. If you start asking complex questions and making joins, acedb is actually much more powerful.
During tests run on a big DEC Alpha server by Otto Ritter in December 1995 on several million biological objects with a complex schema, acedb was about 10 times faster than sybase, both to load the data and to answer queries.
I would therefore conclude that the quantity of data is not a criterion pushing one way or the other; it is the complexity of the schema that matters.
Q10: How should ACEDB be cited?
Papers involved in database development could quote more
I. Users' Guide. Included as part of the ACEDB distribution kit,
II. Installation Guide. Included as part of the ACEDB distribution
III. Configuration Guide. Included as part of the ACEDB distribution
and the preprintkit available via anonymous ftp. Jean Thierry-Mieg and Richard Durbin (1992). Syntactic Definitions for the ACEDB Data Base Manager. Included as part of the ACEDB distribution.
--Jean and Richard.
Q11: What ACEDB databases exist?
A repository of many of these databases is maintained by CBCG, both for anonymous ftp at ftp://ars-genome.cornell.edu/pub and for WWW access via Webace at http://ars-genome.cornell.edu/.
Q12: Who prepared this document & where is the current version?
This document is posted monthly to the BIOSCI newsgroup bionet.software.acedb.
The WWW version is at http://ars-genome.cornell.edu/acedocs/acedbfaq.html.
This FAQ was created and maintained from 1993 - 1996 by Bradley K. Sherman. Major contributions in getting it off the ground were made by Mike Cherry, John McCarthy, and Doug Bigwood. Other contributors include:
Please cite as:
Matthews, D.E., and B.K. Sherman, ACEDB Genome Database Software FAQ, http://ars-genome.cornell.edu/acedocs/acedbfaq.html, 1993-2000, approx. 30K bytes.
To add or modify information in this document, please send mail to: email@example.com
The GrainGenes Project is funded by the USDA ARS Plant Genome Research Program.