Tutorial for the ace97 workshop
Cornell, July 1997
However, there are many cases where one wants a direct answer to a specific question. In such a case, the casual user may type a name in the top box of the main window, search for a word in all classes using the lower box of the main window, or use one of the user-friendly graphic query interfaces: Query builder and Query By Example.
More advanced users can type their queries directly in the graphic Query interface or via the new query button of the keyset window, and extract tables using the table-maker interface.
There are, however, many more possibilities in acedb which are little known and poorly documented. The first purpose of this document is to cover the advanced features of the query language, in particular the arithmetic and set logic operators.
In the rest of the document, I will try to explain how one can mix, in a very general way, ace queries and tables, edit commands and keyset stacks, embedded scripts and embedding scripts, servers and clients, or the table maker. The general idea is that, following the standard UNIX methodology, there is a unique code layer in acedb which actually executes any given elementary query, but the system can pile up layers over layers and pass parameters down the line, provided one includes in the definitions the correct random number of quotes and backslashes.
?A b ?B
   c ?C
   myTag Int Text

?B Ct ?Ct #CTT

#CTT Length Int

?C c1 Int
   c2 Int

?Sequence DNA ?DNA Int
A myA
myTag 1 x1
mytag 2 x2
b b1
b b2
C cc1

A hisA
b b3
c cc2
c cc3

B myB
Ct Ct1 Length 17

B hisB
Ct Ct1 Length 18

C myC
c1 7
c2 9
c2 11

C hisC
c1 4
c2 13
c2 22

>mydna
atgcgtgtgac
File 'a b' # space included in the name of the file
// Notice that acedb can read fasta format files
>S1
atgcatgc
>S2
tgtgtcgat
tgg
////////////
File update.script
$echo `date` >> update.log
$echo -n "Parsing file %1 ... " >> update.log
Parse %1
$echo "Done" >> update.log
File cl.serv.csh, set up for the Saint Louis lab
#! /bin/csh -f
setenv ACEDB `pwd`
setenv hh peg
setenv pp 2345678 # some crazy port number
aceserver $ACEDB -port $pp &
sleep 1 # needed to let the server detach itself from this shell
aceclient $hh -port $pp << END >! toto
Query Find sequence
show -a DNA
shut // close server now
quit
END
nawk 'BEGIN {n=0;nn=0}/^DNA/{n++;nn += $3;}END {printf("%d seq, %d bases\n",n,nn)}' toto
#\rm toto
A query in acedb is a keyset pipe. It receives a set of keys, possibly empty, and either filters this set for those objects matching a complex condition, or constructs a new set and then filters it.
The simplest constructor is the FIND command.
The complex query consists of a pipeline of simple queries separated by
semicolons ';', which are applied one after the other.
The simple query is made of a series of elementary queries, described elsewhere,
glued together by the logical operators AND, OR, XOR, NOT.
An elementary query is a boolean filter. It receives a developed
acedb object, with a pointer at the current location, and tries to
move the current pointer to the new location described by the
elementary query. If it fails, it returns FALSE and the position of the
pointer is not guaranteed. If it succeeds, the pointer is positioned
on the successful data item.
The original location of the pointer only matters if the query is
geographic, that is, if it contains the reserved word NEXT or a # operator. Such queries are supported
and indeed needed by the kernel, but should be avoided because they are
hard to read and fragile under schema modifications.
The recommended style is to use a tag name, or the relatively new
colon operator to move right of a given tag. Following the usual
convention of the C language, Tag:0 is the tag itself, Tag:1 the value
immediately to its right, Tag:2 the next one and so on. However, this
syntax is not always applicable. Consider the object myA shown under
"Elementary queries".
Inside the curly brackets, you can have an arbitrary query, including,
in principle, nested {}, although I am not sure that this works.
The square brackets establish the arithmetic context. Inside [],
the system will always try to recover numbers and is allowed
to combine them via the four elementary arithmetic operators
+, -, *, /.
The following operators evaluate to numbers: [], COUNT, AVG, SUM, MIN, MAX.
To be completed
The stack works the Polish way: you have a stack of keysets.
There are two special characters, @ and $, that are interpreted
in a particular way if they occur at the beginning of a line.
$shell_command line sends the uninterpreted line directly to the Unix shell.
Since this may constitute a breach of security, it is only possible if, before
running acedb, you setenv ACEDB_SUBSHELL, and $ is forbidden in aceserver.
Inclusion and shell calls can be recursive and as deep as one may wish, as
in the example below.
The parameter substitution is a macro substitution done at run time
when the %n token is reached. The substituted string is the result of
unprotecting the parameter: double quotes are removed, backslashes are
interpreted and so on. We are missing an equivalent of the '' csh
syntax, which would provide for substitution as is, without
interpretation.
Note that a small Perl library written by Steve Rozen,
ace/wtools/ace_perl.pm, gives you access to acedb queries from Perl, and
the larger system by Barnett, Bigwood et al. provides more Perl
functionality.
Here is an example of a correct input file
// Modified July 1997
Elementary queries
?A myTag Int Text
myA myTag 1 "x1"
          2 "x2"
The following queries will succeed:
Query Find A myTag // since myTag is present in the object
Query Find A myTag:0 // for the same reason
Query Find A myTag:1 // since there is a number '1' right of myTag
Query Find A myTag:2 // since there is a value right of '1'
Query Find A myTag = 1 // because a comparator moves one right
Query Find A myTag:1 = 1 // unless the : was specified
Query Find A myTag:2 = x1 // because : moves along first line
Query Find A myTag = 2 AND NEXT = x2 // a correct use of NEXT
These will fail:
Query Find A myTag:0 = 1 // because :0 is the tag itself
Query Find A myTag:2 = x2 // because :2 will be positioned on 'x1'
The problem in the last query is that the query evaluation
routine never backtracks.
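The behaviour of these queries can be pictured with a toy model in which every cell of the object has a neighbour to its right and a neighbour below it. The Python sketch below only illustrates the rules stated above. The class Node and the helpers step_right and scan_down are hypothetical names, not part of acedb, and the assumption that a comparator scans down the current column is inferred from the examples, not taken from the acedb source.

```python
class Node:
    """One cell of an object tree: a value, the cell to its right, the cell below."""
    def __init__(self, value, right=None, down=None):
        self.value = value
        self.right = right
        self.down = down

# Toy copy of the object   myA myTag 1 "x1"
#                                    2 "x2"
x1 = Node("x1")
x2 = Node("x2")
two = Node(2, right=x2)
one = Node(1, right=x1, down=two)
my_tag = Node("myTag", right=one)

def step_right(node, n):
    """Model of Tag:n: move n cells to the right, always along the first line."""
    for _ in range(n):
        if node is None:
            return None
        node = node.right
    return node

def scan_down(node, wanted):
    """Model of a comparison (assumed): test this cell, then the cells below it."""
    while node is not None:
        if node.value == wanted:
            return node
        node = node.down
    return None

# myTag:2 = x1 succeeds: two steps right along the first line reach "x1".
colon2 = step_right(my_tag, 2)
print(colon2.value)                     # x1

# myTag = 2 AND NEXT = x2 succeeds: the comparator moves one right and finds 2
# in that column, then NEXT moves one right again.
hit = scan_down(step_right(my_tag, 1), 2)
print(hit.right.value)                  # x2

# myTag:2 = x2 fails: the pointer sits on "x1", which has nothing below it,
# and the evaluation never goes back to try the line that starts with 2.
print(scan_down(colon2, "x2"))          # None
```

In this picture, "never backtracks" simply means that no helper ever moves left or up once the pointer has advanced.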
Finally, recall that in the case of a constructed type
you must use the # operator to move into the subobject.
Models:
?B Ct ?Ct #CTT
#CTT Length Int
Object:
myB Ct ct1 Length 17 // # does NOT appear in the ace file
Successful query:
Query Find B Ct:1 # Length = 17
Set operations
The set operations can be used to combine different sets.
The operators are
SETOR, SETXOR, SETAND, SETMINUS, abbreviated as $|, $^, $&, $-.
They are applied to pairs of sets, recognised by the
curly bracket delimiters {}. These brackets establish the set context.
The queries inside the brackets run on the same input set.
However, it is often easier to use the stack operations
described below.
Example
?A b ?B
   c ?C
Query Find A ; {FOLLOW b} SETOR {FOLLOW c}
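For readers who think in terms of ordinary set algebra, the four operators can be mirrored with Python sets. This is only a sketch of the intended meaning of the operator names. The function apply_set_op and the key names are hypothetical, and reading SETMINUS as "left minus right" is an assumption.

```python
def apply_set_op(op, left, right):
    """Combine two keysets with one of the four set operators."""
    results = {
        "SETOR": left | right,      # $|  keys in either set
        "SETXOR": left ^ right,     # $^  keys in exactly one of the two sets
        "SETAND": left & right,     # $&  keys in both sets
        "SETMINUS": left - right,   # $-  keys in left but not in right (assumed)
    }
    return results[op]

# Hypothetical keysets standing for the results of two bracketed queries
# that were run on the same input set.
left_set = {"k1", "k2", "k3"}
right_set = {"k3", "k4"}

for op in ("SETOR", "SETXOR", "SETAND", "SETMINUS"):
    print(op, sorted(apply_set_op(op, left_set, right_set)))
```

Both operands are computed from the same input set, which is why the sketch takes two already evaluated sets and does not chain one query into the other.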
Arithmetic operations
Finally, in the arithmetic context, an attempt is made to cast the
item to the right of the locator to a number. This should succeed for
Int, Float, Date, may succeed for Text, but is not attempted for any
class name, including ?Text. Note that arithmetic operations on dates
use their internal unix representation and are rather ill defined,
except min, max, avg. If casting fails, the query returns FALSE.
Examples:
?C c1 Int
   c2 Int
myC c1 7
    c2 9
       11
The following queries should return myC
Query Find C [ 1 + 2] = 3 // all C objects !
Query Find C [[c1] + [c2]] = 16 // the first value of c2 is used as usual
Query Find C [[c1] + AVG c2 ] = 17
Query Find C [[c1] + MAX c2 ] = 18
Query Find C [[c1] + 100*(COUNT c2)] = 207
Query Find C c2 && [[c2] - [c1]] = [MAX c2 - MIN c2] // returns also hisC (to be tested)
Query Find A COUNT {FOLLOW b} = 2 // a query on class A: returns myA, which points to b1 and b2
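The numbers on the right-hand side of the queries on class C can be checked by hand. The Python lines below redo the arithmetic on the values of myC and hisC from the example data. The helper first is a hypothetical stand-in for the rule that [tag] uses the first value to the right of the tag.

```python
# Values taken from the example data for class C.
objects = {
    "myC": {"c1": [7], "c2": [9, 11]},
    "hisC": {"c1": [4], "c2": [13, 22]},
}

def first(values):
    """[tag] uses the first value to the right of the tag."""
    return values[0]

my = objects["myC"]
print(first(my["c1"]) + first(my["c2"]))                # 16
print(first(my["c1"]) + sum(my["c2"]) / len(my["c2"]))  # 17.0
print(first(my["c1"]) + max(my["c2"]))                  # 18
print(first(my["c1"]) + 100 * len(my["c2"]))            # 207

# [[c2] - [c1]] = [MAX c2 - MIN c2] holds for both objects.
matches = [name for name, o in objects.items()
           if first(o["c2"]) - first(o["c1"]) == max(o["c2"]) - min(o["c2"])]
print(matches)                                          # ['myC', 'hisC']
```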
Stack and edit commands
Example
Find sequence
spush
Follow DNA
sor
spop
list // gives dna and sequences
write filename // exports
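The effect of the stack commands in this example can be sketched with a small Python model. It is an illustration only. The class KeysetStack and the key names are hypothetical, and the meaning given to sor (the top of the stack becomes its union with the active keyset) is an assumption chosen because it reproduces the result stated in the example.

```python
class KeysetStack:
    """Toy model of the active keyset plus the stack of keysets."""
    def __init__(self):
        self.active = set()
        self.stack = []

    def spush(self):
        # Push a copy of the active keyset on the stack.
        self.stack.append(set(self.active))

    def sor(self):
        # Assumed: top of stack becomes (top of stack) OR (active keyset).
        self.stack[-1] |= self.active

    def spop(self):
        # The popped keyset becomes the active keyset.
        self.active = self.stack.pop()

session = KeysetStack()
session.active = {"S1", "S2"}            # Find sequence (hypothetical keys)
session.spush()
session.active = {"dna:S1", "dna:S2"}    # Follow DNA (hypothetical keys)
session.sor()
session.spop()
print(sorted(session.active))            # ['S1', 'S2', 'dna:S1', 'dna:S2']
```

The operands are put in place first and the operator comes last, which is the Polish style of the stack commands.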
Edit, available in tace, in aceclient and from the Edit
button of the keyset windows, applies ONE ace line to every object of the
active set.
Example
FIND C
edit c1 13 // adds c1 13 to all C objects
edit -D c2 // removes all c2 tags
edit c2 9
Embedded scripts
@filename parm1 parm2 ... includes the file in the flow and substitutes parm1 parm2 ...
in place of %1 %2 ... It is an error to provide an insufficient number of
parameters; the result of the error is, I think, to close the included
file.
Example
acedb> $cat update.script // from John Morris
// Update script for tace
// Modified Thu Aug 4 08:44:09 EDT 1994 JWM
$echo `date` >> update.log
$echo -n "Parsing file %1 ... " >> update.log
Parse %1
$echo "Done" >> update.log
acedb> @update.script "\"a b\""
will actually read the file with the stupid name a b,
with an included blank, and report about it in the update log file.
More examples are shown in the document on tace by John Morris,
available on the acedb.doc server.
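The reason for the doubled quoting in the call above can be followed step by step in a small Python sketch of the %n substitution. It is a simplified model, not the acedb code. The function names unprotect and substitute are hypothetical, and "interpreting backslashes" is reduced here to "a backslash protects the next character".

```python
import re

def unprotect(parm):
    """Remove one layer of double quotes, then let each backslash protect one character."""
    if len(parm) >= 2 and parm[0] == '"' and parm[-1] == '"':
        parm = parm[1:-1]
    return re.sub(r"\\(.)", r"\1", parm)

def substitute(line, parms):
    """Replace %1, %2, ... in line by the unprotected parameters."""
    def replace(match):
        index = int(match.group(1))
        if index < 1 or index > len(parms):
            raise ValueError("not enough parameters for %" + match.group(1))
        return unprotect(parms[index - 1])
    return re.sub(r"%(\d+)", replace, line)

# The parameter typed on the command line is  "\"a b\""
print(substitute("Parse %1", [r'"\"a b\""']))   # Parse "a b"
```

One layer of quotes is consumed by the substitution, so the inner, backslash-protected quotes are the ones that still surround the file name when the Parse command finally reads it.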
Embedding client/server scripts
Conversely, one may embed tace scripts inside a larger csh script.
However, this may be costly, because of the overhead of starting
tace. Rather, we suggest using aceclient. This can be done even
if you do not have a daemon server. Just invoke the server in
the foreground. In this example, I wish to find the cumulated length of all
sequences using a mixture of ace and nawk.
#!/bin/tcsh -f
setenv hh `hostname`
setenv pp 2345678 # some crazy port number
aceserver $ACEDB -port $pp &
sleep 1 # needed to let the server detach itself from this shell
aceclient $hh -port $pp << END >! ~/__toto
Query Find sequence
show -a DNA
shut // close server now
quit
END
nawk 'BEGIN {n=0;nn=0}/^DNA/{n++;nn += $3;}END {printf("%d seq, %d bases\n",n,nn)}' ~/__toto
\rm ~/__toto
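For readers less familiar with nawk, the counting step of the script can be written as follows in Python. This is a sketch under one assumption: that each relevant line of the server output starts with DNA and carries the sequence length as its third blank-separated field, which is what the nawk pattern relies on. The sample lines are hypothetical, and the real layout of "show -a DNA" may differ.

```python
def tally(lines):
    """Count the lines that start with DNA and add up their third field."""
    count = 0
    total = 0
    for line in lines:
        if line.startswith("DNA"):
            fields = line.split()
            count += 1
            total += int(fields[2])   # nawk would silently use 0 for a missing field
    return count, total

# Hypothetical server output, for illustration only.
sample = [
    'Sequence : "S1"',
    'DNA "S1" 8',
    '',
    'Sequence : "S2"',
    'DNA "S2" 12',
]
print("%d seq, %d bases" % tally(sample))   # 2 seq, 20 bases
```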
Mailing lists
Consider also the aceclient report mode, for mailings. It would be
very useful to extend this to the export of html documents. I suggest
that as an exercise for the meeting.
Subject:
Start from the code wrpc/aceclient.c, which is only 300 lines long,
and extend its functionality as follows.
In the present system
aceclient host -port port -f input_file
will substitute in input_file any occurrence of #(...) by the result
of asking ... to the server. This should be modified a little bit
so that the input could be a meta-html page and the output
a correctly formatted html document. In the present system
there would be undesirable blank lines and so on.
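As a starting point for the exercise, the substitution can be prototyped in a few lines of Python before touching the C code. The sketch below is not a description of aceclient.c. The function expand_report, the callable ask (standing for "send one command to the server and return the answer") and the gene name used in the demonstration are all hypothetical, and stripping the blank lines is one possible answer to the layout problem mentioned above.

```python
import re

def expand_report(template, params, ask):
    """Expand a report template.

    params -- command line parameters, substituted for %1, %2, ...
    ask    -- hypothetical callable: takes one command, returns the answer text
    """
    out = []
    for line in template.splitlines():
        if line.startswith("//"):
            continue                  # comment lines do not reach the output
        line = re.sub(r"%(\d+)", lambda m: params[int(m.group(1)) - 1], line)
        line = re.sub(r"#\(([^)]*)\)", lambda m: ask(m.group(1)).strip(), line)
        if line.strip():
            out.append(line)          # drop the blank lines left behind
    return "\n".join(out)

def fake_server(command):
    # Stand-in for a real server, for illustration only.
    return "\n[answer to: " + command + "]\n"

template = "// header comment\n#(Find Gene %1)\nThese are the alleles #(Show Allele)\n"
print(expand_report(template, ["unc-22"], fake_server))
```

Passing the server as a callable keeps the text processing testable without a running server. A real version would also have to decide how to escape the answers for html.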
// @(#)client.report.example 1.2 1/23/96
// Hoping that my computer is running, try the following,
// which I developed with Danielle
// ftp from ncbi my executable
// sparc.aceclient.Z
//
// uncompress it and type on your command line
// usage:
// "usage : aceclient host [-time_out nn_in_seconds] [-ace_out] [-f reportfile parameters]\n",
// if filename is omitted, you run interactively
//
// for example try:
// aceclient 193.49.111.71 -f thisfilename genename
//
// 193.49.111.71 is my dec alpha in montpellier, broken by a thunderstorm these days, sorry.
//
// you will get a developed form of the gene
// embedded in the non // lines
//
// lines starting with // are skipped in the output
// %1 and so on stands for command line parameter 1 and so on.
//
// #(command text) invokes the ace_server
//
#(Find Gene %1)
Consider gene %1 // %1 stands for the name of the gene
These are the known papers:#(Show Reference)
$ls -ls
Or in a more standard bibliographic format:
#(Biblio)
These are the alleles #(Show Allele)
that's all, folks
All I know on tata box:
#(Grep tata) #(List)
# Good evening
// end