Searching an ACeDB database

Dave Matthews and Suzanna Lewis, May 1995

Topics

Before beginning, it is essential to define the term keyset. Every object that exists in the database can be uniquely identified by a key. A collection or set of these keys, identifying a group of objects in the database, is called a keyset. The purpose of the query language is to retrieve and create sets of objects that conform to the user's desired criteria.

Additional References on Query language syntax


Template search

This is the simplest and fastest search. You search a particular class for an item that is a member of that class:
  1. In the Main Window, select the class you wish to search by clicking on it with the mouse to highlight it.
  2. Click on the Template field and type the name of the item you wish to locate. Pressing the return key initiates the search and opens a text window that contains information about the requested item. It is possible to search using the * symbol as a wild card. For example, searching the Author class with the template Luc* retrieves all authors whose last name begins with the letters Luc. Double clicking on one of the names opens a window about that particular author.

Figure 1. Template search.
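
The wild card behaviour of a template can be illustrated with a short Python sketch. It is only a model of the matching rule, not ACeDB code: the function name template_search and the sample author names are hypothetical, and the case-insensitive comparison is an assumption.

```python
from fnmatch import fnmatchcase


def template_search(names, template):
    """Return the names that match a template containing * wild cards."""
    # Assumption: matching ignores case, so both sides are lowercased.
    pattern = template.lower()
    return [name for name in names if fnmatchcase(name.lower(), pattern)]


# Made-up names standing in for the members of the Author class.
authors = ["Lucas A", "Luckett B", "Lewis C"]

matches = template_search(authors, "Luc*")
print(matches)
```

Here the template Luc* keeps the two names that begin with Luc and rejects the third.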


Text search

The text search allows you to search all text in all fields for a match to a text string:
  1. In the Main Window select the Text Search space.
  2. Type the query string in the Text Search space and initiate the search by clicking the Long Search button. This is a slow process that can take several minutes, since the entire database needs to be searched. The result is a window that displays the retrieved items. For example, in a Text search for sos, all fields are searched for the string sos. A variety of items are found, including the gene description of the sos gene, references whose titles refer to this gene, the alleles of the sos gene, etc.

Figure 2. Text search (grep).


Sample Queries

The following section gives examples that illustrate some common queries. The window used for building the queries is the Query Builder window. Querying within tace or scripace is much the same; the primary difference is the addition of the Where operator, which is used to apply criteria to a keyset that has previously been found.

Figure 3. Query Builder Window

  1. Find Gene s*
    This query finds all genes whose name begins with s. The keyword Find indicates that the entire database is to be searched. Gene dictates the class of the object to be searched for. The wild card character * means that any number of different characters may follow the first character s.
  2. Find Gene s??
    This query finds all genes whose name begins with s and has exactly three letters. The ? is another wild card character that indicates replacement of a single character, rather than a string of any length.
  3. Find P_element at_stock_center
    This query finds all the P elements that are available at the stock center. at_stock_center is a tag in the model for the class P_element. The existence of this tag is a criterion that restricts the search. Any tag that has been defined for a class can be used in this way.
  4. Find Paper Author = Ashburner*
    This query finds all papers of which Ashburner is an author. It illustrates the use of a conditional operator to restrict allowed values.
  5. Find Paper Author = Ashburner* & Journal = Nature & Year < 1990
    This query finds all papers of which Ashburner is an author and that were published in Nature before 1990. The query shows the use of boolean operators to combine restrictions.
  6. Find Paper Author = Ashburner* & Journal = Nature & Year > 1985 & HERE < 1990
    This query finds all papers of which Ashburner is an author and that were published in Nature after 1985 and before 1990 (1986-1989). HERE is a navigational operator that performs another check on the same value.
  7. Find P1 Polytene >= 30C & NEXT <= 30F
    This query finds all the P1 clones that have been mapped to the 30C to 30F chromosomal region. NEXT is a navigational operator that indicates the next value in a vectored tag.
  8. Find Gene Name # Symbol = s*
    This query finds all the genes whose gene symbol begins with s. Name is a constructed subtype in the models and Symbol is a tag within this subtype. The # character allows the search to query tags within subtypes. This search is much slower than the first two searches because you are looking at a subtag within each gene record.
  9. Find Aberration Df* & Covers = 30C* ; Follow Covers
    Together these two queries find all chromosomal bands that lie in the region uncovered by the set of deficiencies that uncover all or part of 30C. The character ; is a punctuation mark that allows you to chain queries together.
  10. Follow Genes
    If the previous query were followed by this query (either directly using ; or as another query command), then this query would find all genes that lie in the region uncovered by the set of deficiencies that uncover all or part of 30C. The Follow command allows you to move from one class to another.
  11. Find Contig COUNT STS > 10
    This example finds all the contigs that contain more than 10 STSs. The COUNT command is used for searches based on the number of values that exist for a particular tag, rather than on the values of the individual items.
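
The meaning of a few of these queries can be sketched in Python. This is a toy model under stated assumptions, not a description of how ACeDB evaluates queries: the db dictionary, the helper functions find and has, and all object names and values are hypothetical, and matching here is case-sensitive for simplicity.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory stand-in for a database:
# class name -> {object name -> {tag -> list of values}}
db = {
    "Gene": {"sos": {}, "sev": {}, "sina": {}, "w": {}},
    "Paper": {
        "p1": {"Author": ["Ashburner M"], "Journal": ["Nature"], "Year": [1987]},
        "p2": {"Author": ["Ashburner M"], "Journal": ["Nature"], "Year": [1992]},
        "p3": {"Author": ["Smith J"], "Journal": ["Nature"], "Year": [1988]},
    },
    "Contig": {
        "c1": {"STS": ["sts%d" % i for i in range(12)]},
        "c2": {"STS": ["sts0", "sts1"]},
    },
}


def find(cls, name_pattern="*", where=lambda obj: True):
    """Return the set of object names in cls that match the pattern and test."""
    return {
        name
        for name, obj in db[cls].items()
        if fnmatchcase(name, name_pattern) and where(obj)
    }


def has(obj, tag, test):
    """True if any value stored under tag satisfies test."""
    return any(test(value) for value in obj.get(tag, []))


# Find Gene s??
three_letter_genes = find("Gene", "s??")

# Find Paper Author = Ashburner* & Journal = Nature & Year > 1985 & HERE < 1990
# Both year comparisons are applied to the same value, as HERE requires.
papers = find("Paper", where=lambda o: (
    has(o, "Author", lambda a: fnmatchcase(a, "Ashburner*"))
    and has(o, "Journal", lambda j: j == "Nature")
    and has(o, "Year", lambda y: 1985 < y < 1990)
))

# Find Contig COUNT STS > 10
big_contigs = find("Contig", where=lambda o: len(o.get("STS", [])) > 10)
```

In this model each query returns a set of names, which plays the role of a keyset. The COUNT example tests the number of values under the STS tag rather than any individual value.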

Query language search

This type of search uses a query language to conduct more complex searches. Query language searches are potentially powerful, but are often slow. Moreover, carrying out such searches requires knowledge of the query language written for ACeDB by J. Thierry-Mieg, combined with knowledge of the objects and tags defined in the models.

The Query Commands window is opened using the pull down Query menu. You click in the yellow text box and then type your query.

The purpose of the query language is to define criteria and then locate items that satisfy those criteria. You can search either the entire database or just one selected keyset.
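
The difference between searching the entire database and searching one selected keyset can be sketched as follows. This is a minimal illustration in Python, assuming a hypothetical layout in which objects are stored in a dictionary keyed by name; the function query and the sample papers are invented for the example.

```python
def query(objects, predicate, keyset=None):
    """Return the names of the objects that satisfy predicate.

    objects: dict mapping object name -> dict of tag values (hypothetical layout).
    keyset: optional set of names; when given, only those objects are examined.
    """
    if keyset is None:
        candidates = objects
    else:
        candidates = {k: objects[k] for k in keyset if k in objects}
    return {name for name, obj in candidates.items() if predicate(obj)}


# Made-up sample objects.
papers = {
    "p1": {"Journal": "Nature", "Year": 1987},
    "p2": {"Journal": "Nature", "Year": 1992},
    "p3": {"Journal": "Cell", "Year": 1988},
}

# Search every object, then refine the resulting keyset with a second criterion.
in_nature = query(papers, lambda o: o["Journal"] == "Nature")
early_nature = query(papers, lambda o: o["Year"] < 1990, keyset=in_nature)
```

The second call examines only the objects named in the first result, so p3 is never considered even though its year is before 1990.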