Dave Matthews and Suzanna Lewis, May 1995
Before beginning, it is essential to define a keyset. Every object in the database can be uniquely identified by a key. A collection, or set, of these keys, identifying a group of objects in the database, is called a keyset. The purpose of the query language is to retrieve and create sets of objects that meet the user's criteria.
Figure 1. Template search.
Figure 2. Text search (grep).
Figure 3. Query Builder window.
The Query Commands window is opened from the pull-down Query menu. Click in the yellow text box and then type your query.
The purpose of the query language is to define criteria and then locate items that satisfy those criteria. You can search either the entire database or just one selected keyset.
Whenever you begin a query with the command Find, you search the entire database. The Find command MUST be followed by a specific class name. (A list of classes appears in the main window: P_elements, P1, Yacs, cosmids, contigs, chromosome bands, etc. Use this list to select the class you wish to search.) Optionally, the class name may be followed by further items to restrict the query.
The Find class operation may be omitted. You can form a query that is made up only of criteria that must be met. In this case the query is performed on the active keyset and just the items in that keyset will be searched.
The items that satisfy your criteria will be placed either in the Main Keyset window or in a new keyset window called Query Results.
Each class in the database has a set of attributes. Attributes are described with 'tags'; for example, one of the tags used to describe P-element insertion lines indicates if the P element line is at the Bloomington Stock Center (at_stock_center). The simplest query is just to ask: Does this tag exist (for any member(s) of this class)? For example, you could search for all the P elements that are available from the stock center by typing in the text box: 'Find P_element at_stock_center'. Alternatively you could open a keyset window containing the P element class by double-clicking on P elements in the Main window and then typing the query 'at_stock_center'.
Instead of simply asking 'Is this tag present?', you can ask whether or not the tag has a particular value. The conditional (Boolean) operators available include =, >, >= and <=, all of which appear in the examples in this guide, as well as the text forms GT and LT.
The value stated after the operator can be either text or numeric. The operators GT and LT also work with text, in which case values are compared by alphabetical order.
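The alphabetical behavior of GT and LT on text can be pictured with ordinary string comparison. The Python sketch below is only an analogy and not the database's own implementation; it assumes a character-by-character comparison, and the function names and band values are hypothetical.

```python
# Analogy only: Python compares strings character by character,
# which behaves like an alphabetical ordering of text values.

def text_gt(value, limit):
    """Return True if value sorts after limit alphabetically."""
    return value > limit


def text_lt(value, limit):
    """Return True if value sorts before limit alphabetically."""
    return value < limit


bands = ["30A", "30C", "30F", "31B"]  # illustrative values only

after_30c = [b for b in bands if text_gt(b, "30C")]
before_30f = [b for b in bands if text_lt(b, "30F")]
```

In this sketch a text value such as 100A sorts before 30A, because the comparison is made by character rather than by numeric value. If the database orders text the same way, this is worth keeping in mind when comparing values that mix digits and letters.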
Criteria may be combined in a single search. Use either the keyword AND or an & to require both conditions (A AND B), and either OR or | to accept either condition (A OR B).
You may string together several criteria. To define how multiple criteria are applied, use parentheses, for example to distinguish (A OR B OR C) AND D from (A OR B) OR (C AND D).
Because parentheses have this special meaning, items such as P element names or aberration names that contain parentheses should be surrounded by quotation marks when they are used in queries. The other Boolean operator is XOR, the exclusive-or condition. If you want either A or B, but not both, this is the operator to use.
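The effect of parentheses and of XOR can be checked with a small truth-test. The Python sketch below is an illustration, not query-language syntax; it assumes each object is reduced to the set of tags it carries, and the tag names A to D and the function names are hypothetical.

```python
# Each record is modeled as the set of tags present on one object.

def matches_grouped(tags):
    """(A OR B OR C) AND D"""
    return ("A" in tags or "B" in tags or "C" in tags) and "D" in tags


def matches_alternative(tags):
    """(A OR B) OR (C AND D)"""
    return ("A" in tags or "B" in tags) or ("C" in tags and "D" in tags)


def matches_xor(tags):
    """A XOR B: exactly one of the two tags is present."""
    return ("A" in tags) != ("B" in tags)


only_a = {"A"}
a_and_b = {"A", "B"}

grouped_result = matches_grouped(only_a)          # D is missing
alternative_result = matches_alternative(only_a)  # A alone is enough
xor_results = (matches_xor(only_a), matches_xor(a_and_b))
```

An object carrying only tag A fails the first grouping, because D is required, but satisfies the second, which shows why the placement of parentheses matters.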
Value fields can accept two types of wild card symbols. An asterisk * designates a string of any length. A question mark ? designates any single character in that position.
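The two wild cards behave much like shell-style patterns. The Python sketch below uses the standard fnmatch module as a stand-in for the database's own matching, which is an assumption made for illustration; apart from FBgn0000075, the identifiers are hypothetical. Note that fnmatch also gives square brackets a special meaning, which this guide does not describe for the query language.

```python
from fnmatch import fnmatchcase


def wildcard_match(names, pattern):
    """Return the names matching a pattern where * is any string
    and ? is exactly one character."""
    return [name for name in names if fnmatchcase(name, pattern)]


# FBgn0000075 appears in this guide; the other two are made up.
names = ["FBgn0000075", "FBgn0000014", "FBgn0001234"]

any_suffix = wildcard_match(names, "FBgn*")        # any length after FBgn
two_chars = wildcard_match(names, "FBgn00000??")   # exactly two more characters
```

The asterisk pattern keeps all three names, while the question-mark pattern keeps only the names that have exactly two characters after FBgn00000.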
Some tags are further divided into subtags. For example, the Name tag in the Gene Class has the subtags 'Symbol' and 'FlyBase_ID'. For a search based on the presence or value of a subtag, both the tag and the subtag must be specified, with the subtag prefaced by a # sign. For instance, if you wanted to find the gene with the FlyBase_ID FBgn0000075, you would type: Find Gene Name # FlyBase_ID = FBgn0000075.
Some tags are vectored, that is, a series of values may lie to the right of a single tag. For example, the Polytene tag in the P1 Class is followed by two values indicating the starting and ending in situ chromosomal assignments for that clone (e.g., Polytene 37C2 37C5). The NEXT operator is used to move the search along to the next value in such vectored tags.
It is also sometimes useful to check the same value more than once. The HERE operator allows you to do this. It indicates that another criterion should also be applied to the same value just tested.
If you end a query with a semicolon (;), you can add further queries after the semicolon. In this way the results of one query become the keyset for the next query.
For searches based on how many times a particular tag occurs, rather than on the values of the individual tags, use the Count operator. For example, to find all the contigs that contain more than 10 STSs, you would type: Find Contig STS Count > 10.
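Chaining with a semicolon can be thought of as a pipeline in which each query receives the keyset produced by the one before it. The Python sketch below models that idea under simplifying assumptions: the toy database, the object keys and the helper functions find, has_tag and run_chain are all hypothetical, and only the class P_element and the tag at_stock_center come from this guide.

```python
# Toy database: class name -> object key -> tags on that object.
database = {
    "P_element": {
        "P1001": {"at_stock_center": True},
        "P1002": {},
        "P1003": {"at_stock_center": True},
    }
}


def find(class_name):
    """Stage that ignores the incoming keyset and selects a whole class."""
    return lambda _keyset: set(database[class_name])


def has_tag(class_name, tag):
    """Stage that keeps only the keys whose object carries the tag."""
    return lambda keyset: {k for k in keyset if tag in database[class_name][k]}


def run_chain(stages, active_keyset=frozenset()):
    """Feed the result of each stage to the next, as ';' does in a query."""
    keyset = set(active_keyset)
    for stage in stages:
        keyset = stage(keyset)
    return keyset


# Models 'Find P_element ; at_stock_center'
result = run_chain([find("P_element"), has_tag("P_element", "at_stock_center")])
```

The second stage never looks at the whole database. It sees only the keys handed over by the first stage, which mirrors how a query without Find works on the active keyset.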
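The idea behind a count test is that the criterion looks at how many values sit under a tag rather than at the values themselves. The Python sketch below illustrates this with a hypothetical mapping from contig names to lists of STS names; the data and the function count_greater are assumptions made for the example.

```python
# Hypothetical contigs, each with the list of STS entries under its STS tag.
contigs = {
    "contig_1": ["sts_%d" % i for i in range(12)],  # 12 entries
    "contig_2": ["sts_a", "sts_b", "sts_c"],        # 3 entries
    "contig_3": ["sts_%d" % i for i in range(10)],  # exactly 10 entries
}


def count_greater(records, threshold):
    """Return the keys whose tag holds more than `threshold` values."""
    return {key for key, values in records.items() if len(values) > threshold}


# Models 'Find Contig STS Count > 10'
many_sts = count_greater(contigs, 10)
```

A contig with exactly 10 entries is not selected, because the comparison is strictly greater than 10.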
The Follow operator allows you to do a particular type of combined search in which you use the attributes of one class to select a subset of another class. In contrast, in a normal combined search you are simply applying additional criteria to further define a subset of one class. The use of the Follow operator is illustrated by a query designed to identify all the P elements that fall in the 30C to 30F chromosomal region. This query uses information stored in both the P_element and Chrom_Band classes. You first query the Chrom_Band class to select all the chromosome bands that fall between 30C and 30F. Then you look in the data stored about each of those chromosome bands to see which ones have P elements. The simple query 'Find Chrom_Band Contained_in >=30C & Contained_in <= 30F' gives you a list of all the chromosome bands in this genomic interval. The query 'Find Chrom_Band Contained_in >=30C & Contained_in <= 30F & P_elements' gives you a list of those bands in which a P element has been identified. However, you want a list of the P elements, not a list of the bands. The chromosome band records have tags that point to the P elements, and you can follow that tag with the Follow operator in the query 'Find Chrom_Band Contained_in >=30C & Contained_in <= 30F; Follow P_elements'. The result is the desired list of P_elements that map to the 30C to 30F chromosome region.
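The two steps of this example, selecting bands and then following a tag to another class, can be modeled in a few lines. The Python sketch below is an illustration under stated assumptions: the band records, the P element names and the helper functions are hypothetical, and the region test is approximated with plain text comparison.

```python
# Hypothetical Chrom_Band records: band name -> tags on that band.
chrom_bands = {
    "30B1": {"Contained_in": "30B", "P_elements": ["P_a"]},
    "30C4": {"Contained_in": "30C", "P_elements": ["P_b", "P_c"]},
    "30D2": {"Contained_in": "30D", "P_elements": []},
    "30F1": {"Contained_in": "30F", "P_elements": ["P_d"]},
    "31A2": {"Contained_in": "31A", "P_elements": ["P_e"]},
}


def bands_in_region(bands, low, high):
    """Step 1: keyset of bands whose Contained_in value lies in [low, high]."""
    return {name for name, rec in bands.items()
            if low <= rec["Contained_in"] <= high}


def follow(bands, keyset, tag):
    """Step 2: replace the band keyset with the keys the tag points to."""
    followed = set()
    for name in keyset:
        followed.update(bands[name].get(tag, []))
    return followed


selected_bands = bands_in_region(chrom_bands, "30C", "30F")
p_elements = follow(chrom_bands, selected_bands, "P_elements")
```

The first keyset contains chromosome bands and the second contains P elements, so the class of the result changes when the tag is followed. A band with no P elements, such as 30D2 here, simply contributes nothing to the final keyset.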
After you collect a number of keysets using queries, you may want to view the commonalities or differences between them. The keyset menu provides you with a set of options for combining keysets. As in a Venn diagram, you can get the items common to two keysets (Intersection of Keysets), the items in either keyset (Union of Keysets), the items that are in one keyset but not in the other (Difference of Keysets), or the items that are in one or the other but not both, that is, a combination of what is exclusive to each (Union But Not Intersection of Keysets).
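These four options correspond to the standard set operations. The Python sketch below shows them on two small hypothetical keysets. It assumes that Difference of Keysets means the items of the first keyset that are absent from the second, and the key names are made up for the example.

```python
# Two hypothetical keysets, for example the results of two queries.
keyset_a = {"P1001", "P1002", "P1003"}
keyset_b = {"P1003", "P1004"}

intersection = keyset_a & keyset_b      # Intersection of Keysets
union = keyset_a | keyset_b             # Union of Keysets
difference = keyset_a - keyset_b        # Difference of Keysets (assumed: A minus B)
exclusive_to_each = keyset_a ^ keyset_b  # Union But Not Intersection of Keysets
```

The last operation gives the same keyset as taking the union and then removing the intersection.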