With the "Develop a table..." selection, a message may appear:
The selection "(slow) import this protein ..." is linked to a program
in ACEDB's wscripts directory that retrieves the highlighted sequence
over the WWW interface.
Homol DNA_homol ?Sequence XREF Pep_homol ?Method Float Int UNIQUE Int Int UNIQUE Int
Pep_homol ?Protein XREF Pep_homol ?Method Float Int UNIQUE Int Int UNIQUE Int
Motif_homol ?Motif XREF Pep_homol ?Method Float Int UNIQUE Int Int UNIQUE Int
Homol DNA_homol ?Sequence XREF Motif_homol ?Method Float Int UNIQUE Int Int UNIQUE Int
Pep_homol ?Protein XREF Motif_homol ?Method Float Int UNIQUE Int Int UNIQUE Int
Motif_homol ?Motif XREF Motif_homol ?Method Float Int UNIQUE Int Int UNIQUE Int
Sequence : "B0001"
DNA_homol "CESAA60F" "BLASTN" 174.000000 2072 2022 109 159
DNA_homol "CESAA60F" "BLASTN" 148.000000 20890 20856 109 143
DNA_homol "yk15d5.3" "BLASTN" 157.000000 23719 23780 93 154
DNA_homol "yk15d5.3" "BLASTN" 145.000000 3795 3749 57 103
Pep_homol "SW:KPPR_ARATH" "BLASTX" 79.000000 10026 9922 161 195
Pep_homol "SW:KPPR_ARATH" "BLASTX" 64.000000 9867 9718 213 262
Pep_homol "SW:KPPR_MESCR" "BLASTX" 58.000000 10276 10184 123 153
Pep_homol "SW:KPPR_MESCR" "BLASTX" 73.000000 10026 9922 162 196
Motif_homol "PS:PS00017" "Queryprosite" 13.800000 1881 1902 1 8
Motif_homol "PS:PS00077" "Queryprosite" 19.900000 35626 35638 1 5
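The homology lines above can be read programmatically. The following Python sketch is only an illustration and is not part of ACEDB: the names Homol, parse_homol and is_reverse are hypothetical. It assumes, from the model (?Method Float Int UNIQUE Int Int UNIQUE Int), that the fields after the method are the score, the start and end in the Sequence object, and the start and end in the matching object, and that a start larger than the end marks a match on the opposite strand.

```python
import shlex
from typing import NamedTuple


class Homol(NamedTuple):
    tag: str      # DNA_homol, Pep_homol or Motif_homol
    target: str   # name of the matching object
    method: str   # e.g. BLASTN, BLASTX, Queryprosite
    score: float
    q_start: int  # assumed: coordinates in the Sequence object (B0001 above)
    q_end: int
    t_start: int  # assumed: coordinates in the matching object
    t_end: int


def parse_homol(line: str) -> Homol:
    """Split one homology line; shlex removes the double quotes."""
    tag, target, method, score, qs, qe, ts, te = shlex.split(line)
    return Homol(tag, target, method, float(score),
                 int(qs), int(qe), int(ts), int(te))


def is_reverse(h: Homol) -> bool:
    """Assumption: start > end in the query means the opposite strand."""
    return h.q_start > h.q_end
```

For instance, parse_homol applied to the first DNA_homol line gives a record with target CESAA60F and score 174.0, which is_reverse reports as an opposite-strand match under the assumption stated above.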
The left mouse button highlights sequences. With a second single click (not a double click) on a highlighted sequence you can fetch its annotation from the local ACEDB database, from local databases using efetch, or over the World-Wide Web using WWW-efetch. Be patient: access to a remote database may take some time. If retrieval does not work, either efetch or the database it reads is not installed. When retrieval fails, the sequence is not displayed and the annotated range is represented by dashed lines instead. The percent identity with the query sequence is then not calculated, but the original BLAST score is still displayed.
In mode 1, Blixem calls "efetch -q seqname", while in mode 2 it calls "efetch seqname". Your efetch script wrapper must therefore check for the -q option. If the option is given, the wrapper should return the raw sequence on one line only. If it is not, the wrapper should return the annotation as raw text on multiple lines.
Switch to the opposite strand by clicking on "Strand v^". "Goto..." can be used either by picking, to go to an absolute position, or as a pull-down menu with the right mouse button, to go to the beginning or end of the query sequence currently in Blixem. By default, this is a 20 Kb region around the box that was used in ACEDB to call Blixem.
Sequence retrieval tools other than efetch can also be used (see the discussion of efetch below). If you want to use your own in-house retrieval system, you can write a script wrapper that simulates efetch and place it in the $ACEDB/wscripts directory of the ACEDB database. The method used for retrieving sequences (acedb, efetch, www-efetch) can be selected from the Blixem settings menu (shown below).
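A wrapper that simulates efetch could be sketched in Python as follows. This is a minimal illustration, not the wrapper shipped with ACEDB: the RECORDS table, the entry name MY:TEST1 and its contents are made up and stand in for an in-house retrieval system. The only behaviour taken from the text is the handling of -q: with it, the raw sequence is returned on one line; without it, the annotation is returned as multi-line text.

```python
# Hypothetical in-house store: name -> (annotation text, sequence text).
RECORDS = {
    "MY:TEST1": (
        "ID   TEST1\nDE   Example entry (made up)\n",
        "MKTAYIAK\nQRQISFVK\n",
    ),
}


def fetch(argv, records=RECORDS):
    """Return the text an efetch-like wrapper would print, or None."""
    raw_only = "-q" in argv
    names = [arg for arg in argv if arg != "-q"]
    if len(names) != 1 or names[0] not in records:
        return None
    annotation, sequence = records[names[0]]
    if raw_only:
        # "efetch -q seqname": raw sequence on one line only.
        return "".join(sequence.split())
    # "efetch seqname": annotation as raw text on multiple lines.
    return annotation


def main(argv):
    """Print the result; a non-zero return signals that nothing was found."""
    text = fetch(argv)
    if text is None:
        return 1
    print(text)
    return 0
```

To install such a sketch as a script, it would end with a call like sys.exit(main(sys.argv[1:])) after importing sys, and the lookup in RECORDS would be replaced by a call to the local retrieval system.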
The right mouse button invokes a menu with additional functions, such
as Dotter, which creates a dot plot from the alignment. The general mouse key
selections are shown below:
Sequences (HSPs) can be sorted by score, %ID, position, or name; the sort order is selected from the Settings box. Sorting by score lists all proteins with the highest-scoring first. Sorting by identity lists the most identical proteins first. Sorting by name lists all proteins alphabetically. Sorting by position lists all proteins with the most N-terminal first. The graphics display can be customized through menu choices for Background colour, Grid colour, Identical residues, and Conserved residues, each set to 1 of 32 colours.
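The four sort orders can be illustrated with a short Python sketch. It is not Blixem's code: the Hsp record, the SORT_KEYS table and sort_hsps are hypothetical names, and "position" is assumed here to mean the smallest start coordinate first.

```python
from typing import NamedTuple


class Hsp(NamedTuple):
    name: str
    score: float
    percent_id: float
    start: int  # assumed: start coordinate used for the position sort


# Each mode maps to (key function, descending?).
SORT_KEYS = {
    "score": (lambda h: h.score, True),          # highest score first
    "id": (lambda h: h.percent_id, True),        # most identical first
    "name": (lambda h: h.name, False),           # alphabetical
    "position": (lambda h: h.start, False),      # smallest start first
}


def sort_hsps(hsps, mode):
    """Return a new list of HSPs ordered according to the chosen mode."""
    key, descending = SORT_KEYS[mode]
    return sorted(hsps, key=key, reverse=descending)
```

Because sorted is stable, HSPs that tie on the chosen key keep their original relative order, for example two HSPs of the same protein when sorting by name.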
Toggle selections include (shown below):
Blixem's main menu (right mouse button anywhere):
A selected Dotter alignment will display a screen of all reading frames aligned with the query sequence (shown below):
And the actual Dotter dot plot is displayed (shown below):
Zoom in with parameter control:
Save Current plot:
Load features from file:
Change size of sliding window:
Draw BLAST HSPs (grey pixels):
Draw BLAST HSPs (red lines):
Draw BLAST HSPs (colour=f(score)):
Remove BLAST HSPs:
The Results can be filtered with the graphical tool once the comparison
The retrieval tool is called efetch, and comes with a set of tools to create the index files. Currently six sequence databases are supported: GenBank, EMBL, SWISS-PROT, PIR, Prosite and ProDom. ProDom is a comprehensive collection of protein domain families clustered and aligned by the method of Sonnhammer and Kahn (1994). The tools for creating the indices are based on Rodger Staden's programs, and the index system conforms to the standards proposed by the EMBL data library. Any database in a Fasta-like format is supported and new formats can easily be handled. All programs are written in C and the source code is freely available. Efetch can also be used as a stand-alone program to retrieve records on the command line. Retrieval is possible either by entry name or by accession number. It currently supports about five output formats, and additional formats can easily be added.
Efetch is a stand-alone program used by acedb to retrieve sequence data from external databases, such as SwissProt and PIR. This saves acedb from storing all this information internally while allowing acedb to retrieve any sequence entry on the fly, in order to display sequence annotation and sequence alignments from within acedb.
The efetch program provided with acedb only works on databases with indices that conform to the standard used on the EMBL CD-ROM. The only external sequence references stored in acedb are PIR and SwissProt. The indices for SwissProt can be taken directly from the EMBL CD-ROM, whereas the indices for PIR can be created with the Staden package.
SwissProt and EMBL
Information about getting the EMBL CD-ROM is available
from Peter Stoehr, E-mail firstname.lastname@example.org. Copy the database
file swissprot/seq.dat and some indices from indices/swissprot/ to a separate
directory. You need these files:
Installing the efetch program
Set the environment variables SWDIR, PIRDIR and OTHERDIR to the directories where you keep SwissProt, PIR and Other. These directories MUST NOT BE THE SAME. The directory names must end with a slash (/).
setenv SWDIR SWDIRpath/
setenv PIRDIR PIRDIRpath/
setenv OTHERDIR OTHERDIRpath/
Then test efetch with, for example:
% efetch SW:HBA_HUMAN
This should return the entire record of HBA_HUMAN. An alternative, more general method that can be used for any database is to add the prefix and directory to the environment variable EFETCH_PREFIX. For example, to link the prefix mydb to the directory /mydbdir, the syntax would be:
setenv EFETCH_PREFIX "mydb:/mydbdir/;"
By default, mydb is assumed to be in fasta format. If it is in flatfile format, use mydb(flat): instead of mydb: in the prefix definition. Any number of prefix:dir; entries can be added to EFETCH_PREFIX.
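The prefix:dir; syntax of EFETCH_PREFIX can be made concrete with a small Python sketch that splits such a value into its entries. This is an illustration of the syntax as described here, not efetch's own parser; the function name parse_efetch_prefix and the second prefix otherdb in the usage note are hypothetical.

```python
import re

# prefix, optional "(format)", a colon, then the directory.
ENTRY = re.compile(r"^(?P<prefix>[^:()]+)(?:\((?P<fmt>[^)]+)\))?:(?P<dir>.+)$")


def parse_efetch_prefix(value):
    """Map each prefix to (directory, format); format defaults to 'fasta'."""
    result = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip the empty piece after the final semicolon
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"malformed EFETCH_PREFIX entry: {entry!r}")
        result[match["prefix"]] = (match["dir"], match["fmt"] or "fasta")
    return result
```

With the value "mydb:/mydbdir/;otherdb(flat):/otherdir/;" the sketch yields mydb as a fasta database in /mydbdir/ and otherdb as a flatfile database in /otherdir/.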
Now add the setenv commands above to a file that is run when you log in, for instance .cshrc. If efetch is in the path and the environment variables are set, efetch should now work from within acedb and you can see the sequence alignments in Blixem.
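The constraints on SWDIR, PIRDIR and OTHERDIR (all set, each ending with a slash, no two the same) can be checked before starting acedb. The Python sketch below is one possible check and is not part of the efetch distribution; check_efetch_dirs is a hypothetical name.

```python
import os

DIR_VARS = ("SWDIR", "PIRDIR", "OTHERDIR")


def check_efetch_dirs(env):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    for name in DIR_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not value.endswith("/"):
            problems.append(f"{name} must end with a slash (/)")
    values = [env[name] for name in DIR_VARS if env.get(name)]
    if len(set(values)) != len(values):
        problems.append("SWDIR, PIRDIR and OTHERDIR must not be the same")
    return problems


# Typical use against the real environment:
# for problem in check_efetch_dirs(os.environ):
#     print(problem)
```

The check compares the directory names as plain strings, so two different paths that point to the same directory would not be detected.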
Sonnhammer, E.L.L. and Durbin, R. (1994) A workbench for large scale sequence homology analysis. Comput. Applic. Biosci., 10:301-307.
Sonnhammer, E.L.L. and Kahn, D. (1994) Modular structure of proteins
as inferred from analysis of homology. Protein Science, 3:4