The Sequence Window V: Genefinder

Genefinder is a standalone program originally developed by Phil Green at Washington University in St. Louis, Missouri. The ACEDB developers have since integrated it into the sequence display. Genefinder uses specially developed tables to predict coding regions and intron/exon splice sites. The Arabidopsis tables were developed by Stefan Klosterman at the Max Planck Institute for Biochemistry and Colin Wilson at Washington University. The display becomes quite complex once the Genefinder features are shown. As shown in the figure below, the region of the display that originally showed only the solid blue bars indicating protein homology now contains a variety of gray, yellow and green boxes, and at the right is a nest of L-shaped lines.

This example shows sequence ATENGE (GenBank accession X58107), chosen primarily because it has a large number of introns. To reproduce this display, first select sequence ATENGE in the Main Menu box. Once the sequence window appears, move the mouse cursor to the "Genefinder..." box, hold down the rightmost mouse button to pull down the popup menu, and drag the cursor to the "Genefinder Features" item as shown in the figure. When you release the mouse button, the complex set of features is displayed. To find the putative coding sequence (the blue highlighted set of boxes at the right), open the "Genefinder..." popup menu once more, but this time drag the cursor down to "Autofind gene". The display now shows a new coding region figure, labeled "temp_gene" in the text area to the right.

To sort out the different features, click on them one at a time and read the blue information bar above the "Genefinder..." button. You will find that the gray boxes are putative coding regions and the small yellow boxes are start (ATG) codons. Stop codons in all three reading frames are shown as the series of thin horizontal black lines; they define the limits of open reading frames (ORFs). A thin blue box next to a gray coding region indicates that the coding region was included in the gene sequence. Potential intron and exon borders are marked by the L-shaped lines at the far right. The splice sites that Genefinder scores as being used are highlighted in light green. The red, blue and black colors represent sites in different reading frames, and the sizes indicate the relative score of each splice site.

In this instance (and most that I checked) Genefinder predicts a coding sequence that very closely matches the data from GenBank. You can see that the only major difference between the actual coding sequence and the predicted sequence is an extra coding segment from 2506 bp to 2514 bp. If you have "write access" to the database, you can examine your own DNA sequence for coding regions. ("Write Access" and "Read .ace files" are items on the Main Window Pop-Up Menu.) However, to read in the sequence you must use a special format for the DNA sequence file. An example with two sequences is given below.

DNA : "A_TEST"
   atcaaaagaaatagcactaaaggctcggaggaagcctgatgaaacatgga
   agattgtgctctattttcttctgacaatttttacataagtaaaacgcatt
   tgtttactatttttttcatataaaacatgaaaaacttatatttgaattaa

DNA : "A_TEST2"
tcgaaattaaaattattaacagaaatatctaagtttatatgaacctttta
acaaaaaaaaaagtttataagaacataaaaatcataa

Each entry has four major parts: the class name (DNA), the object name (A_TEST and A_TEST2), the actual sequence, and a final blank line that marks the end of the entry. The nucleotide sequence can have an arbitrary number of characters per line; however, no line numbers or blank lines are permitted within the sequence. Leading spaces are acceptable. Once the sequence is successfully read in, you may run Genefinder or any of the other analysis programs on it.
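If your sequences come from another source, a small script can produce a file in this layout. The following Python sketch is one possible approach and is not part of ACEDB; the function names and the 50-character line width are assumptions chosen to match the example above. It strips digits and whitespace so that no line numbers or blank lines end up inside the sequence.

```python
def to_ace_dna(name, sequence, width=50):
    """Format one sequence as a DNA entry: header, sequence lines, blank line."""
    # Keep letters only; this drops spaces, newlines and any line numbers.
    cleaned = "".join(ch for ch in sequence if ch.isalpha()).lower()
    if not cleaned:
        raise ValueError("sequence contains no nucleotide characters")
    # Break the sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    # The trailing empty line marks the end of the entry.
    return 'DNA : "{}"\n{}\n\n'.format(name, "\n".join(lines))


def write_ace(path, entries, width=50):
    """Write (name, sequence) pairs to a file, one DNA entry per pair."""
    with open(path, "w") as handle:
        for name, sequence in entries:
            handle.write(to_ace_dna(name, sequence, width))
```

For example, write_ace("my_seqs.ace", [("A_TEST", seq1), ("A_TEST2", seq2)]) would write two entries that can then be loaded with "Read .ace files"; the file name here is only a placeholder.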
