ACEDB Version 4_9

User Guide To Display of Gapped Alignments

Originally written by
Simon Kelley <srk@sanger.ac.uk>, May 2001

Gapped Alignments

Acedb now supports the storage and display of gap information with alignments.

The Homol tag in ?Sequence and ?Protein now has a model which looks like this:

Homol DNA_homol ?Sequence XREF DNA_homol ?Method Float Int Int Int Int #Homol_info
      Pep_homol ?Protein XREF DNA_homol ?Method Float Int Int Int Int  #Homol_info
      Motif_homol ?Motif XREF DNA_homol ?Method Float Int Int Int Int #Homol_info

and the Homol_info class looks like this:

#Homol_info Segs #Match_seg     // old way to give gapped alignment -  used in pephomolcol for Belvu call
            Align Int UNIQUE Int UNIQUE Int     // correct way to give gapped alignments for FMAP
             // for each ungapped block, self_start target_start [length]
             // if no length then until next block (so no double gap) 
             // if no Align assume ungapped
            AlignDNAPep Int UNIQUE Int UNIQUE Int
            AlignPepDNA Int UNIQUE Int UNIQUE Int
             // These two tags are analogous to Align, but scale length
             // for the case of a DNA alignment to peptide or vice versa.

This system is backwards compatible: if the Homol_info data is not present, the homologies are interpreted as before. (For reference, the Float parameter is a score, and the four Ints are the start and end coordinates in this sequence followed by the start and end coordinates in the homologous sequence.)

The gap information is stored as blocks of ungapped alignment. This is probably best illustrated by example:

Homol DNA_homol somesequence somemethod 100 2732 2809 29233 29310 Align 2732 29233
                                                                  Align 2742 29253
                                                                  Align 2752 29273
                                                                  Align 2762 29276

This gives an alignment which looks like this:

            2732      2741      2742      2751      2752     2762                                    2809
self     ----|---------|.........|---------|.........|---------|------------------~~-------------------|-----

target       |-------------------|-------------------|--|......|------------------~~-------------------|
           29233               29253              29273 29275 29276                                 29310

The third (length) integer is only required when the end of a gap in one sequence coincides with the start of a gap in the other:

Homol DNA_homol somesequence somemethod 100 2732 2809 29233 29310 Align 2732 29233 10
                                                                  Align 2752 29253

            2732      2741      2742           2752                                         
self     ----|---------|.........|--------------|-------

target       |-------------------|..............|----------

           29233               29252           29253 
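
The block rules above can be sketched in Python. This is an illustration only, not code taken from Acedb: the function name expand_align is hypothetical, the rows are assumed to be in ascending order on the forward strand, and the treatment of the last block (it runs to the end of the homology, limited by the shorter of the two remaining spans) is an assumption, since the model comments only describe the "until next block" case.

```python
from typing import List, Optional, Tuple

# One Align row: (self_start, target_start, optional length).
Row = Tuple[int, int, Optional[int]]
# One ungapped block: (self_start, self_end, target_start, target_end), inclusive.
Block = Tuple[int, int, int, int]


def expand_align(rows: List[Row], self_end: int, target_end: int) -> List[Block]:
    """Expand Align rows into ungapped blocks (hypothetical helper).

    Assumes rows are sorted and both sequences run in the forward direction.
    """
    blocks: List[Block] = []
    for i, (s, t, length) in enumerate(rows):
        if length is None:
            if i + 1 < len(rows):
                # No length given: the block runs until the next block starts
                # in whichever sequence gets there first, so only one of the
                # two sequences can have a gap before the next block.
                next_s, next_t = rows[i + 1][0], rows[i + 1][1]
                length = min(next_s - s, next_t - t)
            else:
                # Assumption: the last block runs to the end of the homology.
                length = min(self_end - s, target_end - t) + 1
        blocks.append((s, s + length - 1, t, t + length - 1))
    return blocks


# First example above: no explicit lengths.
example_1 = [(2732, 29233, None), (2742, 29253, None),
             (2752, 29273, None), (2762, 29276, None)]
# Second example above: an explicit length of 10 on the first block.
example_2 = [(2732, 29233, 10), (2752, 29253, None)]

for name, rows in (("example 1", example_1), ("example 2", example_2)):
    print(name)
    for block in expand_align(rows, self_end=2809, target_end=29310):
        print("  self %d-%d  target %d-%d" % block)
```

In the second example the explicit length is what lets both 2742-2751 in self and 29243-29252 in the target stay unaligned; without it, the first block would be extended until the next block.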


The gaps are not displayed in the fMap, but they are displayed in blixem.

Gaps in the query sequence (i.e. where there are bases in the target with no corresponding bases in the query) are shown by omitting the unmatched bases and drawing a vertical red bar. Gaps in the target sequence are shown with dots.

Peptide homologies

For homologies of DNA versus peptide, the two sets of coordinates need to be scaled, since a block length of x in amino acid residues corresponds to 3x in DNA bases. To tell Acedb when and how to scale, use the AlignDNAPep and AlignPepDNA tags. For instance, in a DNA object, to record a homology to a peptide, do:

Homol Pep_homol  somepeptide somemethod 200 2777 2699 100 133 AlignDNAPep 2777 100
                                                                          2753 104
                                                                          2738 120
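
The 3x scaling can be sketched in Python for the example above. This is an illustration under stated assumptions, not Acedb's own code: the function name expand_align_dna_pep is hypothetical, no explicit lengths are handled (the guide does not say whether a length would be in bases or residues), the DNA direction is inferred from the homology start and end coordinates, and a block is assumed to run until the next block or the end of the homology, whichever sequence runs out first.

```python
from typing import List, Tuple

# One ungapped block: (dna_first, dna_last, pep_first, pep_last), inclusive.
Block = Tuple[int, int, int, int]


def expand_align_dna_pep(rows: List[Tuple[int, int]],
                         dna_end: int, pep_end: int) -> List[Block]:
    """Expand AlignDNAPep rows (dna_start, pep_start) into blocks.

    Hypothetical helper: lengths are counted in residues, and each residue
    covers three DNA bases.
    """
    # Assumption: DNA coordinates descend when the match is on the reverse strand.
    step = 1 if dna_end >= rows[0][0] else -1
    blocks: List[Block] = []
    for i, (d, p) in enumerate(rows):
        if i + 1 < len(rows):
            next_d, next_p = rows[i + 1]
            dna_bases = abs(next_d - d)
            pep_residues = next_p - p
        else:
            dna_bases = abs(dna_end - d) + 1
            pep_residues = pep_end - p + 1
        # Scale DNA bases down to whole residues before comparing the two spans.
        residues = min(dna_bases // 3, pep_residues)
        dna_last = d + step * (3 * residues - 1)
        blocks.append((d, dna_last, p, p + residues - 1))
    return blocks


# Rows from the Pep_homol example above (DNA 2777 down to 2699, peptide 100-133).
rows = [(2777, 100), (2753, 104), (2738, 120)]
for block in expand_align_dna_pep(rows, dna_end=2699, pep_end=133):
    print("DNA %d-%d  peptide %d-%d" % block)
```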

Simon Kelley <srk@sanger.ac.uk>
Last modified: Wed May 16 10:29:36 BST 2001