Much needs to be done on this, but here is a start.

EMBL dump now gets its information from a Method object, which controls what is dumped where. You can specify dump field information in the Sequence object itself, in a method, or in a further method that the first method refers to, and so on. In other words, you can set defaults and override them either in a more specialised method or in the object itself.

You can now dump any sequence object for which DNA can be found, directly or indirectly (so you can dump links or subsequences).

You can also now dump from giface, including gifaceserver (see example below).

There is now code for dumping arbitrary features according to dump information in the feature method, plus possible #Feature_info data from the Feature lines. This also works on Homol lines now.

Known bugs/incompletenesses:

  1. I don't yet get arbitrary features from subsequences, or recurse on them, so in fact dumping links won't work.
  2. The complex_qualifier model/code is in a state of flux, so does not work just now.

Using EMBL_dump_info and associated methods

The order of processing is:
  1. The object needs to be a sequence object, and DNA needs to be found for it.
  2. The dump complains if there is no object under _Clone [clone]. The clone is used in the context of Clone_left_end information (see below).
  3. The ID entry is derived from _Database "EMBL" [id] [ac]. If that exists, [id] is used. Otherwise the ID_template from the dump_info is used, with %s replaced by the sequence object name.
  4. AC lines are derived from _Database "EMBL" [id] [ac]. If that does not exist, _Ac_number [ac] is used. Otherwise no AC line is written.
  5. DE lines come from DE_format in dump_info. Again %s is substituted by the sequence name.
  6. KW lines come from _Keyword [keyword] entries.
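
The ID, AC, DE and KW steps above can be sketched as follows. This is only an illustration in Python, not the dump code itself: the function name, its arguments and the dictionary it returns are hypothetical, and the "; "-separated KW format with a final "." is inferred from the example output further down.

```python
def embl_header_fields(name, database_embl=None, ac_number=None,
                       id_template=None, de_format=None, keywords=()):
    """Derive ID, AC, DE and KW values for one sequence object.

    database_embl is the (id, ac) pair stored under Database "EMBL",
    or None if the object has no such entry (hypothetical representation).
    """
    if database_embl is not None:
        # Steps 3 and 4: Database "EMBL" [id] [ac] wins when present.
        embl_id, ac = database_embl
    else:
        # Step 3 fallback: ID_template with %s replaced by the object name.
        embl_id = id_template.replace("%s", name) if id_template else None
        # Step 4 fallback: Ac_number, which may be absent (no AC line).
        ac = ac_number
    # Step 5: DE_format with %s replaced by the object name.
    de = de_format.replace("%s", name) if de_format else None
    # Step 6: Keyword entries joined on one line (format is an assumption).
    kw = "; ".join(keywords) + "." if keywords else None
    return {"ID": embl_id, "AC": ac, "DE": de, "KW": kw}


fields = embl_header_fields(
    "AH6",
    ac_number="Z48009",
    id_template="CE%s",
    de_format="Caenorhabditis elegans cosmid %s",
    keywords=["Zinc finger", "Transposon", "Guanylate cyclase"],
)
print(fields["ID"], fields["AC"])
```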

Model Changes

Create a new # (subobject) model:
#EMBL_dump_info	EMBL_dump_method UNIQUE ?Method
		ID_template UNIQUE Text	
		ID_division UNIQUE Text
		DE_format UNIQUE Text  
		OS_line UNIQUE Text    
		OC_line Text           
		RL_submission Text     
		EMBL_reference ?Paper  
		CC_line Text           
		source_organism UNIQUE Text   
This information is made accessible by adding to the Method model:
	EMBL_dump_info #EMBL_dump_info
Add similarly to Sequence model:
	  DB_info	...
			EMBL_dump_info #EMBL_dump_info
Because Sequence and Method share this subobject model, lookup is recursive. When looking for OS_line, for example, the first one found is used: the search starts with the information in the Sequence object, then moves to the method named by its EMBL_dump_method, then to that method's own EMBL_dump_method, and so on.
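
A minimal sketch of that lookup order, written in Python with plain dictionaries standing in for ACEDB objects. The function name and the dictionary layout are assumptions made for illustration, and the guard against a method chain that loops back on itself is an addition that the text above does not mention.

```python
def find_dump_info(tag, sequence, methods):
    """Return the first value of `tag` found along the EMBL_dump_method chain."""
    info = sequence.get("EMBL_dump_info", {})
    seen = set()  # method names already visited, to stop on a cycle
    while True:
        if tag in info:
            return info[tag]
        method_name = info.get("EMBL_dump_method")
        if method_name is None or method_name in seen:
            return None  # chain exhausted without finding the tag
        seen.add(method_name)
        info = methods.get(method_name, {}).get("EMBL_dump_info", {})


methods = {
    "worm_EMBL-dump": {
        "EMBL_dump_info": {
            "ID_division": "INV",
            "OS_line": "Caenorhabditis elegans (nematode)",
        }
    }
}
sequence = {
    "EMBL_dump_info": {
        "EMBL_dump_method": "worm_EMBL-dump",
        "ID_division": "EST",  # hypothetical per-object override
    }
}

print(find_dump_info("ID_division", sequence, methods))  # object value wins
print(find_dump_info("OS_line", sequence, methods))      # falls back to method
```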

Add to Map model:

	EMBL_chromosome UNIQUE Text
This determines how a Map object is transformed to a /chromosome="xx" line under the "source" feature key.
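
As an illustration of where the EMBL_chromosome text ends up, the sketch below builds the "source" feature lines in the layout seen in the example output (feature key at column 6, location and qualifiers at column 22). The function is hypothetical, and how the dump finds the Map object for a given sequence is not covered here.

```python
def source_feature(length, organism, clone, chromosome=None):
    """Build FT lines for the 'source' feature key of one entry."""
    lines = ["FT   %-16s1..%d" % ("source", length)]
    qualifiers = [("organism", organism), ("clone", clone)]
    if chromosome is not None:
        # chromosome is the EMBL_chromosome text of the relevant Map object
        qualifiers.append(("chromosome", chromosome))
    for key, value in qualifiers:
        lines.append("FT" + " " * 19 + '/%s="%s"' % (key, value))
    return lines


for line in source_feature(37801, "Caenorhabditis elegans", "AH6", "II"):
    print(line)
```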

Example .ace file

Method worm_EMBL-dump
EMBL_dump_info ID_template "CE%s"
EMBL_dump_info ID_division INV
EMBL_dump_info DE_format "Caenorhabditis elegans cosmid %s"
EMBL_dump_info OS_line "Caenorhabditis elegans (nematode)"
EMBL_dump_info OC_line "Eukaryota; Animalia; Metazoa; Nematoda; Secernentea; Rhabditia;"
EMBL_dump_info OC_line "Rhabditida; Rhabditina; Rhabditoidea; Rhabditidae."
EMBL_dump_info RL_submission "Submitted (DD-MMM-YYYY) to the EMBL Data Library by:"
EMBL_dump_info RL_submission "Nematode Sequencing Project, Sanger Institute, Hinxton, Cambridge"
EMBL_dump_info RL_submission "CB10 1RQ, England and Department of Genetics, Washington"
EMBL_dump_info RL_submission "University, St. Louis, MO 63110, USA."
EMBL_dump_info RL_submission "E-mail: jes@sanger.ac.uk or rw@nematode.wustl.edu"
EMBL_dump_info EMBL_reference seq-paper-2
EMBL_dump_info CC_line "Current sequence finishing criteria for the C. elegans genome"
EMBL_dump_info CC_line "sequencing consortium are that all bases are either sequenced"
EMBL_dump_info CC_line "unambiguously on both strands, or on a single strand with both"
EMBL_dump_info CC_line "a dye primer and dye terminator reaction, from distinct"
EMBL_dump_info CC_line "subclones.  Exceptions are indicated by an explicit note.\n"
EMBL_dump_info CC_line "Coding sequences below are predicted from computer analysis,"
EMBL_dump_info CC_line "using predictions from Genefinder (P. Green, U. Washington),"
EMBL_dump_info CC_line "and other available information.\n"
EMBL_dump_info CC_line "IMPORTANT:  This sequence is NOT necessarily the entire insert"
EMBL_dump_info CC_line "of the specified clone.  It may be shorter because we only"
EMBL_dump_info CC_line "sequence overlapping sections once, or longer because we"
EMBL_dump_info CC_line "arrange for a small overlap between neighbouring submissions.\n"
EMBL_dump_info source_organism "Caenorhabditis elegans"

Paper seq-paper-2
Title "2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans"
Journal Nature
Volume 368
Page 32 38
Year 1994
Author "Wilson R"
Author "Ainscough R"

Map Sequence-I
EMBL_chromosome I

Map Sequence-II
EMBL_chromosome II

Sequence AH6
EMBL_dump_info EMBL_dump_method worm_EMBL-dump

Note the "\n" at the end of some of the CC_line entries: it makes the dumped comment line be followed by a blank CC line.
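
The effect of the trailing "\n" can be sketched like this. It is a Python illustration that only reproduces the visible output, under the assumption that the .ace parser turns "\n" inside a quoted string into a real newline character; the function name is hypothetical.

```python
def cc_lines(comments):
    """Format CC_line texts, adding a blank CC line after any text ending in a newline."""
    out = []
    for text in comments:
        if text.endswith("\n"):
            out.append("CC   " + text[:-1])
            out.append("CC   ")  # blank comment line separating paragraphs
        else:
            out.append("CC   " + text)
    return out


example = cc_lines([
    "using predictions from Genefinder (P. Green, U. Washington),",
    "and other available information.\n",
    "IMPORTANT:  This sequence is NOT necessarily the entire insert",
])
for line in example:
    print(line)
```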

Example giface script

giface <<EOF
acedb> find sequence AH6
acedb> gif
acedb-gif> embl ah6
acedb-gif> quit
acedb> quit
EOF

Example Output

ID   CEAH6      standard; DNA; INV; 37801 BP.
XX
AC   Z48009;
XX
KW   Zinc finger; Transposon; Guanylate cyclase.
XX
OS   Caenorhabditis elegans (nematode)
OC   Eukaryota; Animalia; Metazoa; Nematoda; Secernentea; Rhabditia;
XX
RN   [1]
RP   1-37801
RA   Berks M.;
RT   ;
RL   Submitted (06-FEB-1995) to the EMBL Data Library by:
RL   Nematode Sequencing Project, Sanger Institute, Hinxton, Cambridge
RL   CB10 1RQ, England and Department of Genetics, Washington
RL   University, St. Louis, MO 63110, USA.
RL   E-mail: jes@sanger.ac.uk or rw@nematode.wustl.edu
XX
RN   [2]
RA   Wilson R., Ainscough R., Anderson K., Baynes C., Berks M.,
RA   Bonfield J., Burton J., Connell M., Copsey T., Cooper J.,
RA   Coulson A., Craxton M., Dear S., Du Z., Durbin R., Favello A.,
RA   Fulton L., Gardner A., Green P., Hawkins T., Hillier L., Jier M.,
RA   Johnston L., Jones M., Kershaw J., Kirsten J., Laister N.,
RA   Latreille P., Lightning J., Lloyd C., McMurray A., Mortimore B.,
RA   O'Callaghan M., Parsons J., Percy C., Rifken L., Roopra A.,
RA   Saunders D., Shownkeen R., Smaldon N., Smith A., Sonnhammer E.,
RA   Staden R., Sulston J., Thierry-Mieg J., Thomas K., Vaudin M.,
RA   Vaughan K., Waterston R., Watson A., Weinstock L.,
RA   Wilkinson-Sproat J., Wohldman P.;
RT   "2.2 Mb of contiguous nucleotide sequence from chromosome III of 
RT   C. elegans";
RL   Nature 368:32-38 (1994).
XX
CC   Current sequence finishing criteria for the C. elegans genome
CC   sequencing consortium are that all bases are either sequenced
CC   unambiguously on both strands, or on a single strand with both
CC   a dye primer and dye terminator reaction, from distinct
CC   subclones.  Exceptions are indicated by an explicit note.
CC   
CC   Coding sequences below are predicted from computer analysis,
CC   using predictions from Genefinder (P. Green, U. Washington),
CC   and other available information.
CC   
CC   IMPORTANT:  This sequence is NOT necessarily the entire insert
CC   of the specified clone.  It may be shorter because we only
CC   sequence overlapping sections once, or longer because we
CC   arrange for a small overlap between neighbouring submissions.
CC   
CC   The true left end of clone AH6 is at 1 in this sequence.
CC   The true right end of clone AH6 is at 3246 in
CC   sequence Z36752.
CC   The true left end of clone F35H8 is at 37686 in this sequence.
CC   The true right end of clone R134 is at 30025 in this sequence.
CC   The start of this sequence (1..101) overlaps with the end of 
CC   sequence Z48007.
CC   The end of this sequence (37686..37801) overlaps with the start of 
CC   sequence Z36752.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..37801
FT                   /organism="Caenorhabditis elegans"
FT                   /clone="AH6"
FT                   /chromosome="II"
FT   CDS             complement(join(5054..5310,5380..5507,5551..5636,
FT                   5683..5904,5960..6180,6233..6308))
FT                   /product="AH6.2"
FT   CDS             join(6579..6638,6872..7051,7096..7416,7464..7595)
FT                   /product="AH6.3"
FT   CDS             complement(join(7727..7885,7935..8121,8174..8446,
FT                   8499..8644,8872..9102))
FT                   /product="AH6.4"
FT                   /gene="sra-1"
FT   CDS             join(11523..11741,11789..12303,12357..12681,
FT                   12731..12871,12923..13059,13109..13307)
FT                   /product="AH6.5"
FT                   /note="similar to zinc finger protein"
FT                   /note="cDNA EST yk38b2.3 comes from this gene"
FT                   /note="cDNA EST yk38b2.5 comes from this gene"
FT                   /note="cDNA EST yk45f12.5 comes from this gene"
FT   CDS             complement(join(29091..29243,29290..29476,29667..30316))
FT                   /product="AH6.8"
FT                   /gene="sra-4"
FT   CDS             complement(join(33869..34021,34183..34375,34425..34816))
FT                   /product="AH6.9"
FT                   /gene="sra-5"
FT   CDS             join(32117..32766,33253..33439,33487..33639)
FT                   /product="AH6.11"
FT                   /gene="sra-7"
FT   CDS             join(17948..18189,18299..18441,18532..18617)
FT                   /product="AH6.13"
FT   CDS             join(19742..20391,20445..20637,21068..21220)
FT                   /product="AH6.14"
FT                   /gene="sra-9"
FT   CDS             complement(join(36041..36193,36247..36433,36481..37130))
FT                   /product="AH6.10"
FT                   /gene="sra-6"
FT   CDS             join(16211..16860,16906..17092,17141..17293)
FT                   /product="AH6.6"
FT                   /gene="sra-2"
FT   CDS             join(25160..25809,26063..26249,26301..26453)
FT                   /product="AH6.12"
FT                   /gene="sra-8"
FT   CDS             complement(join(26752..26904,26982..27168,27605..28254))
FT                   /product="AH6.7"
FT                   /gene="sra-3"
FT   CDS             complement(21569..24025)
FT                   /product="AH6.15"
FT                   /pseudo
FT                   /note="probably a transposon"
FT   CDS             14448..15525
FT                   /product="AH6.16"
FT                   /pseudo
FT   CDS             complement(join(Z48007:14051..Z48007:14230,102..337,
FT                   397..500,544..951,1002..1179,1226..1526,1579..2235,
FT                   2288..2418,2483..2621,2672..2797,2851..2948,2999..3151,
FT                   3195..3424,3469..3711,3996..4126,4270..4368))
FT                   /product="AH6.1"
FT                   /note="similar to guanylate cyclase"
XX
SQ   Sequence  37801 BP;   12255 A; 6407 C; 6368 G; 12771 T; 0 other;
     ccatgagagc ttgatggatt tggaatccat ctatcgttgg ttactggtgg tgttgaccga
     ttactaatgc ttcttaactc ggttggttcc atttcaccaa atctgccgtg cacccaaaat
     gtttccataa ctccttttcc ttttataatt acttctcctc gggaactcgt ttcgtattga

Dumping Features and Homols

Here are the relevant sections of the models:
?Sequence ...
//	  Homol	  ?Method Float Int UNIQUE Int Int UNIQUE Int #Homol_info
		//  is target, Float is score, second pair of
		// ints are y1, y2
	  Feature ?Method Int Int UNIQUE Float UNIQUE Text #Feature_info
		// Float is score, Text is note

?Method ...
	EMBL_dump EMBL_feature UNIQUE Text	// require this
		  EMBL_threshold UNIQUE Float
			// apply to score unless overridden
		  EMBL_qualifier Text
	// simple non-varying qualifiers, the entire text starting with '/'
		  EMBL_complex_qualifier Text #Qualifier_arg
	// qualifier whose content varies according to the feature line
	// followed by alternative printf() format and #Qualifier_arg

#Qualifier_arg  UNIQUE  Score UNIQUE Text #Qualifier_arg
			Note UNIQUE Text #Qualifier_arg
			Target UNIQUE Text #Qualifier_arg
			Y1 UNIQUE Text #Qualifier_arg
			Y2 UNIQUE Text #Qualifier_arg
			Get_Text UNIQUE Text UNIQUE Text #Qualifier_arg
			Get_Int UNIQUE Text UNIQUE Text #Qualifier_arg
			Get_Float UNIQUE Text UNIQUE Text #Qualifier_arg
			Get_Key UNIQUE Text UNIQUE Text #Qualifier_arg
			Get_Tag UNIQUE Text UNIQUE Text #Qualifier_arg
			If_tag UNIQUE Text
			If_not_tag UNIQUE Text
				// 1st text is tag name in #XXX_info
	// recursion allows multiple arguments to be added, like + in Java

#Feature_info	EMBL_dump UNIQUE EMBL_dump_YES
				 EMBL_dump_NO
			// overrides for embl dump based on method
		EMBL_qualifier Text
			// additional to those in the method, includes '/'
//		 Text
			// can be used by EMBL_complex_qualifier in ?Method
		...

// #Homol_info can have all the same content as #Feature_info
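
One possible reading of how these tags combine for a single Feature or Homol line, sketched in Python. The function names and dictionary layout are hypothetical, and two points are assumptions rather than something stated in the models: that EMBL_threshold is a minimum score, and that a line with no score is not filtered by the threshold.

```python
def should_dump(method, score=None, info=None):
    """Decide whether one feature line is written to the EMBL dump."""
    info = info or {}
    if "EMBL_feature" not in method:
        return False  # the method must name an EMBL feature key
    override = info.get("EMBL_dump")
    if override == "EMBL_dump_YES":
        return True   # per-line override beats the threshold
    if override == "EMBL_dump_NO":
        return False
    threshold = method.get("EMBL_threshold")
    if threshold is None or score is None:
        return True
    return score >= threshold  # assumption: threshold is a minimum


def qualifiers(method, info=None):
    """Method qualifiers first, then the additional ones from the info subobject."""
    info = info or {}
    return list(method.get("EMBL_qualifier", [])) + list(info.get("EMBL_qualifier", []))


method = {
    "EMBL_feature": "repeat_region",
    "EMBL_threshold": 50.0,
    "EMBL_qualifier": ['/note="example method note"'],
}
print(should_dump(method, score=30.0))
print(should_dump(method, score=30.0, info={"EMBL_dump": "EMBL_dump_YES"}))
```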