EMBL dump now gets its information from a Method object, which controls what is dumped where. In fact, you can specify dump field information in the Sequence object itself, in a method, or in a method inherited from that method ..., i.e. you can have defaults that you override either in a specialised method or in the object itself.
You can now dump any sequence object for which DNA can be found, directly or indirectly (so you can dump links or subsequences).
You can also now dump from giface, including gifaceserver (see example below).
There is now code for dumping arbitrary features according to dump information in the feature method, plus possible #Feature_info data from the Feature lines. This also works on Homol lines now.
- I don't yet get arbitrary features from subsequences or recurse on them, so in fact dumping links won't work, I realise.
- The complex_qualifier model/code is in a state of flux, so does not work just now.
Using EMBL_dump_info and associated methods
The order of processing is:
- Needs to be a sequence object. Needs to find DNA.
- Complains if no object under _Clone [clone]. This will be used in the context of Clone_left_end information (see below).
- The ID entry is derived from _Database "EMBL" [id] [ac]. If it exists then use [id]. Else use ID_template from the dump_info, replacing %s by the sequence object name.
- AC lines are calculated from _Database "EMBL" [id] [ac]. If that does not exist, look under _Ac_number [ac]. Else nothing.
- DE lines come from DE_format in dump_info. Again %s is substituted by the sequence name.
- KW lines come from _Keyword [keyword] entries.
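The fallback rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ACEDB code: the function name `embl_header_lines` and its parameters are hypothetical, the ID line is reduced to the identifier alone, and the object data is passed in as plain Python values.

```python
from typing import List, Optional, Tuple


def embl_header_lines(
    name: str,
    embl_db: Optional[Tuple[str, str]],  # (id, ac) from Database "EMBL", if present
    ac_number: Optional[str],            # fallback value from Ac_number
    id_template: str,                    # ID_template from the dump info
    de_format: str,                      # DE_format from the dump info
    keywords: List[str],                 # Keyword entries
) -> List[str]:
    """Build simplified ID/AC/DE/KW lines following the order described above."""
    lines = []

    # ID: use the stored id if there is one, else fill in the template.
    entry_id = embl_db[0] if embl_db else id_template.replace("%s", name)
    lines.append("ID   " + entry_id)

    # AC: Database "EMBL" first, then Ac_number, else no AC line at all.
    accession = embl_db[1] if embl_db else ac_number
    if accession:
        lines.append("AC   " + accession + ";")

    # DE: %s is replaced by the sequence name.
    lines.append("DE   " + de_format.replace("%s", name))

    # KW: one line built from the keyword entries.
    if keywords:
        lines.append("KW   " + "; ".join(keywords) + ".")
    return lines


if __name__ == "__main__":
    for line in embl_header_lines(
        "AH6", None, "Z48009", "CE%s",
        "Caenorhabditis elegans cosmid %s", ["Zinc finger", "Transposon"],
    ):
        print(line)
```

With no Database "EMBL" entry, the identifier comes from the template ("CEAH6") and the accession from Ac_number.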
Model Changes
Create a new # (subobject) model:

#EMBL_dump_info EMBL_dump_method UNIQUE ?Method
                ID_template UNIQUE Text
                ID_division UNIQUE Text
                DE_format UNIQUE Text
                OS_line UNIQUE Text
                OC_line Text
                RL_submission Text
                EMBL_reference ?Paper
                CC_line Text
                source_organism UNIQUE Text

This information is made accessible by adding to the Method model:

EMBL_dump_info #EMBL_dump_info

Add similarly to the Sequence model:

DB_info ...
        EMBL_dump_info #EMBL_dump_info

The use of the shared subobject model makes things recursive. When looking for OS_line, for example, the first one that is found gets used, starting with the information in the Sequence object, then in its EMBL_dump_method, then in that method's EMBL_dump_method, and so on.
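The recursive lookup can be pictured with a short Python sketch. It assumes the dump info of each object has been loaded into a plain dictionary keyed by tag name. The function `find_dump_field`, the dictionary layout and the method name "special" are hypothetical and only serve the illustration. The cycle guard is an addition of the sketch, not something the notes describe.

```python
from typing import Any, Dict, List, Optional

DumpInfo = Dict[str, Any]


def find_dump_field(
    info: Optional[DumpInfo],
    tag: str,
    methods: Dict[str, DumpInfo],
) -> Optional[List[str]]:
    """Return the first value found for tag, walking the EMBL_dump_method chain."""
    seen = set()
    while info is not None:
        if tag in info:
            return info[tag]              # first hit wins
        method_name = info.get("EMBL_dump_method")
        if method_name is None or method_name in seen:
            return None                   # end of chain, or a loop
        seen.add(method_name)
        info = methods.get(method_name)   # continue in the parent method
    return None


if __name__ == "__main__":
    methods = {
        "worm_EMBL-dump": {
            "ID_template": ["CE%s"],
            "OS_line": ["Caenorhabditis elegans (nematode)"],
        },
        # Hypothetical specialised method that overrides only ID_template.
        "special": {"EMBL_dump_method": "worm_EMBL-dump", "ID_template": ["XX%s"]},
    }
    sequence_info = {"EMBL_dump_method": "special"}
    print(find_dump_field(sequence_info, "ID_template", methods))
    print(find_dump_field(sequence_info, "OS_line", methods))
```

Here ID_template is taken from the specialised method, while OS_line falls through to the default method.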
Add to Map model:
EMBL_chromosome UNIQUE Text

This determines how a Map object is transformed to a /chromosome="xx" line under the "source" feature key.
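As a rough illustration of where the EMBL_chromosome text ends up, the following Python sketch builds the lines of a "source" feature. The function `source_feature` and its parameters are hypothetical, and the column layout (feature key at column 6, location and qualifiers at column 22) follows the usual EMBL feature table convention rather than anything stated in these notes.

```python
from typing import List, Optional


def source_feature(
    length: int,
    organism: str,
    clone: str,
    chromosome: Optional[str] = None,  # EMBL_chromosome text of the Map, if any
) -> List[str]:
    """Return the FT lines of a 'source' feature covering the whole sequence."""
    pad = "FT" + " " * 19  # qualifier lines start at column 22
    lines = [
        f"FT   {'source':<16}1..{length}",
        f'{pad}/organism="{organism}"',
        f'{pad}/clone="{clone}"',
    ]
    if chromosome is not None:
        lines.append(f'{pad}/chromosome="{chromosome}"')
    return lines


if __name__ == "__main__":
    for line in source_feature(37801, "Caenorhabditis elegans", "AH6", "II"):
        print(line)
```

If the Map object has no EMBL_chromosome value, the sketch simply leaves the qualifier out.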
Example .ace file
Method worm_EMBL-dump
EMBL_dump_info ID_template "CE%s"
EMBL_dump_info ID_division INV
EMBL_dump_info DE_format "Caenorhabditis elegans cosmid %s"
EMBL_dump_info OS_line "Caenorhabditis elegans (nematode)"
EMBL_dump_info OC_line "Eukaryota; Animalia; Metazoa; Nematoda; Secernentea; Rhabditia;"
EMBL_dump_info OC_line "Rhabditida; Rhabditina; Rhabditoidea; Rhabditidae."
EMBL_dump_info RL_submission "Submitted (DD-MMM-YYYY) to the EMBL Data Library by:"
EMBL_dump_info RL_submission "Nematode Sequencing Project, Sanger Institute, Hinxton, Cambridge"
EMBL_dump_info RL_submission "CB10 1RQ, England and Department of Genetics, Washington"
EMBL_dump_info RL_submission "University, St. Louis, MO 63110, USA."
EMBL_dump_info RL_submission "E-mail: firstname.lastname@example.org or email@example.com"
EMBL_dump_info EMBL_reference seq-paper-2
EMBL_dump_info CC_line "Current sequence finishing criteria for the C. elegans genome"
EMBL_dump_info CC_line "sequencing consortium are that all bases are either sequenced"
EMBL_dump_info CC_line "unambiguously on both strands, or on a single strand with both"
EMBL_dump_info CC_line "a dye primer and dye terminator reaction, from distinct"
EMBL_dump_info CC_line "subclones. Exceptions are indicated by an explicit note.\n"
EMBL_dump_info CC_line "Coding sequences below are predicted from computer analysis,"
EMBL_dump_info CC_line "using predictions from Genefinder (P. Green, U. Washington),"
EMBL_dump_info CC_line "and other available information.\n"
EMBL_dump_info CC_line "IMPORTANT: This sequence is NOT necessarily the entire insert"
EMBL_dump_info CC_line "of the specified clone. It may be shorter because we only"
EMBL_dump_info CC_line "sequence overlapping sections once, or longer because we"
EMBL_dump_info CC_line "arrange for a small overlap between neighbouring submissions.\n"
EMBL_dump_info source_organism "Caenorhabditis elegans"

Paper seq-paper-2
Title "2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans"
Journal Nature
Volume 368
Page 32 38
Year 1994
Author "Wilson R"
Author "Ainscough R"

Map Sequence-I
EMBL_chromosome I

Map Sequence-II
EMBL_chromosome II

Sequence AH6
EMBL_dump_info EMBL_dump_method worm_EMBL-dump

Note the "\n" at the end of comment lines to have them followed by blank lines.
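The effect of the trailing "\n" can be sketched as follows. This is an assumption-laden illustration, not the dump code: it supposes that the "\n" in the .ace text is stored as a real newline character, and the function name `render_cc_lines` is hypothetical.

```python
from typing import List


def render_cc_lines(cc_values: List[str]) -> List[str]:
    """Turn CC_line values into CC lines; a trailing newline adds a blank CC line."""
    out = []
    for value in cc_values:
        if value.endswith("\n"):
            out.append("CC   " + value[:-1])
            out.append("CC")  # the blank comment line that follows
        else:
            out.append("CC   " + value)
    return out


if __name__ == "__main__":
    for line in render_cc_lines([
        "subclones. Exceptions are indicated by an explicit note.\n",
        "Coding sequences below are predicted from computer analysis,",
    ]):
        print(line)
```

The first value ends a paragraph and is followed by a bare "CC" line; the second simply continues the comment block.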
Example giface script
find sequence AH6
acedb> gif
acedb-gif> embl ah6
acedb-gif> quit
acedb> quit
EOF
ID   CEAH6 standard; DNA; INV; 37801 BP.
XX
AC   Z48009;
XX
KW   Zinc finger; Transposon; Guanylate cyclase.
XX
OS   Caenorhabditis elegans (nematode)
OS   Eukaryota; Animalia; Metazoa; Nematoda; Secernentea; Rhabditia;
XX
RN
RP   1-37801
RA   Berks M.;
RT   ;
RL   Submitted (06-FEB-1995) to the EMBL Data Library by:
RL   Nematode Sequencing Project, Sanger Institute, Hinxton, Cambridge
RL   CB10 1RQ, England and Department of Genetics, Washington
RL   University, St. Louis, MO 63110, USA.
RL   E-mail: firstname.lastname@example.org or email@example.com
XX
RN
RA   Wilson R., Ainscough R., Anderson K., Baynes C., Berks M.,
RA   Bonfield J., Burton J., Connell M., Copsey T., Cooper J.,
RA   Coulson A., Craxton M., Dear S., Du Z., Durbin R., Favello A.,
RA   Fulton L., Gardner A., Green P., Hawkins T., Hillier L., Jier M.,
RA   Johnston L., Jones M., Kershaw J., Kirsten J., Laister N.,
RA   Latreille P., Lightning J., Lloyd C., McMurray A., Mortimore B.,
RA   O'Callaghan M., Parsons J., Percy C., Rifken L., Roopra A.,
RA   Saunders D., Shownkeen R., Smaldon N., Smith A., Sonnhammer E.,
RA   Staden R., Sulston J., Thierry-Mieg J., Thomas K., Vaudin M.,
RA   Vaughan K., Waterston R., Watson A., Weinstock L.,
RA   Wilkinson-Sproat J., Wohldman P.;
RT   "2.2 Mb of contiguous nucleotide sequence from chromosome III of
RT   C. elegans";
RL   Nature 368:32-38 (1994).
XX
CC   Current sequence finishing criteria for the C. elegans genome
CC   sequencing consortium are that all bases are either sequenced
CC   unambiguously on both strands, or on a single strand with both
CC   a dye primer and dye terminator reaction, from distinct
CC   subclones. Exceptions are indicated by an explicit note.
CC
CC   Coding sequences below are predicted from computer analysis,
CC   using predictions from Genefinder (P. Green, U. Washington),
CC   and other available information.
CC
CC   IMPORTANT: This sequence is NOT necessarily the entire insert
CC   of the specified clone. It may be shorter because we only
CC   sequence overlapping sections once, or longer because we
CC   arrange for a small overlap between neighbouring submissions.
CC
CC   The true left end of clone AH6 is at 1 in this sequence.
CC   The true right end of clone AH6 is at 3246 in
CC   sequence Z36752.
CC   The true left end of clone F35H8 is at 37686 in this sequence.
CC   The true right end of clone R134 is at 30025 in this sequence.
CC   The start of this sequence (1..101) overlaps with the end of
CC   sequence Z48007.
CC   The end of this sequence (37686..37801) overlaps with the start of
CC   sequence Z36752.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..37801
FT                   /organism="Caenorhabditis elegans"
FT                   /clone="AH6"
FT                   /chromosome="II"
FT   CDS             complement(join(5054..5310,5380..5507,5551..5636,
FT                   5683..5904,5960..6180,6233..6308))
FT                   /product="AH6.2"
FT   CDS             join(6579..6638,6872..7051,7096..7416,7464..7595)
FT                   /product="AH6.3"
FT   CDS             complement(join(7727..7885,7935..8121,8174..8446,
FT                   8499..8644,8872..9102))
FT                   /product="AH6.4"
FT                   /gene="sra-1"
FT   CDS             join(11523..11741,11789..12303,12357..12681,
FT                   12731..12871,12923..13059,13109..13307)
FT                   /product="AH6.5"
FT                   /note="similar to zinc finger protein"
FT                   /note="cDNA EST yk38b2.3 comes from this gene"
FT                   /note="cDNA EST yk38b2.5 comes from this gene"
FT                   /note="cDNA EST yk45f12.5 comes from this gene"
FT   CDS             complement(join(29091..29243,29290..29476,29667..30316))
FT                   /product="AH6.8"
FT                   /gene="sra-4"
FT   CDS             complement(join(33869..34021,34183..34375,34425..34816))
FT                   /product="AH6.9"
FT                   /gene="sra-5"
FT   CDS             join(32117..32766,33253..33439,33487..33639)
FT                   /product="AH6.11"
FT                   /gene="sra-7"
FT   CDS             join(17948..18189,18299..18441,18532..18617)
FT                   /product="AH6.13"
FT   CDS             join(19742..20391,20445..20637,21068..21220)
FT                   /product="AH6.14"
FT                   /gene="sra-9"
FT   CDS             complement(join(36041..36193,36247..36433,36481..37130))
FT                   /product="AH6.10"
FT                   /gene="sra-6"
FT   CDS             join(16211..16860,16906..17092,17141..17293)
FT                   /product="AH6.6"
FT                   /gene="sra-2"
FT   CDS             join(25160..25809,26063..26249,26301..26453)
FT                   /product="AH6.12"
FT                   /gene="sra-8"
FT   CDS             complement(join(26752..26904,26982..27168,27605..28254))
FT                   /product="AH6.7"
FT                   /gene="sra-3"
FT   CDS             complement(21569..24025)
FT                   /product="AH6.15"
FT                   /pseudo
FT                   /note="probably a transposon"
FT   CDS             14448..15525
FT                   /product="AH6.16"
FT                   /pseudo
FT   CDS             complement(join(Z48007:14051..Z48007:14230,102..337,
FT                   397..500,544..951,1002..1179,1226..1526,1579..2235,
FT                   2288..2418,2483..2621,2672..2797,2851..2948,2999..3151,
FT                   3195..3424,3469..3711,3996..4126,4270..4368))
FT                   /product="AH6.1"
FT                   /note="similar to guanylate cyclase"
XX
SQ   Sequence 37801 BP; 12255 A; 6407 C; 6368 G; 12771 T; 0 other;
     ccatgagagc ttgatggatt tggaatccat ctatcgttgg ttactggtgg tgttgaccga
     ttactaatgc ttcttaactc ggttggttcc atttcaccaa atctgccgtg cacccaaaat
     gtttccataa ctccttttcc ttttataatt acttctcctc gggaactcgt ttcgtattga
Dumping Features and Homols
Here are the relevant sections of the models:
?Sequence ... // Homol