Staffan's curator tools

The file staffans_tools.tgz is a gzipped tar archive of a number of different Perl scripts that may be useful to others. However, they come with absolutely no promises about their usability or correctness.

All of the scripts were written for use in the curation and management of MycDB, and most probably all of them have built-in assumptions about how the models look, as well as some hidden design decisions. Treat these scripts as a starting point to modify for your own needs.

The following is a description of the individual pieces. Some of them have more elaborate descriptions in the archive (currently embl2ace.pl and efetch, but more may follow). The first, util5.pl, contains utility routines. Some of the other scripts use it, in which case you have to make sure that perl can find it:

util5.pl
Some utility subs, such as standardized error messages, date and time conversions, string breaking in different ways, and some mathematical routines.

acediffplus [requires util5.pl, and also fasta and diff]
A wrapper for acediff that tries to take care of some of the special things you may want to do, such as getting the Authors right in Paper objects, doing a fasta comparison of DNA sequences, and comparing LongTexts without tripping on whitespace differences.

It does need some work, because it has been around since before the birth of perl5, and it also needs to know about Peptides.
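The whitespace-tolerant LongText comparison mentioned above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not taken from acediffplus; the function names are hypothetical, and it assumes that "whitespace differences" means any difference in runs of spaces, tabs, or line breaks.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace to a single space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def same_longtext(a: str, b: str) -> bool:
    """Return True when two texts differ only in their whitespace layout.

    Hypothetical helper for illustration; acediffplus may well do this differently.
    """
    return normalize_whitespace(a) == normalize_whitespace(b)
```

Two LongTexts that were merely re-wrapped at a different line width would compare as equal under this scheme, while a genuine change in wording would still be reported.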

blastn2ace
An extremely old script for parsing blastn output into .ace format. Probably does not work on your blast results without a lot of tweaking!

efetch [requires some perl modules]
A replacement for the efetch that comes with acedb and blixem. Gets sequences from a server on the net, via http.

[more info in the archive]

embl2ace.pl [requires util5.pl]
A script that parses EMBL-formatted sequence data into .ace format.

[more info in the archive, and the script itself has a lot of comments]

exp2ace.pl
A very simplistic script that parses Staden experiment files into some sort of .ace format.

medline2ace.pl
Parses Medline records into .ace format. Unfortunately, Medline format doesn't always mean the same thing, so this may need tweaking.

refuniq2.pl [requires one tablemaker def file] and refuniq3.pl
These two can be used to weed out duplicate papers from a database. refuniq2.pl explicitly looks for Papers that have the same Journal, Volume, Year and Pages, and prints -R statements for those that match; refuniq3.pl tries to match up unpublished and/or broken Paper objects with each other by comparing titles etc.
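The exact-match idea behind refuniq2.pl can be sketched as grouping papers by a four-field key. The following is a rough illustration in Python rather than Perl, not the script's actual code: the record layout (dicts with a "name" key) and the function name are assumptions, and so is the choice to skip records that lack any of the four fields. It only reports groups of matching names and does not try to reproduce the -R output.

```python
from collections import defaultdict


def find_duplicate_papers(papers):
    """Group paper names by (Journal, Volume, Year, Pages).

    `papers` is assumed to be a list of dicts with a "name" key plus the
    four bibliographic fields. Returns only the groups that contain more
    than one paper, in order of first appearance.
    """
    groups = defaultdict(list)
    for paper in papers:
        key = (
            paper.get("Journal"),
            paper.get("Volume"),
            paper.get("Year"),
            paper.get("Pages"),
        )
        if None in key:
            # Assumption: incomplete records cannot be matched on all four fields.
            continue
        groups[key].append(paper["name"])
    return [names for names in groups.values() if len(names) > 1]
```

Each returned group is a candidate set of duplicates that a curator would still want to check by eye before merging anything.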

tace.pl
Some simple routines to access a database with tace, and get data out or put it in.

If there are any problems, you can contact me (Staffan Bergh) at staffan.bergh@eu.pnu.com

Last edited 1997-08-06 /staffan