Question

Reader/Writer for ASN.1


Asked 12.7 years ago by Lee Katz

Hi everyone, is there a tool that makes a submission template for tbl2asn, so that submitting genomes would be easier? I have a list of author information that I hope could be parsed and converted to ASN.1 format. For example:

author1="George Burdell|G.R.|Georgia Institute of Technology|Bio Lab|Atlanta|GA|United States|310 Ferst Dr NE|nobody@gatech.edu|1-404-385-5555|30332"

The definition of the Submission Template Format is hard to follow, but NCBI gives an example at the very end of the tbl2asn documentation:

ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/tbl2asn/DOCUMENTATION/tbl2asn.txt

*Edit:* I am offering a bounty to anyone who can find a framework for this or write a scalable script.

Tags: conversion, ncbi, genbank

Question written 12.7 years ago by Lee Katz; last updated 12.7 years ago by Falstaff.
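As a starting point, the pipe-delimited author line in the question can be split into named fields. The minimal Python sketch below assumes the field order of that single example line. The field names are borrowed from the Affil "std" fields in NCBI's ASN.1 definitions (affil, div, city, sub, country, street, email, phone, postal-code), and the rule "the last word of the name is the surname" is also an assumption.

```python
# Hypothetical field order, inferred from the single example line in the question.
FIELDS = ["name", "initials", "affil", "div", "city", "sub",
          "country", "street", "email", "phone", "postal-code"]


def parse_author(line: str) -> dict:
    """Parse 'authorN="a|b|c|..."' into a dict of named fields."""
    key, _, value = line.partition("=")
    value = value.strip().strip('"')
    parts = [p.strip() for p in value.split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"{key}: expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    # Assumption: the last whitespace-separated word of the full name is the surname.
    first, _, last = record["name"].rpartition(" ")
    record["first"], record["last"] = first, last
    return record


if __name__ == "__main__":
    example = ('author1="George Burdell|G.R.|Georgia Institute of Technology|'
               'Bio Lab|Atlanta|GA|United States|310 Ferst Dr NE|'
               'nobody@gatech.edu|1-404-385-5555|30332"')
    rec = parse_author(example)
    print(rec["last"], rec["first"], rec["postal-code"])
```

Once the line is parsed this way, the remaining work is formatting: each named field is written into the matching slot of the ASN.1 template.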


Upon reflection, I am pretty sure this is what you mean. Have you considered using the NCBI C++ toolkit? If you had that compiled, I think it would be fairly straightforward.

Reply by Falstaff, 12.7 years ago


Well, I only know a C++ solution, so that isn't much help. I think NCBI should provide a solution for this if they don't have one already. I would try writing to the help desk at info@ncbi.nlm.nih.gov. Right now, I think they expect you to type these in manually in Sequin and generate the ASN.1 that way. But what if there are more than 20 authors on the paper, which is increasingly common? Good luck!

Reply by Falstaff, 12.7 years ago


It is not completely clear to me what you want. Do you want a template that already includes your author information? I think that is what you mean. Perhaps you have a very long author list and do not want to hand-type it into Submit-block ASN.1 format?

Reply by Falstaff, 12.7 years ago


I think I just want to convert a human-entered author list, such as the pipe-delimited example above, into the submission template file. I only know Perl and PHP, so I hope this is covered by something other than the NCBI C++ toolkit, or that there is an already-compiled program.

Reply by Lee Katz, 12.7 years ago


OK, thanks. I'll put some kind of Biostar karma bounty on this and see if anyone wants to program it in BioPerl or plain Perl.

Reply by Lee Katz, 12.7 years ago


OK, thanks. I now think there is no existing scripting solution for this, so I'll offer a bounty for someone to write one in Perl or BioPerl.

Reply by Lee Katz, 12.7 years ago


I don't understand the problem. Why can't you "just" split your string and generate the file in the "SUBMISSION TEMPLATE FORMAT" given as an example at the end? (PS: I'm too lazy to read the whole tbl2asn.txt file.)

Reply by Pierre Lindenbaum, 12.7 years ago
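The "just split and fill in" approach can be sketched in Python with a fixed template. This is a minimal sketch under several assumptions. It covers a single author who is both the contact and the only submitting author. It assumes the field order of the question's example line. The nesting follows the Submit-block / Contact-info / Cit-sub structure of NCBI's ASN.1 definitions as I understand them. Compare the output with the example at the end of tbl2asn.txt before relying on it, because a real template may contain additional blocks. The names FIELDS, TEMPLATE, quote and to_template are all hypothetical. `string.Template` is used so the many literal ASN.1 braces need no escaping.

```python
from string import Template

# Hypothetical field order taken from the question's example line; "postal_code"
# uses an underscore because Template placeholders cannot contain "-".
FIELDS = ["name", "initials", "affil", "div", "city", "sub",
          "country", "street", "email", "phone", "postal_code"]

# Sketch of a Submit-block in ASN.1 value notation (assumed layout):
# the same person is the contact and the only author of the Cit-sub.
TEMPLATE = Template("""\
Submit-block ::= {
  contact {
    contact {
      name name {
        last $last ,
        first $first ,
        initials $initials } ,
      affil std {
        affil $affil ,
        div $div ,
        city $city ,
        sub $sub ,
        country $country ,
        street $street ,
        email $email ,
        phone $phone ,
        postal-code $postal_code } } } ,
  cit {
    authors {
      names std {
        {
          name name {
            last $last ,
            first $first ,
            initials $initials } } } ,
      affil std {
        affil $affil ,
        div $div ,
        city $city ,
        sub $sub ,
        country $country ,
        street $street ,
        email $email ,
        phone $phone ,
        postal-code $postal_code } } } ,
  subtype new }
""")


def quote(text: str) -> str:
    """Wrap text in ASN.1 quotes, doubling any embedded double quote."""
    return '"' + text.replace('"', '""') + '"'


def to_template(line: str) -> str:
    """Turn 'authorN="Full Name|initials|...|postal code"' into a Submit-block."""
    payload = line.partition("=")[2].strip().strip('"')
    parts = [p.strip() for p in payload.split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    rec = dict(zip(FIELDS, parts))
    # Assumption: the last word of the full name is the surname.
    first, _, last = rec.pop("name").rpartition(" ")
    rec.update(first=first, last=last)
    return TEMPLATE.substitute({k: quote(v) for k, v in rec.items()})


if __name__ == "__main__":
    print(to_template(
        'author1="George Burdell|G.R.|Georgia Institute of Technology|Bio Lab|'
        'Atlanta|GA|United States|310 Ferst Dr NE|nobody@gatech.edu|'
        '1-404-385-5555|30332"'))
```

For one fixed shape, no recursion is needed. The nesting lives entirely in the template text, and the script only substitutes quoted values.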


The template format is a nested data structure, which I think would require recursion, and I'm not a trained computer scientist. I've always been bad at recursion, but I feel like this could be a piece of cake for someone who has been classically trained.

Reply by Lee Katz, 12.7 years ago
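Recursion is only needed when one function should handle nesting of any depth. In that case it stays small: each node prints its label, then either a value or a braced list of children, and each child is printed by the same function one level deeper. The Python sketch below illustrates this. The tree representation, with (label, value) tuples, a label of None for unnamed SEQUENCE OF elements and a `Bare` wrapper for unquoted ENUMERATED names, is invented for this example and is not an NCBI format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Bare:
    """A value written without quotes, e.g. an ENUMERATED name such as `new`."""
    text: str


# A node is (label, value); a label of None marks an unnamed element of a SEQUENCE OF.
Value = Union[str, Bare, List["Node"]]
Node = Tuple[Optional[str], Value]


def quote(text: str) -> str:
    """ASN.1 value notation: a double quote inside a string is doubled."""
    return '"' + text.replace('"', '""') + '"'


def render(label: Optional[str], value: Value, depth: int = 0) -> str:
    """Recursively turn a nested (label, value) tree into ASN.1 value text."""
    pad = "  " * depth
    head = f"{pad}{label} " if label else pad
    if isinstance(value, Bare):
        return head + value.text
    if isinstance(value, str):
        return head + quote(value)
    # A list of child nodes: render each one a level deeper (the recursive step).
    children = " ,\n".join(render(lab, val, depth + 1) for lab, val in value)
    return head + "{\n" + children + " }"


if __name__ == "__main__":
    author = [("name name", [("last", "Burdell"), ("first", "George"),
                             ("initials", "G.R.")])]
    tree = [("cit", [("authors", [("names std", [(None, author)])])]),
            ("subtype", Bare("new"))]
    print(render("Submit-block ::=", tree))
```

Adding another author is then just another `(None, [...])` entry under "names std". The function does not need to know how deep the template goes.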

Answer 1 (score 1), 2011-08-02

This isn't an answer (yet). It is an attempt to provide some background information and refine the input. The ASN.1 specification for a Cit-sub looks like this:

Cit-sub ::= SEQUENCE {               -- citation for a direct submission
    authors Auth-list ,              -- not necessarily authors of the paper
    imp Imprint OPTIONAL ,           -- this only used to get date.. will go
    medium ENUMERATED {              -- medium of submission
        paper   (1) ,
        tape    (2) ,
        floppy  (3) ,
        email   (4) ,
        other   (255) } OPTIONAL ,
    date Date OPTIONAL ,              -- replaces imp, will become required
    descr VisibleString OPTIONAL }    -- description of changes for public view
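To show how this definition maps onto value notation, here is a hedged Python sketch that writes a Cit-sub value as the "cit" field of a Submit-block. The fields appear in their declared SEQUENCE order (authors, date, descr), and the OPTIONAL imp and medium fields are simply omitted. For brevity it uses the free-form `str` choice of Auth-list. Real submissions normally use the structured `std` form. The date uses the structured Date `std` form (year, month, day), which comes from NCBI's general ASN.1 definitions rather than from the excerpt above. The function names are hypothetical.

```python
from datetime import date
from typing import List, Optional


def quote(text: str) -> str:
    """ASN.1 value notation doubles any embedded double quote."""
    return '"' + text.replace('"', '""') + '"'


def render_cit_sub(authors: List[str], when: date,
                   descr: Optional[str] = None) -> str:
    """Render a Cit-sub value; fields follow the declared SEQUENCE order."""
    names = " ,\n      ".join(quote(a) for a in authors)
    fields = [
        # authors: an Auth-list using the free-form `str` choice (an assumption)
        "  authors {\n    names str {\n      " + names + " } }",
        # imp and medium are OPTIONAL and left out; date uses the Date `std` form
        f"  date std {{\n    year {when.year} ,\n"
        f"    month {when.month} ,\n    day {when.day} }}",
    ]
    if descr is not None:
        fields.append("  descr " + quote(descr))
    return "cit {\n" + " ,\n".join(fields) + " }"


if __name__ == "__main__":
    print(render_cit_sub(["Burdell, G.R."], date(2011, 8, 2)))
```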

The 'authors' field is an Auth-list, defined as:

    -- Authorship Group
Auth-list ::= SEQUENCE {
        names CHOICE {
            std SEQUENCE OF Author ,          -- full citations
            ml SEQUENCE OF VisibleString ,    -- MEDLINE, semi-structured
            str SEQUENCE OF VisibleString } , -- free for all
        affil Affil OPTIONAL }                -- author affiliation

Thus, several authors can share one affiliation within an Auth-list. Note, though, that the Cit-sub definition above has a single 'authors Auth-list' field, so it holds one Auth-list, not several.
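One Auth-list whose authors share a single affiliation could be written as in the hedged Python sketch below. It emits the "std" choice, with one Author entry per person, followed by one "affil std" block. The Affil field order used here (affil, div, city, sub, country, street, email, fax, phone, postal-code) is taken from NCBI's ASN.1 definitions as I recall them and should be checked. The function name and the (last, first, initials) input shape are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed declared order of the Affil "std" fields; unknown keys are ignored.
AFFIL_ORDER = ["affil", "div", "city", "sub", "country", "street",
               "email", "fax", "phone", "postal-code"]


def quote(text: str) -> str:
    """ASN.1 value notation doubles any embedded double quote."""
    return '"' + text.replace('"', '""') + '"'


def auth_list(authors: List[Tuple[str, str, str]], affil: Dict[str, str]) -> str:
    """Render an Auth-list: (last, first, initials) authors sharing one affiliation."""
    entries = []
    for last, first, initials in authors:
        entries.append(
            "    {\n"
            "      name name {\n"
            f"        last {quote(last)} ,\n"
            f"        first {quote(first)} ,\n"
            f"        initials {quote(initials)} }} }}"
        )
    # Affil fields must appear in declared order; missing ones are skipped.
    affil_lines = [f"{k} {quote(affil[k])}" for k in AFFIL_ORDER if k in affil]
    return ("authors {\n  names std {\n"
            + " ,\n".join(entries)
            + " } ,\n  affil std {\n    "
            + " ,\n    ".join(affil_lines)
            + " } }")


if __name__ == "__main__":
    print(auth_list([("Burdell", "George", "G.R."), ("Doe", "Jane", "J.")],
                    {"city": "Atlanta", "affil": "Georgia Institute of Technology"}))
```

The affiliation dict can be given in any order, because the output order comes from AFFIL_ORDER and not from the input.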

So, would it be possible to reorganize the input data, grouped by affiliation, like so:

Affiliation1="DATA | DATA | DATA | DATA"
AUTHOR1="INFO|INFO|INFO"
AUTHOR2="INFO|INFO|INFO"
AUTHOR3="INFO|INFO|INFO"
Affiliation2="DATA | DATA | DATA | DATA"
AUTHOR4="INFO|INFO|INFO"
AUTHOR5="INFO|INFO|INFO"
AUTHOR6="INFO|INFO|INFO"
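Input grouped this way is straightforward to read with a single pass. Each AUTHOR line is attached to the most recent Affiliation line. The Python sketch below assumes every non-blank line has the form NAMEn="field|field|...". Because the proposal does not fix the order of the DATA and INFO fields, it keeps them as plain lists. The `Group` class and `parse_grouped` function are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed line shape: Affiliation<n>="..." or AUTHOR<n>="...", pipe-separated fields.
LINE = re.compile(r'^\s*(Affiliation|AUTHOR)(\d+)\s*=\s*"(.*)"\s*$', re.IGNORECASE)


@dataclass
class Group:
    affil: List[str]
    authors: List[List[str]] = field(default_factory=list)


def parse_grouped(text: str) -> List[Group]:
    """Group AUTHOR lines under the most recent Affiliation line."""
    groups: List[Group] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue
        m = LINE.match(raw)
        if not m:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        kind, _, payload = m.groups()
        fields = [f.strip() for f in payload.split("|")]
        if kind.lower() == "affiliation":
            groups.append(Group(affil=fields))
        elif not groups:
            raise ValueError(f"line {lineno}: author before any affiliation")
        else:
            groups[-1].authors.append(fields)
    return groups
```

Each resulting Group then corresponds to one affiliation and its list of authors, ready to be rendered into ASN.1.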

Additionally, will you provide input specifying who the 'contact' person is, perhaps by giving the first line a CONTACT tag? Or do you just want to hand-edit that part?

Finally, would you award the bounty for a Linux- or Windows-compiled binary, so that the C++ library can be used, or must the solution be portable?