GenBank Flat File (.gb) Proper Usage
10.7 years ago
mobiusklein ▴ 180

I'm attempting to convert my collection of scattered annotations into a unified GenBank Flat File. I've been looking at how different programs interact with the format: some accept only a fixed set of feature types, others arbitrarily shoehorn the data into a feature type, and still others simply use the feature type as a sort of XML analog for loading their annotations in and out.

Is there a "Right" way to use the GB format?

I've seen the GFF format, but the GFF3 specification separates annotation data from sequence data. This is really good for big, abstracted data models, but isn't what I'm looking for at the moment because I still need to be able to convey that linkage to myself when I read my own data. Am I wrong to avoid GFF for this reason? What are other file formats I should look at?

Thank you

genbank • 6.9k views
10.7 years ago
Peter 6.0k

The GenBank and EMBL formats work well for sequence annotation, and either would be a good choice if you're thinking about submitting your annotated genome to NCBI/EMBL/DDBJ. Just follow the standard rather than deviating too far. For example, don't make up your own feature types! See e.g. http://www.insdc.org/files/feature_table.html
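As a rough illustration of the "don't make up your own feature types" advice, here is a minimal Python sketch that flags feature keys not found in an allowed set. The set below is only a small, non-exhaustive subset of the keys in the INSDC feature table, and the function name `find_nonstandard_keys` and the example key `my_custom_domain` are made up for this example; check the full table before relying on it.

```python
# A small, NON-EXHAUSTIVE subset of feature keys from the INSDC feature table.
# This is an assumption for illustration; consult the full table for real use.
INSDC_FEATURE_KEYS_SUBSET = {
    "source", "gene", "CDS", "mRNA", "tRNA", "rRNA", "ncRNA",
    "exon", "intron", "repeat_region", "regulatory", "misc_feature",
}


def find_nonstandard_keys(feature_keys, allowed=INSDC_FEATURE_KEYS_SUBSET):
    """Return feature keys that are not in `allowed`, in first-seen order."""
    unknown = []
    for key in feature_keys:
        if key not in allowed and key not in unknown:
            unknown.append(key)
    return unknown


# Hypothetical feature keys collected from scattered annotations.
my_keys = ["gene", "CDS", "my_custom_domain", "misc_feature", "my_custom_domain"]
unknown = find_nonstandard_keys(my_keys)
print(unknown)
```

Anything reported here could be mapped onto a standard key (for instance `misc_feature` with a descriptive `/note` qualifier) rather than kept as an invented type.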

GFF3 does allow you to include the sequences too, at the end of the file in a FASTA section; however, it is commonly held as two files (GFF3 and FASTA). This makes sense if your annotation goes through several revisions while the sequence doesn't change. See http://www.sequenceontology.org/gff3.shtml
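To make the single-file option concrete, here is a minimal Python sketch of a GFF3 file that carries its sequence after a `##FASTA` directive, plus a helper that splits the two parts again. The function names, the `contig1` sequence, and the feature values are all invented for illustration, and the writer does not escape special characters in attribute values as a real one would need to.

```python
def build_gff3_with_fasta(features, sequences):
    """Build GFF3 text with the sequences appended in a ##FASTA section."""
    lines = ["##gff-version 3"]
    for f in features:
        # Column 9: semicolon-separated tag=value pairs (no escaping done here).
        attrs = ";".join(f"{k}={v}" for k, v in f["attributes"].items())
        # Nine tab-separated columns:
        # seqid, source, type, start, end, score, strand, phase, attributes
        cols = [f["seqid"], f["source"], f["type"], str(f["start"]),
                str(f["end"]), ".", f["strand"], ".", attrs]
        lines.append("\t".join(cols))
    lines.append("##FASTA")
    for seqid, seq in sequences.items():
        lines.append(f">{seqid}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


def split_gff3(text):
    """Split GFF3 text into (annotation part, FASTA part)."""
    annotation, _, fasta = text.partition("##FASTA\n")
    return annotation, fasta


# Hypothetical example data.
features = [{
    "seqid": "contig1", "source": "manual", "type": "gene",
    "start": 2, "end": 9, "strand": "+",
    "attributes": {"ID": "gene0001", "Name": "example"},
}]
sequences = {"contig1": "ACGTACGTAC"}

text = build_gff3_with_fasta(features, sequences)
annotation, fasta = split_gff3(text)
print(text)
```

Keeping both parts in one file preserves the annotation-to-sequence linkage the question asks about, while `split_gff3` shows how easily the same data falls back into the two-file layout.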

(A related question is which tools are recommended for working with these file formats, e.g. graphical editors, parsers and writers, etc.)


Thank you, these answers really help. It figures that the GFF3 feature was in the appendix.
