I want GFF3 files for certain vertebrates, but I could only find the corresponding GenBank files in Ensembl. So I decided to download them and convert them to GFF3 using the BioPerl script bp_genbank2gff3. It does produce a GFF3 file, but I get an error and I'm not sure whether the resulting file is okay to use... More specifically, the most frequent message I get is "Possible gene unflattening error".
I have used bp_genbank2gff3 a few times so far, but this is the first time I've seen that kind of error. What I find weird is that, even though it says it encountered an error, it still produces a GFF file...
If you want to reproduce the error, get the following GenBank file:
wget ftp://ftp.ensembl.org/pub/release-73/genbank/gallus_gallus/Gallus_gallus.Galgal4.73.chromosome.10.dat.gz
and try to convert it using:
bp_genbank2gff3 Gallus_gallus.Galgal4.73.chromosome.10.dat.gz
For your information, I have also tried first decompressing the gzipped file and then running bp_genbank2gff3, but I still get the same error.
Has anyone else seen this error? If so, how do you fix it?
Thanks, Panos
Thanks for the advice, Emily! Why is gtf2gff3 outputting an unsorted GFF file, though? Does it at least keep the features of a gene (CDSs, exons, etc.) right after each gene?
I'm afraid I'm not totally familiar with the script. The README only shows an example with a single gene, so it's not clear. You could test it on two genes and see what it does with them.
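If it helps with that two-gene test, here is a rough Python sketch of one way to check the ordering. It is not part of gtf2gff3 or BioPerl; it only assumes the output is tab-delimited GFF3 with ID and Parent attributes in column 9. The function names and the toy records (chr10, gene1, mRNA1, and so on) are made up for illustration. It reports every feature whose Parent has not yet appeared earlier in the file.

```python
def parse_attributes(column9):
    """Turn a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def children_before_parents(lines):
    """Return (line_number, parent_id) for each feature listed before its parent."""
    seen_ids = set()
    problems = []
    for line_number, line in enumerate(lines, start=1):
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        # Anything that is not a 9-column feature line is ignored.
        if len(columns) != 9:
            continue
        attrs = parse_attributes(columns[8])
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent and parent not in seen_ids:
                problems.append((line_number, parent))
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
    return problems


def row(feature_type, attributes):
    """Build a toy GFF3 line (hypothetical coordinates and source)."""
    return "\t".join(["chr10", "toy", feature_type, "1", "100", ".", "+", ".", attributes])


# Two toy genes, each followed directly by its own sub-features.
SORTED_EXAMPLE = [
    row("gene", "ID=gene1"),
    row("mRNA", "ID=mRNA1;Parent=gene1"),
    row("exon", "Parent=mRNA1"),
    row("gene", "ID=gene2"),
    row("mRNA", "ID=mRNA2;Parent=gene2"),
    row("exon", "Parent=mRNA2"),
]

# The same kind of records, but with an exon listed before its gene and mRNA.
UNSORTED_EXAMPLE = [
    row("exon", "Parent=mRNA1"),
    row("gene", "ID=gene1"),
    row("mRNA", "ID=mRNA1;Parent=gene1"),
]

print(children_before_parents(SORTED_EXAMPLE))
print(children_before_parents(UNSORTED_EXAMPLE))
```

To run it on a real file, you could pass an open file handle instead of the toy lists, since the function only needs an iterable of lines. An empty result means every child feature comes after its parent; it does not by itself prove the file is sorted by coordinate.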