Prokka bacteria genome annotation
5.7 years ago
agata88 ▴ 870

Hi all!

I annotated a bacterial genome with Prokka, and the results are not entirely clear to me. Maybe somebody more familiar with this program can help?

I have multiple contigs assigned to the same annotation. I ran this command:

./prokka --outdir contigs_prokka --kingdom Bacteria --genus X --proteins uniprot_bacteria.fasta --usegenus --evalue 0.01 --rfam --cpu 8 --norrna contigs.fasta &

As a result I have a TSV file listing the contigs and their annotations. For some of the results, multiple contigs are assigned to the same annotation. For example:

contig1   CDS   1965   Zinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1
contig2   CDS   918    Zinc-transporting ATPase OX=224308 GN=zosA PE=1 SV=1

I am not sure how to interpret this:

  • Are these unconnected contigs?
  • Does one sequence represent the gene while the rest are pseudogenes?
  • Can I take one, the longest, for the final annotation and ignore the rest, or should I annotate the others as potential pseudogenes?

Many thanks for any suggestions. Agata


Both could be real and just happen to be zinc-transporting ATPases. Did you check for sequence redundancy in your contigs before running Prokka? For example, contig2 could be entirely similar to contig1 (and contained within it).
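
One way to test the "contained within" idea is an all-against-all blastn of the contigs, then flagging any contig that is almost fully covered by a single alignment to a different contig of equal or greater length. The sketch below only shows the filtering step. The file name `self_hits.tsv`, the function name `contained_contigs`, and the 95% thresholds are assumptions made for illustration, not settings anyone in this thread used.

```shell
#!/bin/sh
# Sketch: list contigs that look contained in another contig.
#
# Assumed input: BLAST+ tabular output with a custom column set, e.g.
#   blastn -query contigs.fasta -subject contigs.fasta \
#          -outfmt "6 qseqid sseqid pident length qlen slen" > self_hits.tsv
# Thresholds (95% identity, alignment covering 95% of the query) are arbitrary.

contained_contigs() {
    # $1 = hits file with columns: qseqid sseqid pident length qlen slen
    awk -F '\t' '
        # skip self-hits; require high identity, near-full query coverage,
        # and a subject that is at least as long as the query
        $1 != $2 && $3 >= 95 && $4 >= 0.95 * $5 && $6 >= $5 {
            if (!seen[$1]++) print $1 "\t" $2   # query, containing contig
        }
    ' "$1"
}
```

Calling `contained_contigs self_hits.tsv` prints one line per flagged contig together with the contig that covers it. Only single alignments are considered, and two identical contigs of the same length will flag each other, so the output is a list to inspect rather than a list to delete.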


Yes, I used CD-HIT; it produced 10905 clusters from 10942 contigs.

This is not an isolated case; most records are duplicated.
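
To put a number on "most records", the annotation table can be summarized by product name. This is a minimal sketch that assumes a tab-separated file laid out like the example rows in the question (contig, feature type, length, product); the function name `shared_products` and the file name `annotation.tsv` are hypothetical, and a different column order would need the field numbers adjusted.

```shell
#!/bin/sh
# Sketch: for each CDS product, count how many distinct contigs carry it,
# and print only products found on more than one contig.

shared_products() {
    # $1 = tab-separated file: contig, feature type, length, product
    awk -F '\t' '
        # count each (product, contig) pair once
        $2 == "CDS" && !pair[$4 SUBSEP $1]++ { n[$4]++ }
        END { for (p in n) if (n[p] > 1) print n[p] "\t" p }
    ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Running `shared_products annotation.tsv` lists the number of contigs and the product name, most widespread first. Identical product names on many contigs do not prove duplication on their own, since paralogs share names too, but a very long list is a hint that the assembly holds more than one genome.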

5.7 years ago
agata88 ▴ 870

Hi all!

I have a solution to my question. It turned out that my sample was contaminated, which is why I had such a huge number of contigs. After filtering, the annotation went well. I hope this helps with similar dilemmas in the future.

Btw, I filtered the contigs using blastn against a species-specific nt database.

Best,

Agata
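
The exact commands used for this filtering were not posted in the thread. As a rough sketch of one way it could be done: blast the contigs against a database built from sequences of the expected species, then keep only contigs with at least one hit. All file names, the function name `filter_contigs`, and the e-value cutoff below are assumptions, and the BLAST+ steps are shown as comments because they need a local installation and a reference set.

```shell
#!/bin/sh
# Sketch: keep only contigs that have a blastn hit to the expected species.
#
# Assumed preparation (hypothetical file names, arbitrary cutoff):
#   makeblastdb -in species_reference.fasta -dbtype nucl -out species_db
#   blastn -query contigs.fasta -db species_db -outfmt 6 -evalue 1e-10 \
#          -max_target_seqs 5 > hits.tsv

filter_contigs() {
    # $1 = BLAST tabular hits (outfmt 6; query ID in column 1)
    # $2 = contigs FASTA
    # Prints FASTA records whose ID (first word of the header) has a hit.
    awk -v hits="$1" '
        BEGIN {
            # load the query IDs that produced at least one hit
            while ((getline line < hits) > 0) {
                split(line, f, "\t")
                keep[f[1]] = 1
            }
        }
        /^>/ { id = substr($1, 2); show = (id in keep) }
        show
    ' "$2"
}
```

Usage would look like `filter_contigs hits.tsv contigs.fasta > contigs_filtered.fasta`. Any hit at all is a permissive criterion; requiring a minimum identity or alignment length in the hits file first would make the filter stricter.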


Hi Agata,

Could you please send me the command line you used for filtering the contigs with blastn?


You may look at this as a solution, but having contaminated data going into an assembly is not a good thing. If you choose to submit this assembly to NCBI, you may throw someone else off if they use this data for genome comparisons.


That is true, I am aware of that. I am going to submit only true data. Thanks.
