How can I relate Prokka-annotated prokaryote identifiers to the actual organism (name)?
4 months ago
glendich • 0

Having taken over a project from someone else, I received a list of identifiers for expression data annotated with Prokka, as shown below:

PNJECBGM_02289  gnl|Prokka|PNJECBGM_42
PNJECBGM_02290  gnl|Prokka|PNJECBGM_42
BKKAOALG_00637  gnl|Prokka|BKKAOALG_9

and entries that show the locus tag, CDS and products, e.g.:

ID=BPFNMOJC_00555_gene;Name=hisZ_2;gene=hisZ_2;locus_tag=BPFNMOJC_00555
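The line above looks like the attribute column (column 9) of a GFF3 file: `key=value` pairs separated by semicolons. A minimal Python sketch for turning it into a lookup table, assuming standard GFF3 conventions (reserved characters in values are percent-encoded); `parse_gff_attributes` is a hypothetical helper name, not part of Prokka.

```python
from urllib.parse import unquote


def parse_gff_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column (key=value pairs separated by ';') into a dict."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ';'
        key, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters such as ';' and '=' inside values
        attributes[unquote(key)] = unquote(value)
    return attributes


attrs = parse_gff_attributes("ID=BPFNMOJC_00555_gene;Name=hisZ_2;gene=hisZ_2;locus_tag=BPFNMOJC_00555")
print(attrs["locus_tag"], attrs["gene"])  # BPFNMOJC_00555 hisZ_2
```

This recovers the gene name for a locus tag, but note that nothing in these attributes identifies the organism.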

Does anyone have suggestions on how I can find out the names of the prokaryotes these protein/gene identifiers belong to? I tried UniProt, GenBank, esearch from NCBI and the Pfam database, and got nothing back. I looked at the GBK files as well, but they are not helpful at all:

VERSION
KEYWORDS    .
SOURCE      Genus species
  ORGANISM  Genus species
            Unclassified.
COMMENT     Annotated using prokka 1.14.6 from https://github.com/tseemann/prokka.
FEATURES             Location/Qualifiers
     source          1..24892
                     /organism="Genus species"
                     /mol_type="genomic DNA"
                     /strain="strain"
     gene            52..885
                     /locus_tag="BPFNMOJC_00001"
     mRNA            52..885
Tags: transcriptomics, prokka, MAG

I got a list of identifiers for expression data annotated with prokka

How can expression data be annotated with Prokka? You probably mean that the assembly was annotated with Prokka and the expression analysis was then done using that annotation? What types of annotation files do you have? You have tagged this with MAG, so is this metatranscriptomic data?

Yes, you're right, this is metatranscriptomic data :) I have GBK files for each MAG, but none of them tells me the names of the prokaryotes.

4 months ago
Mensur Dlakic ★ 27k

PNJECBGM_02289

These are arbitrary names that Prokka gives to the ORFs it finds and, later, to the genes. Whoever ran the annotation didn't specify informative genus/species/strain names (hence the "Genus species" and "strain" placeholders in your GBK files), so the prefix was picked at random.

In short, those names do not relate to anything meaningful in the databases.
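Even though the prefix means nothing to the databases, every locus tag from one Prokka run shares it, and the contig IDs in the OP's list (e.g. `gnl|Prokka|PNJECBGM_42`) carry the same prefix. So it can at least tell you which MAG's GBK file a gene came from. A minimal sketch, assuming one Prokka GBK file per MAG in a single directory with a `.gbk` extension; `prefix_of` and `index_gbk_prefixes` are hypothetical helper names.

```python
import re
from pathlib import Path

# Matches Prokka-style qualifiers such as /locus_tag="BPFNMOJC_00001"
LOCUS_TAG_RE = re.compile(r'/locus_tag="([A-Za-z0-9_]+)_\d+"')


def prefix_of(identifier: str) -> str:
    """Return the run prefix of a locus tag or a gnl|Prokka|PREFIX_N contig ID."""
    last_part = identifier.split("|")[-1]  # drop a leading 'gnl|Prokka|' if present
    return last_part.rsplit("_", 1)[0]     # drop the trailing _NNNNN counter


def index_gbk_prefixes(directory: str) -> dict:
    """Map each locus-tag prefix to the names of the GBK file(s) it occurs in."""
    index = {}
    for path in sorted(Path(directory).glob("*.gbk")):
        for prefix in set(LOCUS_TAG_RE.findall(path.read_text())):
            index.setdefault(prefix, []).append(path.name)
    return index


# Usage (hypothetical folder holding one Prokka GBK per MAG):
#   index = index_gbk_prefixes("mags")
#   index.get(prefix_of("gnl|Prokka|PNJECBGM_42"))  # -> names of the matching GBK file(s)
```

Each prefix should map to exactly one file; the organism itself still has to be determined separately for that MAG.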

Prokka generally runs very fast and will give you the same results, so I'd suggest rerunning it and looking at the TSV files in particular: they have nice long product names (where there is a database hit; you will still get many "hypothetical protein" entries). Torsten (Seemann, Prokka's author) himself has suggested Bakta: https://github.com/oschwengers/bakta. Since Prokka hasn't been updated in a while, you might get more hits with biologically relevant names from the newer Bakta database.
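Once you have the TSV table, attaching product names to the IDs in the expression matrix is a plain lookup. A minimal sketch, assuming a Prokka TSV whose header row names `locus_tag`, `gene` and `product` columns; the function names are hypothetical, and Bakta's TSV uses its own column layout, so the column names would need adjusting for it.

```python
import csv


def load_prokka_tsv(path: str) -> dict:
    """Map locus_tag -> (gene, product) from a Prokka .tsv annotation table."""
    table = {}
    with open(path, newline="") as handle:
        # QUOTE_NONE: treat quote characters in product names as ordinary text
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Assumes the header names 'locus_tag', 'gene' and 'product' columns
            table[row["locus_tag"]] = (row.get("gene") or "", row.get("product") or "")
    return table


def annotate(locus_tags, table: dict) -> dict:
    """Attach (gene, product) to expression-table IDs; unknown IDs get empty names."""
    return {tag: table.get(tag, ("", "")) for tag in locus_tags}
```

Keeping unknown IDs with empty names, rather than dropping them, makes it easy to see which expressed genes had no annotation hit.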

I think the problem is that the OP has some existing results and can't figure them out, rather than having a file that needs to be annotated.

It just occurred to me: taking several predicted proteins and BLASTing them against the NR database might reveal which organism the file came from.
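One way to build the query set is to take a handful of proteins from one MAG's Prokka `.faa` file and submit them to BLASTP against nr (via the NCBI web interface or a local BLAST+ install), then look at the taxonomy of the top hits. A minimal stdlib-only sketch; the selection heuristic (the longest proteins whose header does not say "hypothetical protein") and the function names are assumptions of mine, relying on Prokka's `.faa` headers having the form `locus_tag product`. Do this per MAG, since each MAG should correspond to a different organism.

```python
def read_fasta(text: str) -> list:
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pick_blast_queries(faa_text: str, n: int = 5) -> str:
    """Return the n longest non-hypothetical proteins as FASTA text for BLASTP."""
    records = [r for r in read_fasta(faa_text) if "hypothetical protein" not in r[0]]
    records.sort(key=lambda r: len(r[1]), reverse=True)  # longest first
    return "".join(f">{header}\n{seq}\n" for header, seq in records[:n])
```

Longer, annotated proteins are simply more likely to give confident hits; for MAGs from poorly characterised lineages the best hits may only resolve to genus or family.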
