Best way to find annotated counterparts of unknown transcripts after BLAST
3.1 years ago
Mathias • 0

Hi all!

I am building a local pipeline to identify unknown transcripts. One part of this pipeline checks whether the unknown sequences have a similar, already-annotated counterpart. For this, I BLAST the transcripts locally, which gives me the accession, the coordinates, and the strand of the hit in the subject genome. From this, I expected to be able to extract any annotations found at that location in the hit's genome. I tried using efetch with the following call, which returns this output:

 efetch -db nuccore -id "CP040608.1" -seq_start 17402 -seq_stop 16692 -strand 1 -format ft

>Feature gb|CP040608.1|
<1      647     gene
                        locus_tag       FBF02_00060
<1      647     CDS
                        product desulfoferrodoxin FeS4 iron-binding domain-containing protein
                        transl_table    11
                        protein_id      gb|QJE54075.1||gnl|PRJNA258022|FBF02_00060
                        inference       COORDINATES: similar to AA sequence:RefSeq:YP_002343484.1
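The call above passes seq_start 17402, which is larger than seq_stop 16692. That is presumably copied from tabular BLAST output (-outfmt 6), where a hit on the subject's minus strand is reported with sstart > send. Below is a minimal sketch, assuming the standard 12-column outfmt 6 layout. It normalizes each hit to an ascending interval plus a strand code (1 = plus, 2 = minus, matching efetch's -strand values), so orientation is carried by -strand instead of by the order of the coordinates. The names `Hit`, `normalize_hit` and `efetch_args` are hypothetical. Depending on how the BLAST database was built, sseqid may also carry a prefix such as `gb|...|` that would need stripping first.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A BLAST hit reduced to what an efetch call needs (illustrative type)."""
    query: str
    accession: str
    start: int   # 1-based, always <= stop after normalization
    stop: int
    strand: int  # 1 = plus, 2 = minus (efetch -strand convention)


def normalize_hit(line: str) -> Hit:
    """Parse one 12-column -outfmt 6 line and normalize the subject interval.

    Columns: qseqid sseqid pident length mismatch gapopen
             qstart qend sstart send evalue bitscore
    A minus-strand subject hit has sstart > send; it is reordered and
    flagged as strand 2.
    """
    fields = line.rstrip("\n").split("\t")
    qseqid, sseqid = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    if sstart <= send:
        return Hit(qseqid, sseqid, sstart, send, 1)
    return Hit(qseqid, sseqid, send, sstart, 2)


def efetch_args(hit: Hit) -> list[str]:
    """Build an argument list for fetching the feature table of the hit region."""
    return [
        "efetch", "-db", "nuccore", "-id", hit.accession,
        "-seq_start", str(hit.start), "-seq_stop", str(hit.stop),
        "-strand", str(hit.strand), "-format", "ft",
    ]
```

For the hit in the question, a row with sstart 17402 and send 16692 becomes start=16692, stop=17402, strand=2.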

Unfortunately, I would expect the region to be annotated only on the plus strand, yet changing the strand to 2 returns the same result...
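One possible reading of this behaviour is that the ft format lists every feature overlapping the requested interval, and the -strand parameter does not appear to filter those features. In NCBI's five-column feature table format, a feature on the complementary strand is written with its start coordinate greater than its stop coordinate, and `<` / `>` mark partial ends. For example, `<1 647` above means the gene begins before the fetched window. Below is a minimal sketch that reads each feature's strand from its coordinates, so the filtering can be done on your side. `parse_feature_table` is a hypothetical helper. It keeps only the first interval of multi-interval features, such as spliced CDSs, whose extra intervals continue on two-column lines.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str          # feature key, e.g. "gene" or "CDS"
    start: int        # smaller coordinate (partial markers removed)
    stop: int         # larger coordinate
    strand: str       # "+" if written start <= stop, "-" if start > stop
    qualifiers: dict  # qualifier name -> list of values


def _coord(token: str) -> int:
    """Drop the '<' / '>' partial-end markers and return the integer position."""
    return int(token.lstrip("<>"))


def parse_feature_table(text: str) -> list[Feature]:
    """Parse efetch -format ft output into Feature records (first interval only)."""
    features: list[Feature] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(">Feature"):
            continue
        if line[0].isspace():
            # Qualifier line: a name, then a value that may itself contain spaces.
            if features:
                parts = line.split(None, 1)
                value = parts[1].strip() if len(parts) > 1 else ""
                features[-1].qualifiers.setdefault(parts[0], []).append(value)
            continue
        tokens = line.split()
        if len(tokens) < 3:
            continue  # extra interval of a multi-interval feature; ignored here
        a, b = _coord(tokens[0]), _coord(tokens[1])
        # A single-base feature (a == b) is reported as "+" by this sketch.
        strand = "+" if a <= b else "-"
        features.append(Feature(tokens[2], min(a, b), max(a, b), strand, {}))
    return features
```

With the records in hand, keeping only features on the same strand as the BLAST hit is a simple filter on `strand`.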

Do you have any suggestions as to why this is happening? Is there perhaps another solution besides efetch? I expect to run ~10,000 hits, and efetch is quite slow and restrictive for such a large number of queries.
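For ~10,000 hits, one alternative to calling efetch once per hit is to obtain the annotation once per subject genome (for example as a GFF3 file) and do the overlap lookups locally. Below is a minimal sketch, assuming the GFF3 text is already on disk and that its seqid column matches the accessions reported by BLAST. `AnnotationIndex` and `overlapping` are hypothetical names. Features are sorted by start per sequence, and a bisect window bounded by the longest feature limits how many are scanned. Whole-sequence `region` lines, which NCBI GFF3 files typically include, are skipped by default because they would overlap every hit and defeat that bound. An interval tree would be a reasonable swap for very dense annotations.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    seqid: str
    ftype: str
    start: int   # 1-based, inclusive (GFF3 convention)
    end: int
    strand: str  # "+", "-", "." or "?"
    attributes: str


class AnnotationIndex:
    """Per-sequence sorted feature lists for local overlap queries."""

    def __init__(self, gff_text: str, skip_types: tuple = ("region",)):
        by_seq: dict[str, list[GffFeature]] = defaultdict(list)
        for line in gff_text.splitlines():
            if line.startswith("##FASTA"):
                break  # GFF3 may embed sequences after this directive
            if not line or line.startswith("#"):
                continue
            cols = line.split("\t")
            if len(cols) != 9 or cols[2] in skip_types:
                continue  # malformed line or a skipped feature type
            seqid, _src, ftype, start, end, _score, strand, _phase, attrs = cols
            by_seq[seqid].append(
                GffFeature(seqid, ftype, int(start), int(end), strand, attrs))
        self._features: dict[str, list[GffFeature]] = {}
        self._starts: dict[str, list[int]] = {}
        self._max_len: dict[str, int] = {}
        for seqid, feats in by_seq.items():
            feats.sort(key=lambda f: f.start)
            self._features[seqid] = feats
            self._starts[seqid] = [f.start for f in feats]
            self._max_len[seqid] = max(f.end - f.start + 1 for f in feats)

    def overlapping(self, seqid: str, start: int, end: int,
                    strand: Optional[str] = None,
                    ftype: Optional[str] = None) -> list[GffFeature]:
        """Features on `seqid` overlapping [start, end] (1-based, inclusive)."""
        if seqid not in self._features:
            return []
        lo, hi = sorted((start, end))  # accept BLAST-style reversed coordinates
        starts = self._starts[seqid]
        # Any overlapping feature starts no earlier than lo - max_len.
        i = bisect_left(starts, lo - self._max_len[seqid])
        j = bisect_right(starts, hi)
        hits = []
        for f in self._features[seqid][i:j]:
            if f.end < lo:
                continue
            if strand is not None and f.strand != strand:
                continue
            if ftype is not None and f.ftype != ftype:
                continue
            hits.append(f)
        return hits
```

The index is built once per genome, so the per-hit cost is a couple of binary searches plus a short scan, with no network round trips.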

Thanks in advance!
