Getting Untranslated DNA Subject String As Output From tblastn Via Biopython
11.4 years ago
agatorano ▴ 50

When running an alignment with tblastn via Biopython, the subject string that is returned is the translated version of the DNA sequence we searched against. Is there a way to return the original DNA string instead of the translated version?

biopython • 3.1k views

Could you explain a little more in depth what problem you are experiencing? What were your inputs and what outputs did you expect from biopython?

11.4 years ago
bow ▴ 790

This is a limitation of the BLAST XML output itself: it doesn't keep the original sequence. Biopython only parses this output into a user-friendly data structure. Without any information about the original sequence in the BLAST XML file, Biopython cannot return it.

You could reverse translate the given protein sequence. But given the codon redundancy, it may be impossible to figure out the original DNA sequence from the BLAST XML file alone.
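To give a feel for how much the codon redundancy costs, here is a small sketch that counts how many distinct DNA sequences encode a given protein string under the standard genetic code. The function name `count_encodings` is made up for illustration, and skipping `-` characters assumes the string comes from an alignment that may contain gaps.

```python
from math import prod

# Number of codons per residue in the standard genetic code ('*' = stop).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4, "*": 3,
}


def count_encodings(protein):
    """Count the DNA sequences that translate to `protein` (hypothetical helper)."""
    # Alignment gaps ('-') carry no residue, so they are skipped.
    residues = [aa for aa in protein.upper() if aa != "-"]
    unknown = sorted(set(residues) - set(CODONS_PER_RESIDUE))
    if unknown:
        raise ValueError(f"unsupported residue(s): {unknown}")
    return prod(CODONS_PER_RESIDUE[aa] for aa in residues)


print(count_encodings("MKL"))    # 1 * 2 * 6 = 12
print(count_encodings("MKLSR"))  # 12 * 6 * 6 = 432
```

Even a five-residue peptide has hundreds of candidate DNA sequences, so reverse translation cannot single out the one that was in the database.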


+1. Going beyond the XML output, one option, depending on your setup, is to use Bio.SeqIO.to_dict() to create an in-memory dictionary of the sequences that make up the BLAST database. Then you can use the subject ID to retrieve the original sequence.
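Once the full subject sequence has been looked up by ID, the aligned stretch of DNA can be cut out using the HSP coordinates. The sketch below is plain Python and rests on assumptions worth checking against your own output: that tblastn reports the subject start and end as 1-based, inclusive nucleotide positions, and that a negative hit frame means the alignment is on the reverse strand. The helper names are hypothetical, and the complement table only covers A, C, G, T and N.

```python
# Complement table for unambiguous bases plus N (assumption: no other IUPAC codes).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def aligned_subject_dna(subject_dna, sbjct_start, sbjct_end, hit_frame):
    """Slice the aligned region out of the full subject sequence.

    Assumes 1-based inclusive nucleotide coordinates; the order of start and
    end is normalised so either convention for the reverse strand works.
    """
    low, high = sorted((sbjct_start, sbjct_end))
    segment = subject_dna[low - 1:high]
    # A negative frame is taken to mean the hit lies on the reverse strand.
    return reverse_complement(segment) if hit_frame < 0 else segment


plus = "CCATGAAACTGTAGGG"
minus = reverse_complement(plus)
print(aligned_subject_dna(plus, 3, 11, 1))    # ATGAAACTG
print(aligned_subject_dna(minus, 6, 14, -1))  # ATGAAACTG
```

With Biopython's NCBIXML parser the coordinates would presumably come from `hsp.sbjct_start`, `hsp.sbjct_end` and the second element of `hsp.frame`. Which alignment attribute matches the keys of your sequence dictionary depends on how the database IDs were written.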


Unless your database FASTA file is quite small, Bio.SeqIO.index() or Bio.SeqIO.index_db() would probably be more sensible than an in-memory dictionary built with Bio.SeqIO.to_dict().
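A minimal sketch of the indexed lookup, assuming Biopython is installed. The FASTA content and the IDs `contig1` and `contig2` are made up so that the example is self-contained; in practice `path` would point at the FASTA file the BLAST database was built from.

```python
import os
import tempfile

from Bio import SeqIO

# Made-up two-record FASTA standing in for the real database file.
fasta_text = ">contig1 demo\nCCATGAAACTGTAGGG\n>contig2 demo\nGGGTTTAAACCC\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.fasta")
    with open(path, "w") as handle:
        handle.write(fasta_text)

    # index() records file offsets only; a record is parsed when it is requested.
    db = SeqIO.index(path, "fasta")
    subject_dna = str(db["contig1"].seq)
    db.close()

print(subject_dna)
```

Bio.SeqIO.index_db() works along the same lines but stores the index in an SQLite file, so it does not need to be rebuilt on every run.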

Or, and this is a good plan if you don't have a FASTA file of the database, you could use blastdbcmd, although that isn't always as easy as it should be: http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html
