blast using python
1
0
Entering edit mode
6.2 years ago

hello, I am beginner of python I have two dna sequences in fasta format. I want to blast these two sequences.can anyone suggest me some solution in python. I have used NcbiblastnCommandline. but it is not worked.

from Bio.Blast.Applications import NcbiblastpCommandline
from StringIO import StringIO
from Bio.Blast import NCBIXML
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO

# Create two sequence files
seq1 = SeqRecord(Seq("FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF"),
                   id="seq1")
seq2 = SeqRecord(Seq("FQTWEEFSRAEKLYLADPMKVRVVLRYRHVDGNLCIKVTDDLICLVYRTDQAQDVKKIEKF"),
                   id="seq2")
SeqIO.write(seq1, "seq1.fasta", "fasta")
SeqIO.write(seq2, "seq2.fasta", "fasta")

# Run BLAST and parse the output as XML
output = NcbiblastpCommandline(query="seq1.fasta", subject="seq2.fasta", outfmt=5)()[0]
blast_result_record = NCBIXML.read(StringIO(output))

# Print some information on the result
for alignment in blast_result_record.alignments:
    for hsp in alignment.hsps:
        print '****Alignment****'
        print 'sequence:', alignment.title
        print 'length:', alignment.length
        print 'e value:', hsp.expect
        print hsp.query
        print hsp.match
        print hsp.sbjct

I found that error file not specified.

sequence alignment • 12k views
ADD COMMENT
1
Entering edit mode

If there is no particular reason to run BLAST through Biopython with XML output, I typically will run the blast command if I have downstream python code, using suprocess(), with tab-delimited output which is easier to parse IMO after with python.

ADD REPLY
0
Entering edit mode

Have you installed local blastp? Looking at your code, I assume you are using Windows.

Biopython tries to run just "blastp" not "blastp.exe" thus we must specify exact path to the blastp.exe (I think).

This is the line I ran blastp with your code.

output = NcbiblastpCommandline(cmd="C:\\apps\\ncbi-blast-2.6.0+\\bin\\blastp.exe",query="seq1.fasta", subject="seq2.fasta", outfmt=5)()[0]

C:\apps\ncbi-blast-2.6.0+\bin\blastp.exe is the executable of blastp I downloaded from NCBI FTP server.

ADD REPLY
0
Entering edit mode

thank you very much.I will try this

ADD REPLY
0
Entering edit mode

It is not working. I am getting same error.

ADD REPLY
1
Entering edit mode

Could you give me complete error message and your path of blastp.exe?

ADD REPLY
0
Entering edit mode

yes, the code above works for me, so without the error message difficult to solve

ADD REPLY
1
Entering edit mode
6.2 years ago
User000 ▴ 690

Why complicating it so much? if you have only two sequences just use NCBI blast web site, otherwise if you will have many sequences this is not the best method, you can run either as a command line or as a bash script.

#!/usr/bin/env bash
BLAST="/path/to/ncbi-blast-2.2.26+/bin"
REFERENCE="/path/to/reference.fa"
QUERY="/path/to/inputfile.fa"
OUTDIR="/path/to/output/directory"

echo "format..."
$BLAST/makeblastdb -in $REFERENCE -dbtype nucl
$BLAST/blastn -evalue 0.00001 -outfmt 6  -query $QUERY -db $REFERENCE -out $OUTDIR/outputfile

p.s. I also don't think parsing the .xml output of blast is a good idea
ADD COMMENT
0
Entering edit mode

I also don't think parsing the .xml output of blast is a good idea

why ?

ADD REPLY
0
Entering edit mode

on my personal experience I found it easier working with the tab delimited outputs. For a newbie in bioiformatics parsing an xml output might not be so straightforward, also if one has many sequences an xml format becomes heavier than tab. (See http://www.polarmicrobes.org/parsing-blast-xml-output/).

ADD REPLY

Login before adding your answer.

Traffic: 2758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6