Best Blast Hits from Tabular Outputs in Multiple files
0
0
Entering edit mode
6.5 years ago
trini1god • 0

Hello people!

I have a tabular blast outputs in different files (many files) and i want to extract the best hit according to their bitscore and alignment length. I thought of using awk but I got confused on how the regular expression will be since the outputs are in different files and of different ranges. Then I decided to use a python script from bioman for Blast best hit and I am getting this error

Traceback (most recent call last):
File "BlastBesthit.py", line 17, in <module>
maxscore = bestHit.split()[11]
IndexError: list index out of range

Below is the code:

import string 
import sys
import re
import os

blastdir = "/mnt/chisom/chisom/multifasta_files/multi_fasta/Blast_output"
os.chdir(blastdir)
readblastfile = os.listdir(blastdir)

for blastres in readblastfile:
    #print blastres
    Hits = open(blastres, 'r')
    bestHit = Hits.read()
    print bestHit
    print len(bestHit)
    maxscore = bestHit.split()[11]
    query = bestHit.split()[0]
    #print query

    hits = Hits.readlines()


    for hit in hits:
            hitSplit = hit.split()
            if query is not hitSplit[0]:
                    print bestHit
                    query = hitSplit[0]
                    maxscore = 0
                    bestHit = hit
            elif maxscore < hitSplit[11]:
                    bestHit = hit
                    maxscore = hitSplit[11]
    Hits.close()

A content of one file called blastres_4.2.3.52 is

4.2.3.52    Manes.12G148700.1   32.27   595 366 10  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G149100.1   32.50   597 362 12  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G076600.1   32.23   574 343 9   61  628 178 1779    2e-86 285

And the result I want to get is (since it is the same bitscore but different length)

4.2.3.52    Manes.12G148700.1   32.27   595 366 10  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G149100.1   32.50   597 362 12  36  629 76  1752    2e-90 295

I need all the best hits extracted to be in one file. please any help, correction and suggestion will be appreciated. Thanks

python shell blast • 1.7k views
ADD COMMENT
0
Entering edit mode

I am unable to find the script that you have provided the link to but from looking at the code it looks like it's the blastdir issue as it is hard-coded in the script. Either you can change that location to where the blast results are on your computer or you could try https://github.com/alunem/bioman/blob/master/blast/bmn-BlastBestHit.py code instead, where you will have to provide the location to the blast directory as command line argument.

ADD REPLY
0
Entering edit mode

Hello Sej, I hard-coded the directory into the script because in the directory contains many files which i have to loop through to get the best hits.

ADD REPLY
0
Entering edit mode

Does the directory only contain blast files, and nothing else? Otherwise, you will have problems.

ADD REPLY
0
Entering edit mode

Yes it contains only blast result files

ADD REPLY
0
Entering edit mode

You could create a new directory with all BLAST results file and change the path in the script to that folder as blastdir.

ADD REPLY

Login before adding your answer.

Traffic: 1479 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6