Biostar Beta. Not for public use.
Best Blast Hits from Tabular Outputs in Multiple files
0
Entering edit mode
21 months ago
trini1god • 0

Hello people! I have a tabular blast outputs in different files (many files) and i want to extract the best hit according to their bitscore and alignment length. I thought of using awk but I got confused on how the regular expression will be since the outputs are in different files and of different ranges. Then I decided to use a python script from bioman for Blast best hit and I am getting this error

Traceback (most recent call last):
File "BlastBesthit.py", line 17, in <module>
maxscore = bestHit.split()[11]
IndexError: list index out of range

Below is the code:

import string 
import sys
import re
import os

blastdir = "/mnt/chisom/chisom/multifasta_files/multi_fasta/Blast_output"
os.chdir(blastdir)
readblastfile = os.listdir(blastdir)

for blastres in readblastfile:
    #print blastres
    Hits = open(blastres, 'r')
    bestHit = Hits.read()
    print bestHit
    print len(bestHit)
    maxscore = bestHit.split()[11]
    query = bestHit.split()[0]
    #print query

    hits = Hits.readlines()


    for hit in hits:
            hitSplit = hit.split()
            if query is not hitSplit[0]:
                    print bestHit
                    query = hitSplit[0]
                    maxscore = 0
                    bestHit = hit
            elif maxscore < hitSplit[11]:
                    bestHit = hit
                    maxscore = hitSplit[11]
    Hits.close()

A content of one file called blastres_4.2.3.52 is

4.2.3.52    Manes.12G148700.1   32.27   595 366 10  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G149100.1   32.50   597 362 12  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G076600.1   32.23   574 343 9   61  628 178 1779    2e-86 285

And the result I want to get is (since it is the same bitscore but different length)

4.2.3.52    Manes.12G148700.1   32.27   595 366 10  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G149100.1   32.50   597 362 12  36  629 76  1752    2e-90 295

I need all the best hits extracted to be in one file. please any help, correction and suggestion will be . appreciated. Thanks

ADD COMMENTlink
0
Entering edit mode

I am unable to find the script that you have provided the link to but from looking at the code it looks like it's the blastdir issue as it is hard-coded in the script. Either you can change that location to where the blast results are on your computer or you could try https://github.com/alunem/bioman/blob/master/blast/bmn-BlastBestHit.py code instead, where you will have to provide the location to the blast directory as command line argument.

ADD REPLYlink
0
Entering edit mode

Hello Sej, I hard-coded the directory into the script because in the directory contains many files which i have to loop through to get the best hits.

ADD REPLYlink
0
Entering edit mode

Does the directory only contain blast files, and nothing else? Otherwise, you will have problems.

ADD REPLYlink
0
Entering edit mode

Yes it contains only blast result files

ADD REPLYlink
0
Entering edit mode

You could create a new directory with all BLAST results file and change the path in the script to that folder as blastdir.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3.1