Best Blast Hits from Tabular Outputs in Multiple files
21 months ago
trini1god • 0

Hello people! I have tabular BLAST outputs spread across many files, and I want to extract the best hits according to their bitscore and alignment length. I thought of using awk, but I got confused about how to write the regular expression, since the outputs are in different files and cover different ranges. I then decided to use a Python script from bioman for BLAST best hits, and I am getting this error:

Traceback (most recent call last):
  File "BlastBesthit.py", line 17, in <module>
    maxscore = bestHit.split()[11]
IndexError: list index out of range


Below is the code:

import string
import sys
import re
import os

blastdir = "/mnt/chisom/chisom/multifasta_files/multi_fasta/Blast_output"
os.chdir(blastdir)

#print blastres
Hits = open(blastres, 'r')
print bestHit
print len(bestHit)
maxscore = bestHit.split()[11]
query = bestHit.split()[0]
#print query

for hit in hits:
    hitSplit = hit.split()
    if query is not hitSplit[0]:
        print bestHit
        query = hitSplit[0]
        maxscore = 0
        bestHit = hit
    elif maxscore < hitSplit[11]:
        bestHit = hit
        maxscore = hitSplit[11]
Hits.close()


The content of one file, called blastres_4.2.3.52, is:

4.2.3.52    Manes.12G148700.1   32.27   595 366 10  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G149100.1   32.50   597 362 12  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G076600.1   32.23   574 343 9   61  628 178 1779    2e-86 285


The result I want is the following (both hits have the same bitscore but different alignment lengths):

4.2.3.52    Manes.12G148700.1   32.27   595 366 10  36  629 76  1752    2e-90 295
4.2.3.52    Manes.12G149100.1   32.50   597 362 12  36  629 76  1752    2e-90 295


I need all the extracted best hits to be in one file. Any help, correction, or suggestion will be appreciated. Thanks.


I can't find the script you linked to, but from the code it looks like the problem is blastdir, which is hard-coded in the script. You can either change that path to wherever the BLAST results are on your computer, or try https://github.com/alunem/bioman/blob/master/blast/bmn-BlastBestHit.py instead, where you have to provide the location of the BLAST directory as a command-line argument.
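A minimal sketch of that approach in Python 3 (the script in the question is Python 2): it takes the BLAST directory and an output file as command-line arguments, reads every file in the directory, and keeps all hits that tie for the highest bitscore per query, which matches the expected output in the question. The names best_hits and main, the 12-column layout with the bitscore in the last column, and the tie rule are assumptions for illustration; none of this is taken from the bioman script.

```python
import os
import sys
from collections import defaultdict

QUERY_COL = 0      # assumption: query ID is the 1st column
BITSCORE_COL = 11  # assumption: bitscore is the 12th column


def best_hits(lines):
    """Return, per query, every hit line that shares the highest bitscore."""
    best = {}                 # query -> highest bitscore seen so far
    kept = defaultdict(list)  # query -> hit lines that reach that bitscore
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines, if the output has any
        fields = line.split()
        if len(fields) <= BITSCORE_COL:
            continue  # skip blank or truncated lines instead of raising IndexError
        query = fields[QUERY_COL]
        score = float(fields[BITSCORE_COL])  # compare numbers, not strings
        if query not in best or score > best[query]:
            best[query] = score
            kept[query] = [line.rstrip("\n")]
        elif score == best[query]:
            kept[query].append(line.rstrip("\n"))
    return [hit for hits in kept.values() for hit in hits]


def main(blast_dir, out_path):
    """Write the best hits of every file in blast_dir to a single output file."""
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(blast_dir)):
            path = os.path.join(blast_dir, name)
            if not os.path.isfile(path):
                continue
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue  # never read the output file as if it were input
            with open(path) as handle:
                for hit in best_hits(handle):
                    out.write(hit + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

It could be run as, for example, `python3 best_hits.py Blast_output/ all_best_hits.txt` (both file names are hypothetical). Bitscores are converted to float before comparison because comparing the raw strings would rank "99" above "295". If ties should instead be broken by alignment length, the comparison key could become the pair (bitscore, length), assuming the length is in the fourth column.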


Hello Sej, I hard-coded the directory into the script because it contains many files that I have to loop through to get the best hits.


Does the directory only contain blast files, and nothing else? Otherwise, you will have problems.


Yes, it contains only BLAST result files.
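Even in a directory that holds only BLAST results, the IndexError in the traceback means that bestHit.split() returned fewer than 12 items. One possible cause is an empty file or a blank line, since splitting an empty string gives an empty list. The Python 3 sketch below lists such cases so they can be inspected. The function name find_short_lines and the expected count of 12 fields are assumptions for illustration.

```python
import os

EXPECTED_FIELDS = 12  # assumption: 12-column tabular BLAST output


def find_short_lines(blast_dir, expected=EXPECTED_FIELDS):
    """Yield (file name, line number, field count) for lines with too few fields.

    An empty file is reported once, as line number 0 with 0 fields.
    """
    for name in sorted(os.listdir(blast_dir)):
        path = os.path.join(blast_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            lines = handle.readlines()
        if not lines:
            yield name, 0, 0  # empty file: nothing to split at all
        for number, line in enumerate(lines, start=1):
            count = len(line.split())
            if count < expected:
                yield name, number, count
```

Calling `list(find_short_lines(blastdir))` with the directory from the question would return an empty list if every line in every file has at least 12 fields.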