Hi, I am trying to write a script to parse and summarize NCBI web BLAST output (14 columns). I am using Biopython's SearchIO on the tab-delimited output file, which is commented by default. I get the error: ValueError: Required query and/or hit ID field not found.
I tried removing the openpyxl summarizing part of the script and running only the bare-bones SearchIO code, but the error persists:
    from Bio import SearchIO

    file = "out.txt"
    blast_generator = SearchIO.parse(file, 'blast-tab', comments=True)
    for blast_qresult in blast_generator:
        print(blast_qresult)
        # List every hit of this query together with its index
        for k, blast_hit in enumerate(blast_qresult):
            print(k)
            print(blast_hit)
        query = blast_qresult.id
        print(query)
I then re-ran the script, this time passing the column headers through the fields argument, copied verbatim from the BLAST report:
    custom_fields = 'query id, subject ids, query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score'
    blast_generator = SearchIO.parse(file, 'blast-tab', fields=custom_fields, comments=True)
I still get the same error.
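In case it helps with diagnosing this, below is a minimal sketch of how the header of the report could be checked independently of Biopython. It assumes the usual commented tabular layout, where a "# Fields:" line lists comma-separated column names and the data rows are tab-separated. The function name inspect_blast_tab and the two-column sample are made up for illustration and are not taken from my report.

```python
def inspect_blast_tab(lines):
    """Return (field names from the '# Fields:' line, column count of the first data row).

    Hypothetical helper: either value is None if the corresponding line is not found.
    """
    fields = None
    n_columns = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("# Fields:"):
            # Column names follow the prefix, separated by commas.
            fields = [name.strip() for name in line[len("# Fields:"):].split(",")]
        elif line and not line.startswith("#"):
            # First non-comment, non-empty line: count its tab-separated columns.
            n_columns = len(line.split("\t"))
            break
    return fields, n_columns


# Made-up sample in the assumed layout (not my actual data).
sample = [
    "# Fields: query id, subject id\n",
    "q1\ts1\n",
]
fields, n_columns = inspect_blast_tab(sample)
print(fields, n_columns)

# On the real file this would be called as:
#     with open("out.txt") as handle:
#         print(inspect_blast_tab(handle))
```

If the number of names on the "# Fields:" line differs from the number of columns in the data rows, or the names are not what the parser expects, that might be related to the error, but I have not been able to confirm this.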
Can you please advise me on where I'm going wrong?
Following up on my post, I'm adding a snippet of the BLAST report file that I'm trying to parse.