Python SearchIO: Extracting information from QueryResults?
5.0 years ago

I ran a BLAST search on a FASTA file containing multiple sequences, using Python. What I want to do now is extract information from the results and put it in a pandas DataFrame: the query ID, the hit ID, and the accession number of each hit. Here's what I've done so far:

from Bio import SearchIO
from Bio.Blast import NCBIWWW
import pandas as pd

# Submit every sequence in the FASTA file to NCBI BLAST (blastx, restricted to human RefSeq proteins)
fasta_string = open("list.fasta").read()
result_handle = NCBIWWW.qblast("blastx", sequence = fasta_string, database = "refseq_protein",
                               entrez_query = 'txid9606[ORGN]')

# Save the XML results to disk, then close the handle returned by qblast
with open("my_blast.xml", 'w') as out_handle:
    out_handle.write(result_handle.read())
result_handle.close()

# Parse the saved XML; each QueryResult corresponds to one query sequence
qresults = SearchIO.parse('my_blast.xml', 'blast-xml')

search_dict = SearchIO.to_dict(qresults)
query_id = []
hit_list = []

for key, value in search_dict.items():
    query_id.append(key)
    hit_list.append(value)

tsv_output = pd.DataFrame({"query_id": query_id}) # Build the dataframe once the list is filled

I already added the query IDs to the pandas DataFrame. Now I'm looking for a way to extract the ID of every hit from the results in hit_list, which is a list of QueryResult objects. I've looked through the documentation (https://biopython.org/DIST/docs/api/Bio.SearchIO._model.query.QueryResult-class.html), but I don't see any way to extract the hit ID or the hit accession number. Does anyone know how I could do this?

Thank you

python blast xml • 1.5k views
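
One possible approach, sketched under a few assumptions: iterating over a SearchIO QueryResult yields its Hit objects, each Hit has an id, and the blast-xml parser also fills in an accession attribute on each Hit. The function names hits_to_rows and blast_xml_to_dataframe and the column names are made up for illustration; they are not part of Biopython. The intermediate dictionary from SearchIO.to_dict is not needed here, since the parser output can be looped over directly.

```python
def hits_to_rows(qresults):
    """Flatten QueryResult objects into one dict per (query, hit) pair."""
    rows = []
    for qresult in qresults:      # one QueryResult per query sequence
        for hit in qresult:       # iterating a QueryResult yields its Hit objects
            rows.append({
                "query_id": qresult.id,
                "hit_id": hit.id,
                # Assumption: 'accession' is set by the blast-xml parser;
                # fall back to None if the attribute is missing.
                "accession": getattr(hit, "accession", None),
            })
    return rows


def blast_xml_to_dataframe(xml_path):
    """Parse a BLAST XML file into a DataFrame (requires Biopython and pandas)."""
    # Imported here so hits_to_rows stays usable without either library installed.
    import pandas as pd
    from Bio import SearchIO

    rows = hits_to_rows(SearchIO.parse(xml_path, "blast-xml"))
    return pd.DataFrame(rows, columns=["query_id", "hit_id", "accession"])
```

With the file from the question, blast_xml_to_dataframe("my_blast.xml") should give one row per query/hit pair, and the result can be written out with to_csv(..., sep="\t", index=False). Queries with no hits produce no rows in this sketch.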