I am trying to run online NCBI BLAST in parallel using the Python multiprocessing package. While running the code, the following error occurred:
Process Process-4:
Traceback (most recent call last):
  File "C:\Users\muh_asif\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\muh_asif\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\muh_asif\PycharmProjects\parellel\index.py", line 16, in f
    result_handle = NCBIWWW.qblast("blastn", "nt", record.format("fasta"), entrez_query=j, hitlist_size=1)
  File "C:\Users\muh_asif\PycharmProjects\parellel\venv\lib\site-packages\Bio\Blast\NCBIWWW.py", line 141, in qblast
    rid, rtoe = _parse_qblast_ref_page(handle)
  File "C:\Users\muh_asif\PycharmProjects\parellel\venv\lib\site-packages\Bio\Blast\NCBIWWW.py", line 253, in _parse_qblast_ref_page
    raise ValueError("Error message from NCBI: %s" % msg)
ValueError: Error message from NCBI: Cannot accept request, error code: -1
This error occurred for many processes, for example for processes 5 and 6 as well.
Apparently NCBI did not accept the requests from some processes. Is there a way to fix this error? Am I allowed to submit 3 or 4 queries to NCBI at the same time?
Secondly, processes are only created for the first element of taxa_id_list, not for the second element. Is there a better way to run BLAST in parallel using the multiprocessing package? I am new to multiprocessing and I am trying to make BLAST run faster. Here is the link to the input file (input_file), and the code is:
from multiprocessing import current_process

from Bio.Blast import NCBIXML
from Bio.Blast import NCBIWWW
from Bio import SeqIO

def f(record, j, id):
    record = str(record)
    print(record)
    j = str(j)
    print(j)
    proc_name = current_process().name
    print(f"Process name: {proc_name}")
    result_handle = NCBIWWW.qblast("blastn", "nt", record.format("fasta"), entrez_query=j, hitlist_size=1)
    blast_records = NCBIXML.parse(result_handle)
    for blast_record in blast_records:
        for alignment in blast_record.alignments:
            print(f"accession num: {alignment.accession} for ID: {id}")

if __name__ == '__main__':
    from multiprocessing import Process

    fasta_file_name = 'dummy_fasta.fasta'
    my_fasta = SeqIO.parse(fasta_file_name, "fasta")
    # to restrict BLAST to a specific species
    taxa_id_list = ["txid9606 [ORGN]", "txid39442 [ORGN]"]
    processes = []
    for j in taxa_id_list:
        for k in my_fasta:  # read all sequences from fasta file
            seq = k.seq
            id = k.id
            process = Process(target=f, args=(seq, j, id))
            processes.append(process)
            process.start()
    for l in processes:
        l.join()
thank you.
NCBI is very likely to rate-limit you; try sending at most 2 or 3 requests at once.
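One way to enforce that cap is a multiprocessing.Pool with a small worker count instead of starting one Process per query. This is only a minimal sketch of the throttling pattern: blast_one is a hypothetical placeholder standing in for the qblast call in the question's f(), and the sequence/taxon names are made up.

```python
from multiprocessing import Pool

def blast_one(task):
    # Placeholder worker: in the real code this would run one
    # NCBIWWW.qblast(...) call for a (sequence id, taxon) pair.
    seq_id, taxon = task
    return f"done: {seq_id} vs {taxon}"

if __name__ == "__main__":
    # Build every (sequence, taxon) task up front, then let a pool of
    # just 2 workers drain them -- at most 2 requests are ever in flight.
    tasks = [(sid, tax)
             for tax in ["txid9606 [ORGN]", "txid39442 [ORGN]"]
             for sid in ["seq1", "seq2"]]
    with Pool(processes=2) as pool:
        for result in pool.imap_unordered(blast_one, tasks):
            print(result)
```

The pool size, not the number of tasks, decides how many requests run concurrently, so the full task list can be submitted at once without exceeding the limit.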
BLAST itself is multi-threaded, so each job you start can use more than one thread. I am not sure why you want to use multiprocessing to submit remote BLAST jobs. Please be considerate of this public resource.
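On the second question in the post: SeqIO.parse returns a one-shot iterator, so after the inner loop consumes it for the first taxon, it is empty for the second. A minimal sketch, using a plain generator as a stand-in for SeqIO.parse, shows the effect and the fix:

```python
def fake_parse(names):
    # Stand-in for SeqIO.parse, which likewise returns a one-shot iterator.
    for n in names:
        yield n

taxa = ["txid9606 [ORGN]", "txid39442 [ORGN]"]

my_fasta = fake_parse(["seq1", "seq2"])
pairs = [(t, r) for t in taxa for r in my_fasta]
print(len(pairs))  # 2 -- the iterator was exhausted after the first taxon

# Fix: materialize the records once, then iterate as often as needed.
records = list(fake_parse(["seq1", "seq2"]))
pairs = [(t, r) for t in taxa for r in records]
print(len(pairs))  # 4 -- every (taxon, record) combination
```

The same fix applies to the question's code: my_fasta = list(SeqIO.parse(fasta_file_name, "fasta")) lets the outer loop over taxa_id_list reuse the records.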
Devon Ryan and genomax, thank you for your replies. I was not aware of the NCBI restrictions. Now I will submit at most one or two requests to BLAST at once.