Blast locally with multiple files in a directory as queries
4.9 years ago
fec2 ▴ 50

Hi all,

I need to run BLAST locally on multiple FASTA files contained in a directory. Following the thread "Script to run blast locally with multiple files in a directory as queries", I have tried:

for i in *.fasta; do ls *.fasta | parallel -a - blastp -query {} -db mydatabase -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {.}.xls ; done

It works on my Mac, but a single run takes a whole day to finish. I have 44 FASTA files in the directory, and I noticed that the BLAST searches were repeated many times before the run stopped. Is there an alternative I could use?

Thank you.

genome • 3.0k views

Do us a favour and don't call your output files .xls ;-) With -outfmt 6 the output is tab-separated text, not an Excel file.

How big are the FASTA files (in size, or in number of entries)?


The files range from 1 to 1.5 MB each.


I have 44 fasta files in the directory, and I noticed that the blast was actually repeated many times before it stop.

It is possible that you are exhausting a hardware resource on your Mac (most likely RAM). Have you made sure that you are able to complete one of these jobs with the database you are using before trying to start many in parallel?
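One way to check this is to run a single query on its own and note how long it takes before starting the whole batch. A minimal sketch, assuming the same database name and options as in the question; BLAST_CMD is a hypothetical variable introduced here so the command can be swapped out, and picking the first file is an arbitrary choice:

```shell
#!/bin/sh
# Run one BLAST job on its own before launching the whole batch.
# BLAST_CMD is a hypothetical override (not part of BLAST); default is blastp.
BLAST_CMD=${BLAST_CMD:-blastp}

run_one() {
  query=$1
  # Same options as the command in the question; output goes to <name>.tsv.
  $BLAST_CMD -query "$query" -db mydatabase -evalue 0.00001 \
    -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 \
    -out "${query%.fasta}.tsv"
}

# Pick the first .fasta file in the current directory (arbitrary choice).
first=$(ls *.fasta 2>/dev/null | head -n 1)

if [ -n "$first" ]; then
  start=$(date +%s)
  run_one "$first"
  end=$(date +%s)
  echo "$first finished in $((end - start)) s"
fi
```

If this one job already takes a long time or fills the memory, running many copies at once is unlikely to help.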


Thanks for your comment. As suggested by jrj.healey, I removed the loop and it is working well now.

4.9 years ago
Joe 21k

You are listing your files multiple times and looping unnecessarily before running the command in parallel. The loop variable $i is never used, so each pass of the loop lists every .fasta file again and submits the whole batch to parallel: with 44 files, every query gets run 44 times.

It will be sufficient to do:

ls *.fasta | parallel -a - blastp -query {} -db mydatabase -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {.}.tsv
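To see the difference without running BLAST at all, here is a small dry run that only counts the commands each version would launch. It is a sketch under some assumptions: the three file names are placeholders, and an inner for loop with echo stands in for `ls *.fasta | parallel ...`, so neither blastp nor GNU parallel is needed.

```shell
#!/bin/sh
# Dry run: count how many blastp commands each approach would launch.
# The .fasta files are empty placeholders created in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.fasta b.fasta c.fasta

# Original version: the outer loop variable is never used, so every pass
# re-lists and re-submits all of the files.
looped=$(for i in *.fasta; do
  for q in *.fasta; do echo "blastp -query $q"; done
done | wc -l | tr -d ' ')

# Corrected version: each file is submitted exactly once.
single=$(for q in *.fasta; do echo "blastp -query $q"; done | wc -l | tr -d ' ')

echo "looped: $looped commands, single: $single commands"
# prints "looped: 9 commands, single: 3 commands"
```

The looped count grows with the square of the number of files, while the corrected count stays equal to it.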

Exactly how long it will take under ideal circumstances is not easy to say ahead of time. The process will run faster with fewer, shorter sequences, but it also depends on how quickly a good match can be found (better matches can be returned faster).


Oh I see, thank you very much!
