Question: What is the correct way to use GNU parallel with Primer3?

Hi

I need to design primers for around 40,000 sequences. After doing this task with Primer3, I found that it took a very long time.

I tried to speed up Primer3 with GNU parallel, but I could not get it to split the input file and run several jobs at once. Somehow primer3 still ran on only 1 core.

My command is as follows:

cat fasta.p3in | parallel --round-robin -j 12 --pipe --recend "=" /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out

Could anyone tell the correct way to use GNU Parallel along with Primer3? Thanks a lot!

4.8 years ago by lunchboxwu, updated 4.8 years ago by ole.tange

What would a typical command line (without parallel) look like? Would it be something like

/Tools/primer3/primer3-2.3.6/src/primer3_core fasta_part1.p3in

As a side note, have you looked through the parallel guide? https://www.biostars.org/p/63816/

4.8 years ago by Ying W

Hi, Ying W:

Thanks.
I've read through the GNU Parallel tutorial and the post https://www.biostars.org/p/63816/.
The command I used is based on the BLAT example in that Biostars post.

The command line (without parallel) of primer3 is:

/Tools/primer3/primer3-2.3.6/src/primer3_core fasta.p3in > fasta.p3out

and the records in *.p3in (Primer3 input format) look like this:

SEQUENCE_ID=1
SEQUENCE_TEMPLATE=ATATGGCGATAGTAAAATTTTGAAAAAAAAAAAGAAAAATTTTAGAAGCAAAATTTTCCGTCATCTTGAATTTTGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=
SEQUENCE_ID=2
SEQUENCE_TEMPLATE=TTAAATTTAACACAAAACTTTTTACCGTGTGGGAAAATTTCTAATAAACAGGATTTATCAGATTTATCAATTGCAAGAAAA
PRIMER_PRODUCT_SIZE_RANGE=100-280
SEQUENCE_TARGET=20,17
PRIMER_MAX_END_STABILITY=250
=

There is a '=' on a line of its own at the end of each record.

Any ideas?

4.8 years ago by lunchboxwu

Your biggest mistake was probably that your records contain '=' on every line, while only '\n=\n' is a record separator. Running 'wc' or '--files cat' as the command is great for debugging that kind of problem.
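To make the separator problem concrete, here is a minimal sketch that compares the number of real record terminators with the number of '=' characters in a small, made-up Boulder-IO sample. The sample data and the helper names count_records and count_equals_chars are hypothetical and only serve the illustration. If GNU parallel happens to be installed, the last step also prints how many lines each chunk receives when splitting on '\n=\n'.

```shell
#!/bin/sh
# Hypothetical sample: two Boulder-IO records, each ended by a line holding only "=".
sample=$(mktemp)
cat > "$sample" <<'EOF'
SEQUENCE_ID=1
SEQUENCE_TEMPLATE=ACGTACGT
=
SEQUENCE_ID=2
SEQUENCE_TEMPLATE=TTGGCCAA
=
EOF

# Count real record terminators: lines that consist solely of "=".
count_records() {
    grep -c '^=$' "$1"
}

# Count every "=" character, which is what --recend "=" would split on.
count_equals_chars() {
    tr -cd '=' < "$1" | wc -c | tr -d ' '
}

echo "records:        $(count_records "$sample")"
echo "'=' characters: $(count_equals_chars "$sample")"

# Optional: if GNU parallel is available, show the line count of each chunk.
if command -v parallel >/dev/null 2>&1; then
    parallel --pipe -N1 --recend '\n=\n' wc -l < "$sample" \
        || echo "parallel run failed" >&2
fi
```

When the two counts differ, as they do for any real .p3in file, a bare "=" is not a safe value for --recend.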

Your second mistake is that --block-size defaults to 1M, so the first instance may simply gobble up everything.
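As a rough way to see whether the block size matters for a given input, the sketch below estimates how many blocks an input of a given size would be cut into, assuming the 1M default mentioned above. The helper blocks_for_size is hypothetical and not part of GNU parallel, and the real split also depends on where record boundaries fall, so treat the number as an approximation.

```shell
#!/bin/sh
# blocks_for_size is a hypothetical helper, not a GNU parallel feature.
# It rounds size/block up to the next whole block (default block: 1M = 1048576 bytes).
blocks_for_size() {
    size_bytes=$1
    block_bytes=${2:-1048576}
    echo $(( (size_bytes + block_bytes - 1) / block_bytes ))
}

# Report the estimate for an input file, if it exists (file name is an assumption).
input=${1:-fasta.p3in}
if [ -f "$input" ]; then
    size=$(wc -c < "$input" | tr -d ' ')
    echo "$input: $size bytes, about $(blocks_for_size "$size") block(s)"
fi
```

If the estimate is smaller than the number of jobs requested with -j, some jobs will probably receive no data unless the chunk size is set explicitly, for example with -N.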

This ought to work (untested, as I have neither access to fasta.p3in nor to primer3):

cat fasta.p3in | parallel -N1 --round-robin --pipe --recend "\n=\n" --cat /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out

You can possibly leave out --cat if primer3 reads from STDIN. If GNU Parallel itself takes up significant time, increase -N1: with 40000 records it is probably OK to split into bigger chunks than 1 record.

4.8 years ago by ole.tange

Thank you for your help, ole.tange. You are my lifesaver!

You're right: I should use "\n=\n" as the delimiter, and I should also set the number of records per chunk for parallel.

Finally I managed to run primer3 with parallel. The command line is the following:

cat fasta.p3in | parallel -N10 --round-robin --pipe --recend "\n=\n" /Tools/primer3/primer3-2.3.6/src/primer3_core > fasta.p3out
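
Because --round-robin spreads chunks over the jobs, the records in the output may not come back in input order, so a quick sanity check is to compare record counts rather than diff the files. The sketch below assumes primer3_core writes its default Boulder-IO output, where each record also ends with a line holding only "=". The helper names count_records and same_record_count are hypothetical.

```shell
#!/bin/sh
# Count Boulder-IO records: lines that consist solely of "=".
count_records() {
    grep -c '^=$' "$1"
}

# Hypothetical helper: succeeds when both files hold the same number of records.
same_record_count() {
    [ "$(count_records "$1")" = "$(count_records "$2")" ]
}

# File names default to the ones used in this thread.
in_file=${1:-fasta.p3in}
out_file=${2:-fasta.p3out}
if [ -f "$in_file" ] && [ -f "$out_file" ]; then
    if same_record_count "$in_file" "$out_file"; then
        echo "OK: record counts match"
    else
        echo "MISMATCH: $(count_records "$in_file") in, $(count_records "$out_file") out" >&2
    fi
fi
```

A mismatch would suggest that some chunk was cut in the middle of a record or that a job failed.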

4.8 years ago by lunchboxwu
