How to run multiple jobs with GNU parallel with multiple arguments?
8.1 years ago

Hi I have a burning question,

I want to run a script named "predict_binding.py". Its syntax is:

./predict_binding.py [argA] [argB] [argC] ./file.txt

file.txt has a column of strings with the same length:

string_1 
string_2 
string_3
...
string_n

predict_binding.py works with the first 3 arguments and string_1, then the 3 arguments and string_2, and so on.

That's fine, but now I have m values of argB and I want to test all of them. I want to use the cluster for this, and it looks like a perfect job for parallel, doesn't it?

After reading the manual and spending hours trying to make it work, I realised I need some help.

What works so far (and is trivial) is:

parallel --verbose ./predict_binding.py ::: argA ::: argBi ::: argC ::: ./file.txt

This gives the same result as:

./predict_binding.py argA argBi argC ./file.txt

And indeed the flag --verbose says that the command looks like

./predict_binding.py argA argBi argC ./file.txt

But I want to test every argB, so I made a file called args.txt, which looks like this:

argA argB1 argC ./file.txt
argA argB2 argC ./file.txt
...
argA argBm argC ./file.txt

If I do:

cat args.txt | parallel --verbose ./predict_binding.py {}

I get an error from ./predict_binding.py saying:

predict_binding.py: error: incorrect number of arguments

And verbose says that the command looks like: ./predict_binding.py argA\ argBi\ argC\ ./file.txt

So, maybe those backslashes are affecting the input of ./predict_binding? How could I avoid them?
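A minimal sketch of what those backslashes mean, assuming ordinary POSIX shell word splitting: an escaped space belongs to the word rather than separating words, so the whole line reaches the script as one argument. Python's shlex module follows the same rules and can show this. Splitting each line into columns before building the command (GNU parallel's --colsep ' ' option together with {1} {2} {3} {4} is one way to do that) gives the four intended arguments.

```python
import shlex

# The command line exactly as --verbose printed it in the post above.
verbose_line = r"./predict_binding.py argA\ argBi\ argC\ ./file.txt"

# shlex.split applies POSIX shell word splitting: a backslash-escaped
# space is part of the word, not a separator.
argv = shlex.split(verbose_line)
print(len(argv) - 1)  # how many arguments the script receives
print(argv[1])        # the single argument it sees

# Splitting the input line into columns first yields separate arguments.
line = "argA argBi argC ./file.txt"
fixed = ["./predict_binding.py"] + line.split()
print(shlex.join(fixed))
```

So the backslashes are not corrupting anything; they only show that {} was filled with one whole line.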

I have tried double and single quotes (" and '), a backslash (\), and a backslash with a single quote (\'); none has worked!

I also tried:

cat ./args.txt | parallel --verbose echo | ./predict_binding

Same error as above.

And also I tried to use a function like:

binding_func() { ./predict_binding.py argA "$1" argC ./file.txt; }

Interestingly, binding_func works for:

parallel binding_func ::: argB1

But if I do:

parallel binding_func ::: argB1 argB2

It gives the result for one arg but fails (same error as above) for the other.

If I put only argB1 in the args.txt file and do:

cat args.txt | parallel --verbose binding_func {}

It fails miserably with the same error: predict_binding.py: error: incorrect number of arguments

It seems a very trivial and easy problem but I haven't been able to solve it }:(

I would appreciate very much any help provided. :)

GNU parallel • 20k views

It is unclear whether you have spent an hour walking through the tutorial (man parallel_tutorial or www.gnu.org/software/parallel/parallel_tutorial.html ).

Can you clear that up?

8.1 years ago
malteherold ▴ 60

Final Edit: Calling it like this will work:

cat vals.txt | parallel --verbose "echo -e mhc_i/examples/input_sequence.fasta | mhc_i/src/predict_binding.py smm {} 9"

parallel --verbose "echo -e mhc_i/examples/input_sequence.fasta | mhc_i/src/predict_binding.py {}" ::: smm ::: HLA-C*15:02 HLA-E*01:01 ::: 9

The script predict_binding.py also checks sys.stdin (lines 301-303) and appends an empty argument to the list if you call it with parallel the way you tried. I'm not sure why this happens, but if you pipe the path of the input FASTA file to the script, you can also run it with parallel.
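If editing the script is an option, a more tolerant version of that stdin check could look like the sketch below. This is not the script's actual code: the function name append_stdin_argument is hypothetical, and it assumes a POSIX system where select works on ordinary file descriptors. The only change in behaviour is that an empty line (what /dev/null or a closed pipe gives) is ignored instead of being appended.

```python
import os
import select
import sys


def append_stdin_argument(args, stream=sys.stdin):
    """Return a copy of args, plus one line read from stream if it is non-empty.

    Hypothetical helper: it mirrors the select-based check in the script but
    skips empty input instead of appending "" as an extra argument.
    """
    result = list(args)
    readable, _, _ = select.select([stream], [], [], 0)
    if stream in readable:
        line = stream.readline().strip()
        if line:  # an empty read (e.g. from /dev/null) is ignored
            result.append(line)
    return result


if __name__ == "__main__":
    # With stdin attached to /dev/null the argument list stays unchanged.
    with open(os.devnull) as empty:
        print(append_stdin_argument(["smm", "HLA-E*01:01", "9", "input.fasta"], empty))
```

With a check like this, the plain parallel call and the echo ... | variant should both produce four arguments.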


Old Answer:

I am not sure if I understand the question correctly but for me this seems to work if the 2nd argument is a list of arguments:

parallel --verbose ./predict_binding ::: argA ::: argB1 argB2 argB3 ::: argC ::: ./file.txt

parallel --verbose ./predict_binding ::: argA ::: argB* ::: argC ::: ./file.txt (e.g. if arguments are files)

parallel --verbose ./script.sh ::: argA ::: argB* ::: argC ::: file.txt

./script.sh argA argB argC file.txt
./script.sh argA argB1 argC file.txt
./script.sh argA argB2 argC file.txt
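To make the ::: behaviour concrete: with several input sources, parallel runs one job per combination of values, which is a Cartesian product. The sketch below only prints the command lines that would be generated; it does not run anything, and the order in which parallel actually starts or finishes jobs may differ.

```python
import shlex
from itertools import product

# Input sources mirroring: ::: argA ::: argB1 argB2 argB3 ::: argC ::: ./file.txt
sources = [["argA"], ["argB1", "argB2", "argB3"], ["argC"], ["./file.txt"]]

# One command per combination, with the values appended to the command.
commands = [
    shlex.join(["./predict_binding.py", *combo])
    for combo in product(*sources)
]

for command in commands:
    print(command)
```

Because only the second source has more than one value, this gives exactly one job per argB.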

Thanks for your reply. I've just tried it and it didn't work. :(

I created file2.txt with the following in it:

argB1
argB2

As I understand your suggestion, I did:

parallel --verbose ./script.sh ::: argA ::: file2.txt ::: argC ::: file1.txt

The command was passed as:

./script.sh argA file2.txt argC file1.txt

And it should be:

./script argA argB1 argC file1.txt

What I am trying to do is to run many jobs each with:

./script argA argBi argC file.txt

And if possible sending the output to a file with the name of argBi, something like:

./script argA argBi argC file.txt > argBi.txt

The above answer only works if argB1, argB2, argB3... are files in the same directory.

I guess the problem is reading multiple arguments from a file? I also wouldn't know how to do this. If you do this: cat args.txt | parallel --verbose ./predict_binding.py {} each whole line is passed as a single argument.

If you have a file called vals.txt with only the values for argB this should work:

parallel --verbose ./script.sh ::: argA ::: `cat vals.txt` ::: argC ::: file.txt

or

cat vals.txt | parallel ./script.sh argA {} argC file.txt

To write each job's output to its own file:

cat vals.txt | parallel "./script.sh argA {} argC file.txt > {}.output"

If I do:

parallel --verbose ./script.sh ::: argA ::: `cat vals.txt` ::: argC ::: ./file.txt

It works for only one job; for the other I get the error: ./script.sh: error: incorrect number of arguments

And if I use the same vals.txt file and do:

cat vals.txt | parallel --verbose ./script.sh argA {} argC ./file.txt

I get the same error as above for the two jobs. The same happened for the command with the output.

Also if I do:

parallel --verbose ./script.sh ::: argA ::: argB1 argB2 ::: argC ::: ./file.txt

It works for argB1, but for argB2 I get the error: ./script.sh: error: incorrect number of arguments

even if argB1 == argB2...


Could you post the script you used for testing? Or what does the --verbose output say in this case? I tried all of your examples above and they all work for me.


The --verbose output is:

./script.sh argA argB1 argC ./file.txt
./script.sh argA argB2 argC ./file.txt

I don't understand why it works only for the first one, whereas for the second line ./script.sh gives the error predict_binding.py: error: incorrect number of arguments

I am using the actual script I want to use, and it might be too long to post it here. The tar.gz file can be downloaded from here. Although if you want I can copy and paste the script.

I've just tried the following for loop alone, and it works beautifully:

for argBi in `cat vals.txt`
do
./script.sh argA $argBi argC ./file.txt > $argBi.txt
done

When I try to use parallel by doing:

parallel --verbose ./script.sh argA {} argC ./file.txt > {}.txt ::: `cat vals.txt`

It fails! And the verbose output says that the commands run were:

./script.sh argA argB1 argC ./file.txt
./script.sh argA argB2 argC ./file.txt

I should say that the argB strings look like: HLA-A*01:01

And the output of --verbose shows them like HLA-A\*01:01. But I don't think that's the problem, because if I do:

parallel --verbose ./script.sh ::: argA ::: HLA-A*01:01 ::: argC ::: ./file.txt

It works and the --verbose output says:

./script.sh argA HLA-A\*01:01 argC ./file.txt

Actually this is quite weird. It has something to do with the Python script, but I haven't figured it out yet. So if I run:

parallel --verbose mhc_i/src/predict_binding.py {} ::: smm ::: HLA-E*01:03 HLA-E*01:01 HLA-C*15:02 ::: 9 ::: mhc_i/examples/input_sequence.fasta

Verbose Output:

mhc_i/src/predict_binding.py smm HLA-E\*01:03 9 mhc_i/examples/input_sequence.fasta
mhc_i/src/predict_binding.py smm HLA-E\*01:01 9 mhc_i/examples/input_sequence.fasta
mhc_i/src/predict_binding.py smm HLA-C\*15:02 9 mhc_i/examples/input_sequence.fasta

The first one works while the others fail with predict_binding.py: error: incorrect number of arguments

However if I copy the commands of the verbose output and run them they all work.

If the command is run with parallel it somehow has 5 arguments instead of 4: ['smm', 'HLA-E*01:03', '9', 'mhc_i/examples/input_sequence.fasta', '']

This has something to do with lines 301-303:

    if sys.stdin in select.select([sys.stdin], [], [], 0)[0]:
        infile = sys.stdin.readline().strip()
        args.append(infile)

If you comment this out in the script the parallel call above will work.

Alternatively, leave those lines in and run the script like this:

parallel --verbose "echo -e mhc_i/examples/input_sequence.fasta | mhc_i/src/predict_binding.py {}" ::: smm ::: HLA-C*15:02 HLA-E*01:01 ::: 9
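One plausible explanation, stated as an assumption rather than something confirmed in this thread: jobs started by parallel may have their stdin attached to /dev/null, which select reports as readable even though readline() then returns an empty string. The sketch below reproduces the extra empty argument without parallel. The CHILD program is a stand-in that contains only the three quoted lines plus a print; it is not the real predict_binding.py, and it assumes a POSIX system.

```python
import ast
import os
import subprocess
import sys

# Stand-in child program: the stdin check quoted above, then print the args.
CHILD = """
import select, sys
args = sys.argv[1:]
if sys.stdin in select.select([sys.stdin], [], [], 0)[0]:
    infile = sys.stdin.readline().strip()
    args.append(infile)
print(args)
"""


def run_child(argv, stdin):
    """Run the stand-in with the given arguments and stdin; return its args list."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, *argv],
        stdin=stdin, capture_output=True, text=True, check=True,
    )
    return ast.literal_eval(proc.stdout.strip())


four = ["smm", "HLA-E*01:03", "9", "mhc_i/examples/input_sequence.fasta"]

# Case 1: four arguments, stdin attached to /dev/null.
with_devnull = run_child(four, subprocess.DEVNULL)
print(with_devnull)

# Case 2: three arguments, file name already waiting in a pipe on stdin,
# as in the echo ... | workaround.
read_fd, write_fd = os.pipe()
os.write(write_fd, (four[3] + "\n").encode())
os.close(write_fd)
with_pipe = run_child(four[:3], read_fd)
os.close(read_fd)
print(with_pipe)
```

If that assumption holds, it would also explain why the copied commands work in a terminal: an idle terminal is not reported as readable, so nothing is appended.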