Parallel for shell script with different output
2
1
9.4 years ago
Korsocius ▴ 250

Dear all,

I need help with the parallel command. I have a shell script that I would like to run 12 times at the same time, but each run needs a different output name. The output is a .tsv file, and its name should match the name of the input. Could you help me with that?

Thanks a lot

shell parallel • 4.7k views
2

please post your standard shell command line;

the parallel command would be something like

parallel scriptname <hardcoded options for the script> {1}.input {1}.output ::: prefixes
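To see how the substitution works, here is a dry run with echo (scriptname and the sample prefixes are placeholders, not names from the question): each prefix fills both {1} slots, producing one job per prefix.

```shell
# Dry run: print the command that would be executed for each prefix.
# "scriptname" is a stand-in for the real script.
parallel echo scriptname {1}.input {1}.output ::: sample1 sample2 sample3
```

Dropping echo would execute the real commands, one per prefix, spread across the available cores.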

0

Show some example of what is input and output.

0

I have a script named bin.sh whose input is 1.bam and whose output will be 1.tsv. Each input is in a different folder, and the folders are named 1 to 12. Each folder contains 1.bam and 1.bai (inputs), which the shell script reads. The output will be 1.tsv, and so on for each BAM file.

0

parallel myscript {} {.}.bai '>' {.}.tsv ::: */*.bam

6
9.4 years ago

an example with echo:

seq 1 12 | parallel echo "Hello" '>' 'result.{}'
0

Also look into --results for a structured way of organizing the output files.
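A minimal sketch of --results (the exact directory layout may vary slightly between GNU parallel versions): each job's stdout and stderr are captured into files under the given output directory, in a subtree derived from the job's arguments.

```shell
# Capture the output of each job under outdir/ instead of mixing
# everything on the terminal. The job run with argument 2 gets its
# own stdout file somewhere under outdir/.
seq 1 3 | parallel --results outdir echo "Hello" {}
```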

0
9.4 years ago
for input in *.bam; do out=`echo $input | awk -F"." '{ print $1 }'`; bin.sh $input $out.tsv & done

To understand:

for input in *.bam;  #for each bam file
do
out=`echo $input | awk -F"." '{ print $1 }'` #get the unique output prefix
bin.sh $input $out.tsv & #run the script and push it to the background
done
2

This launches every job at once, which won't work well if there are not enough cores.
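One way to cap the number of simultaneous background jobs without GNU parallel is to launch them in batches and wait between batches (a sketch: process is a stand-in for bin.sh, the demo files replace real BAM inputs, and max_jobs is an assumed core budget):

```shell
# Demo setup: a few empty files standing in for real BAM inputs.
mkdir -p demo && cd demo
touch 1.bam 2.bam 3.bam 4.bam 5.bam

# Placeholder for bin.sh: writes a line into the output file.
process() { echo "processed $1" > "$2"; }

max_jobs=4   # assumed core budget
count=0
for input in *.bam; do
  process "$input" "${input%.bam}.tsv" &
  count=$((count + 1))
  # After every $max_jobs launches, wait for the whole batch to finish.
  [ $((count % max_jobs)) -eq 0 ] && wait
done
wait   # catch the final partial batch
```

Unlike parallel, this waits for the slowest job in each batch before starting the next, so cores can sit idle; it is only a crude throttle.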

0

Yes. But the for loop helps me a lot in other cases where the operation is not computationally expensive.

2

... and if something wrong happens, you'll have to (quickly) find & kill your PIDs ...

0

I used to do that.

1

I am trying to understand why people still use for loops for independent jobs.

Is it readability? Is the for loop really easier to read than 'parallel bin.sh {} {.}.tsv ::: *.bam'? Or, if the jobs were bigger or more complex, using a function:

myfunc() {
  bin.sh "$1" "$2"
  #more stuff here
}
export -f myfunc
parallel myfunc {} {.}.tsv ::: *.bam

For computationally cheap jobs I really do not see the benefit of a for loop.

The only advantage I can think of is that GNU Parallel may not be installed. But that advantage can vanish in just 10 seconds: wget -O - pi.dk/3|bash

@Geek_y can you enlighten me, what you see as the advantage?

0

I am used to using for loops, but I will definitely shift towards parallel. I have started reading your tutorial on parallel. I come from a biology background and am something of a beginner in core bioinformatics, so I need some time to learn best practices.

0

I wonder why one would dispatch computationally cheap operations to a background core in the first place. The gain in execution time will surely be outweighed by the time taken to write the loop and dispatch the jobs to different cores.

1

The input files are in different folders, so you might want to use find (with an optional -maxdepth) to locate these files first, then run the script on them.
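For example (a sketch: the demo folders and the depth of 2 are assumptions based on the layout described above):

```shell
# Demo layout: two folders, each holding one BAM file.
mkdir -p 1 2
touch 1/1.bam 2/2.bam

# List BAM files at most two levels down. This list can then be
# piped straight into parallel, e.g.:
#   find . -maxdepth 2 -name '*.bam' | parallel bin.sh {} {.}.tsv
find . -maxdepth 2 -name '*.bam'
```

Note that -maxdepth is a GNU find extension; it simply stops the search from descending further than the per-sample folders.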
