Question: Parallel for shell script with different output

Dear all,

I need help with the parallel command. I have a shell script that I would like to run 12 times at the same time, but each run needs a different output name. The output is a .tsv file whose name should match the name of the input. Could you help me with how to do that?

Thanks a lot

ADD COMMENT • 5.3 years ago by Korsocius • 110 • updated 5.3 years ago by geek_y • 9.7k

Please post your standard shell command line.

The parallel command would be something like:

    parallel scriptname {1}.input {1}.output ::: prefixes
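
For example, with two hypothetical prefixes, sample1 and sample2, this expands to two jobs:

    parallel scriptname {1}.input {1}.output ::: sample1 sample2
    # runs: scriptname sample1.input sample1.output
    #       scriptname sample2.input sample2.output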

ADD REPLY • 5.3 years ago by russhh ♦ 4.4k

Show an example of what your input and output are.

ADD REPLY • 5.3 years ago by geek_y • 9.7k

I have a script named bin.sh whose input is 1.bam and whose output will be 1.tsv. Every input is in its own folder, and the folders are named 1 to 12. Folder 1, for example, contains 1.bam (input) and 1.bai (input), which the shell script reads; its output will be 1.tsv, and so on for each BAM file.
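
Schematically, the layout described here:

    1/1.bam    1/1.bai    ->  1/1.tsv
    2/2.bam    2/2.bai    ->  2/2.tsv
    ...
    12/12.bam  12/12.bai  ->  12/12.tsv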

ADD REPLY • 5.3 years ago by Korsocius • 110

    parallel myscript {} {.}.bai '>' {.}.tsv ::: */*.bam
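
To preview the generated commands before running anything, GNU Parallel's --dry-run option prints them instead of executing them:

    parallel --dry-run myscript {} {.}.bai '>' {.}.tsv ::: */*.bam
    # for 3/3.bam this prints: myscript 3/3.bam 3/3.bai > 3/3.tsv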

ADD REPLY • 5.3 years ago by ole.tange ♦ 3.4k

An example with echo:

    seq 1 12 | parallel echo "Hello" '>' 'result.{}'
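
This creates result.1 through result.12, each containing "Hello". Adapted to the question (a sketch, assuming folders named 1 to 12 and that bin.sh writes its TSV to stdout):

    seq 1 12 | parallel bin.sh {}/{}.bam '>' {}/{}.tsv
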
ADD COMMENT • 5.3 years ago by Pierre Lindenbaum • 120k

Also look into --results for a structured way of organizing the output files.
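
For instance, a sketch of the same echo example with --results (the directory name "out" here is arbitrary):

    seq 1 12 | parallel --results out echo "Hello"
    # stdout and stderr of each job land under something like out/1/<arg>/stdout and out/1/<arg>/stderr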

ADD REPLY • 5.3 years ago by ole.tange ♦ 3.4k
    for input in *.bam; do out=`echo $input | awk -F"." '{ print $1}'`; bin.sh $input $out.tsv & done

To understand:

for input in *.bam;  # for each bam file
do
    out=`echo $input | awk -F"." '{ print $1}'`  # get the unique output prefix
    bin.sh $input $out.tsv &  # run the script and push it to the background
done
ADD COMMENT • 5.3 years ago by geek_y • 9.7k • updated 5.3 years ago by RamRS • 21k

This won't work well if there are not enough cores: every job is pushed to the background at once, with nothing limiting how many run concurrently.
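
For completeness, a minimal sketch of throttling such a loop to a fixed number of concurrent jobs (assumes bash >= 4.3 for wait -n; MAXJOBS is a name chosen here for illustration):

    MAXJOBS=4                        # pick a limit that matches your cores
    for input in *.bam; do
        while (( $(jobs -rp | wc -l) >= MAXJOBS )); do
            wait -n                  # block until any one background job finishes
        done
        bin.sh "$input" "${input%.bam}.tsv" &
    done
    wait                             # let the remaining jobs finish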

ADD REPLY • 5.3 years ago by Pierre Lindenbaum • 120k

Yes. But the for loop helps me a lot in other cases, where the operation is not computationally expensive.

ADD REPLY • 5.3 years ago by geek_y • 9.7k

... and if something goes wrong, you'll have to (quickly) find & kill your PIDs ...
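
(If the jobs were started from the current shell, something like this kills them all at once:)

    kill $(jobs -p)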

ADD REPLY • 5.3 years ago by Pierre Lindenbaum • 120k

I used to do that.

ADD REPLY • 5.3 years ago by geek_y • 9.7k

I am trying to understand why people still use for loops for independent jobs.

Is it readability? Is the for loop really easier to read than 'parallel bin.sh {} {.}.tsv ::: *.bam'? Or, if the jobs were bigger or more complex, using a function:

myfunc() {
  bin.sh "$1" "$2"
  #more stuff here
}
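# export the function so the shells GNU Parallel spawns can see it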
export -f myfunc
parallel myfunc {} {.}.tsv ::: *.bam

For computationally cheap jobs I really do not see the benefit of a for loop.

The only advantage I can think of is that GNU Parallel may not be installed. But that advantage can vanish in just 10 seconds: wget -O - pi.dk/3|bash

@Geek_y, can you enlighten me: what do you see as the advantage?

ADD REPLY • 5.3 years ago by ole.tange ♦ 3.4k

I am used to using for loops, but I will definitely shift towards parallel. I have started reading your tutorial on parallel. I come from a biology background and am something of a beginner in core bioinformatics, so I need some time to learn best practices.

ADD REPLY • 5.3 years ago by geek_y • 9.7k

I wonder why one would dispatch computationally cheap operations to a background core in the first place. The gain in execution time will surely be cancelled out by the time it takes to write the loop and dispatch the jobs to different cores.

ADD REPLY • 5.3 years ago by RamRS • 21k

Since the input files are in different folders, you might want to use find (with an optional -maxdepth) to locate the files first, then run the script on them.
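
A sketch, assuming the numbered folders sit directly under the current directory and that bin.sh writes to stdout as in ole.tange's answer:

    find . -maxdepth 2 -name '*.bam' | parallel bin.sh {} '>' {.}.tsv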

ADD REPLY • 5.3 years ago by RamRS • 21k
