Question

How to call an R variable in a loop with Slurm?

4.9 years ago

pablo ▴ 300

Hello,

I have an R script (RHO_COR.R) and I would like to create a loop in order to split the jobs across several nodes.

Here is the part of the script where I would like to create the loop.

res <- foreach(i = seq_len(nrow(combs))) %dopar% {
 G1 <- split[[combs[i,1]]]
 G2 <- split[[combs[i,2]]]
 dat.i <- cbind(data[,G1], data[,G2])
 rho.i <- cor_rho(dat.i)
}

The results in res (submatrices of correlations between OTUs) are stored in several files. combs is a two-column matrix that looks like this (it can change depending on my input file):

> combs
      [,1] [,2]
 [1,]    1    2
 [2,]    1    3
 [3,]    1    4
 [4,]    1    5
 [5,]    2    3
 [6,]    2    4
 [7,]    2    5
 [8,]    3    4
 [9,]    3    5
[10,]    4    5

I would like to send each row of combs, i.e. each index in seq_len(nrow(combs)), to a separate node.

This is my Slurm script:

#!/bin/bash
#SBATCH -o job-%A_task.out
#SBATCH --job-name=paral_cor
#SBATCH --partition=normal
#SBATCH --time=1-00:00:00
#SBATCH --mem=126G  
#SBATCH --cpus-per-task=32

#Set up whatever package we need to run with

module load gcc/8.1.0 openblas/0.3.3 R

# SET UP DIRECTORIES

OUTPUT="$HOME"/$(date +"%Y%m%d")_parallel_nodes_test
mkdir -p "$OUTPUT"

export FILENAME=~/RHO_COR.R

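#Run the program; $OUTPUT is a directory, so redirect into a file inside it (the file name is an arbitrary choice)

Rscript "$FILENAME" > "$OUTPUT/RHO_COR.out"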

I do not want to use arrays. Could passing seq_len(nrow(combs)) as an argument be a solution?

for i in my_argument
 do Rscript $FILENAME -i > "$OUTPUT"
done

Thanks

(I asked on Stack Overflow but haven't received an answer yet.)

r slurm matrix • 2.7k views

You'll need an srun in there and $i rather than -i.
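A minimal sketch of what that change might look like, assuming the R script reads the row index from its command-line arguments (for example with commandArgs(trailingOnly = TRUE)). The function name launch_rows and the RUN_PREFIX dry-run switch are illustrative and not part of the original script. With the default RUN_PREFIX=echo the commands are only printed, so the loop can be checked before submitting.

```shell
#!/bin/bash
# RUN_PREFIX is a hypothetical switch: "echo" (the default) prints each command,
# an empty value executes it.
RUN_PREFIX="${RUN_PREFIX-echo}"
FILENAME="${FILENAME:-$HOME/RHO_COR.R}"

# Launch (or print) one srun step per row index given as an argument.
launch_rows() {
    local i
    for i in "$@"; do
        # "$i" expands to the current row index; a literal -i would be
        # passed to Rscript unchanged on every iteration.
        $RUN_PREFIX srun Rscript "$FILENAME" "$i"
    done
}
```

Calling launch_rows 1 2 3 prints three srun commands, one per index. With RUN_PREFIX set to an empty string the steps would run one after another, since each srun blocks until its step finishes.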

4.9 years ago by Devon Ryan 104k

Thanks for your reply. But can I combine srun and Rscript in the same loop? Also, I don't know how to pass my R variable as an argument to this loop.

4.9 years ago by pablo ▴ 300

Edit: I saved my variable to a file that I read in bash.

And I use:

res <- foreach(i = opt$subset) %dopar% {
 G1 <- split[[combs[i,1]]]
 G2 <- split[[combs[i,2]]]
 dat.i <- cbind(data[,G1], data[,G2])
 rho.i <- cor_rho(dat.i)
}

Slurm part

var=$(cat ~/my_file.tsv | wc -l)
subset=$(seq $var)
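A slightly more defensive way to build that index list might look like the sketch below. The function name row_indices is made up for illustration, and it assumes the file has one line per row of combs, no header, and a trailing newline on the last line (wc -l counts newline characters).

```shell
#!/bin/bash
# Print the indices 1..N, one per line, where N is the line count of the file in $1.
row_indices() {
    local file=$1 n
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    n=$(wc -l < "$file")
    n=$((n))                      # normalise: some wc implementations pad with spaces
    [ "$n" -gt 0 ] || return 0    # empty file: print nothing
    seq 1 "$n"
}
```

It could then be used as subset=$(row_indices ~/my_file.tsv).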

I still struggle to find a way to run the jobs on several nodes. The loop is executed on only one node and I can't find a solution with srun...

4.9 years ago by pablo ▴ 300

If you're going to use %dopar% then run it in parallel directly in R and don't bother submitting multiple jobs. You'll have to figure out how to do that on your local cluster of course. Otherwise just use an array job or create a loop in your sbatch script calling srun for each value of i.
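For the loop variant, one possible sketch is below. It assumes the job requests several nodes up front and starts each srun step in the background so that steps can run concurrently. The node count of 4, the function name run_steps and the row count of 10 (taken from the combs example) are placeholders. The --exclusive flag on srun is meant to give each step its own resources; newer Slurm releases also offer --exact for this, so check your cluster's documentation.

```shell
#!/bin/bash
#SBATCH --job-name=paral_cor
#SBATCH --nodes=4              # placeholder: number of nodes to spread steps over
#SBATCH --ntasks=4             # one task slot per node
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32

FILENAME="${FILENAME:-$HOME/RHO_COR.R}"

# Start one job step per row index and wait for all of them.
run_steps() {
    local n_rows=$1 i
    for i in $(seq 1 "$n_rows"); do
        # --nodes=1 --ntasks=1 confines each step to a single task on one node;
        # the trailing & lets the loop continue without waiting for the step.
        srun --nodes=1 --ntasks=1 --exclusive \
             --cpus-per-task="${SLURM_CPUS_PER_TASK:-1}" \
             Rscript "$FILENAME" --subset "$i" &
    done
    # Block until every background step has finished; otherwise the batch
    # script would exit and Slurm would end the job.
    wait
}

# Only launch inside a Slurm job; 10 matches the number of rows in the combs example.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_steps 10
fi
```

With more rows than task slots, the extra steps should queue until a slot frees up, although the exact behaviour depends on the Slurm version and configuration.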

4.9 years ago by Devon Ryan 104k

That's what I would like to do: call srun for each value of i.

I tried:

for i in $subset
do
srun Rscript my_script.R --subset $i 
done

But it is still executed on one node.
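One thing that may be worth checking at this point is how many nodes the job was actually given. If the sbatch header is still the one shown in the question, it sets --cpus-per-task but neither --nodes nor --ntasks, so Slurm defaults to one task and every srun step runs inside that single-node allocation. The sketch below reads SLURM_JOB_NUM_NODES and SLURM_JOB_NODELIST, which Slurm sets inside a job; the function name report_allocation is only illustrative.

```shell
#!/bin/bash
# Print how many nodes the current allocation spans and, if possible, their names.
report_allocation() {
    local nodes="${SLURM_JOB_NUM_NODES:-unknown}"
    echo "nodes in allocation: $nodes"
    if [ "$nodes" = "1" ]; then
        echo "single-node allocation: all srun steps will share this node"
    fi
    # scontrol expands a compact node list into one host name per line.
    if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_NODELIST:-}" ]; then
        scontrol show hostnames "$SLURM_JOB_NODELIST"
    fi
}

report_allocation
```

Placed at the top of the sbatch script, before the loop, this writes the node count to the job's output file.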

4.9 years ago by pablo ▴ 300