Hi there,
I have a question about how to run Snakemake properly on an HPC cluster. As far as I understand, there are at least three ways of doing this:
1) Submit a shell script, test.sh, with the following content:
#!/bin/bash
#$ -cwd                # run the job from the submission directory
#$ -V                  # export the current environment to the job
#$ -l h_rt=48:00:00    # 48-hour wall-clock limit
#$ -l nodes=10,ppn=1   # resource request (names like nodes/ppn are site-specific)
snakemake -p --cores 10 --snakefile test.snakemake
Then run "qsub test.sh" to submit this job to the compute nodes of the HPC.
2) Use the --cluster argument, as suggested by the Snakemake manual (https://snakemake.readthedocs.io/en/v5.1.4/executable.html), as below; a fuller sketch follows this list:
snakemake --cluster qsub -j 32 --snakefile test.snakemake
3) Use the --profile argument, as suggested by the same manual page, as below; an example profile also follows this list:
snakemake --profile myprofile --snakefile test.snakemake
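Regarding 2), my understanding from that manual page is that the --cluster value is a full submission command, so qsub options comparable to the #$ directives in test.sh can be passed inside the quoted string, roughly like this (the 2-hour limit is only an illustration):

# each Snakemake job gets its own qsub submission; -cwd and -V mirror test.sh
snakemake --cluster "qsub -cwd -V -l h_rt=02:00:00" -j 32 --snakefile test.snakemake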
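Regarding 3), the same docs describe myprofile as a directory (looked up under ~/.config/snakemake/, among other places) whose config.yaml supplies default command-line options. A minimal sketch that would make option 3 behave like option 2 (the values are only an example):

# ~/.config/snakemake/myprofile/config.yaml
cluster: "qsub -cwd -V"
jobs: 32
printshellcmds: true

With this in place, "snakemake --profile myprofile --snakefile test.snakemake" picks up the cluster and jobs settings automatically.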
I was wondering whether these three methods have the same effect. If not, what is the difference? I was told that the first approach will not make use of multiple HPC nodes, but I am not sure that is true: in practice it seemed to use the 10 requested nodes.
Many thanks,
Tom
I use your approach #1, because I need to load a custom conda environment for Snakemake to run. From the documentation, it's not clear to me whether a job is launched for each rule or for the entire pipeline when using options 2 and 3 - what is your experience with that?
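For reference, the top of my submission script looks roughly like this (the install path and environment name are placeholders for my own setup):

#!/bin/bash
#$ -cwd
#$ -V
source ~/miniconda3/etc/profile.d/conda.sh   # placeholder conda install path
conda activate snakemake-env                 # placeholder environment name
snakemake -p --cores 10 --snakefile test.snakemake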