Running Snakemake on the computing cluster
wangdp123 ▴ 340 · 3.1 years ago

Hi there,

I have a question about how to run Snakemake properly on an HPC cluster. I understand that there are at least three ways of doing this:

1) Submit a shell script (test.sh) that runs Snakemake directly. The script looks like this:

#!/bin/bash
#$ -cwd
#$ -V
#$ -l h_rt=48:00:00
#$ -l nodes=10,ppn=1

snakemake -p --cores 10 --snakefile test.snakemake

The job is then submitted to the compute nodes with "qsub test.sh".

2) Use the --cluster argument, as suggested by the Snakemake manual (https://snakemake.readthedocs.io/en/v5.1.4/executable.html):

snakemake --cluster qsub -j 32 --snakefile test.snakemake
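With --cluster, Snakemake itself stays on the submission host and submits one qsub job per rule instance, with -j capping how many are queued at once. A fuller invocation might look like the sketch below (the specific qsub options, the "smp" parallel environment, and the --latency-wait value are assumptions for illustration; adjust them to your own queue setup):

```shell
# Sketch: one qsub job per rule instance, up to 32 in flight.
# {threads} is filled in by Snakemake from each rule's threads directive.
# --latency-wait tolerates shared-filesystem delays when checking outputs.
snakemake \
    --snakefile test.snakemake \
    --cluster "qsub -cwd -V -l h_rt=48:00:00 -pe smp {threads}" \
    --jobs 32 \
    --latency-wait 60
```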

3) Use the --profile argument, as suggested by the Snakemake manual (https://snakemake.readthedocs.io/en/v5.1.4/executable.html):

snakemake --profile myprofile --snakefile test.snakemake
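A profile is just a directory containing a config.yaml whose keys mirror Snakemake's command-line options, so option 3 is essentially option 2 with the flags stored in a file. A minimal sketch, assuming the profile lives under ~/.config/snakemake/ (the option values here are illustrative, not from the original post):

```shell
# Create a profile directory that "snakemake --profile myprofile" will find.
# Each key in config.yaml corresponds to a snakemake command-line option.
mkdir -p ~/.config/snakemake/myprofile
cat > ~/.config/snakemake/myprofile/config.yaml <<'EOF'
cluster: "qsub -cwd -V -l h_rt=48:00:00"
jobs: 32
latency-wait: 60
EOF
```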

I was wondering whether the three methods have the same effect. If not, what is the difference? Although I was told that the first approach will not make use of multiple HPC nodes, I am not sure that is true, because in practice it seems to have used all 10 nodes.

Many thanks,

Tom

Tags: Snakemake, HPC
Comment:
I use your approach #1, because I need to load a custom conda environment for Snakemake to run. From the documentation, it is not clear to me whether a job is launched for each rule or for the entire pipeline when using options 2 and 3. What is your experience with that?
