Hi there,
I have a question about how to run Snakemake properly on an HPC cluster. As far as I understand, there are at least three ways of doing this:
1) Submit a shell script, test.sh, with the following content:
#!/bin/bash
#$ -cwd                # run the job from the submission directory
#$ -V                  # export the current environment to the job
#$ -l h_rt=48:00:00    # 48-hour wall-clock limit
#$ -l nodes=10,ppn=1   # resource request (names like nodes/ppn are site-specific)
snakemake -p --cores 10 --snakefile test.snakemake
Then run "qsub test.sh" to submit this job to the compute nodes of the HPC.
2) Use the --cluster argument, as suggested by the Snakemake manual (https://snakemake.readthedocs.io/en/v5.1.4/executable.html), as below; a fuller sketch follows this list:
snakemake --cluster qsub -j 32 --snakefile test.snakemake
3) Use the --profile argument, as suggested by the same manual page, as below; an example profile also follows this list:
snakemake --profile myprofile --snakefile test.snakemake
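Regarding 2), my understanding from that manual page is that the --cluster value is a full submission command, so qsub options comparable to the #$ directives in test.sh can be passed inside the quoted string, roughly like this (the 2-hour limit is only an illustration):

# each Snakemake job gets its own qsub submission; -cwd and -V mirror test.sh
snakemake --cluster "qsub -cwd -V -l h_rt=02:00:00" -j 32 --snakefile test.snakemake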
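Regarding 3), the same docs describe myprofile as a directory (looked up under ~/.config/snakemake/, among other places) whose config.yaml supplies default command-line options. A minimal sketch that would make option 3 behave like option 2 (the values are only an example):

# ~/.config/snakemake/myprofile/config.yaml
cluster: "qsub -cwd -V"
jobs: 32
printshellcmds: true

With this in place, "snakemake --profile myprofile --snakefile test.snakemake" picks up the cluster and jobs settings automatically.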
I was wondering whether these three methods have the same effect. If not, what is the difference? I was told that the first approach will not make use of multiple HPC nodes, but I am not sure that is true: in practice it seemed to use the 10 requested nodes.
Many thanks,
Tom
I use your approach #1, because I need to load a custom conda environment for Snakemake to run. From the documentation, it's not clear to me whether a job is launched for each rule or for the entire pipeline when using options 2 and 3 - what is your experience with that?
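For reference, the top of my submission script looks roughly like this (the install path and environment name are placeholders for my own setup):

#!/bin/bash
#$ -cwd
#$ -V
source ~/miniconda3/etc/profile.d/conda.sh   # placeholder conda install path
conda activate snakemake-env                 # placeholder environment name
snakemake -p --cores 10 --snakefile test.snakemake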