Understanding Resource Job Allocation

I am running a Snakemake workflow, and I am confused about how my memory and thread requests translate to resource allocation on my SLURM partition. In my rule I ask for 8 threads and 10 GB of memory, and each job is given 16 cores. However, I have the --cores flag set to 46 in the .sh that runs my Snakefile, yet five of these jobs are running at once, which gives me the impression that --cores 46 is being exceeded.

I am also interested in the relationship between the memory allocated per job and the additional cores allocated. For example, I had 10 GB of memory for each job but was getting an OOM kill error until I increased the thread request to 8. Before I went up to 8 threads, my snakejobs were only being allocated 2 cores. So by adjusting the threads up to 8 while keeping memory the same, I went from 2 cores to 16 cores allocated per job. Why would that be? Are additional cores consumed as memory? And does this mean that if all five jobs are running with 16 cores each, my --cores 46 setting in the .sh was superseded somehow? I was having trouble working this out from the Snakemake documentation. Any help would be greatly appreciated!

    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 16
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        1       index_genome
        1

    [Tue Feb 2 10:15:51 2021]
    rule index_genome:
        input: /mypath/genomic.fna
        output: /mypath/genomic.fna.fa.ann
        jobid: 0
        wildcards: bwa_extension=.fa.ann
        threads: 8
        resources: mem_mb=10000

    [bwa_index] Pack FASTA... 1.96 sec
    [bwa_index] Construct BWT for the packed sequence...
    [bwa_index] 172.44 seconds elapse.
    [bwa_index] Update BWT... 1.23 sec
    [bwa_index] Pack forward-only FASTA... 0.93 sec
    [bwa_index] Construct SA from BWT and Occ...
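For context, the .sh that launches the workflow is roughly of this shape (paths trimmed and the cluster-submission string approximated, so treat this as a sketch rather than my exact script):

    # Rough shape of the launch script (submission options approximated, not exact).
    # --cores 46 was my attempt to cap total core usage across all jobs.
    snakemake \
        --snakefile Snakefile \
        --cores 46 \
        --cluster "sbatch --partition=<mypartition> --mem={resources.mem_mb}"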

Tags: snakemake • slurm • memory • workflow

I'm a little hazy on some of the details here, as I have more experience with SGE than SLURM, and with workflow managers other than Snakemake, but these things work much the same way across systems, so here goes.

The first thing to appreciate is that when you ask for 10 GB, I think you are asking to be allocated 10 GB _per core_ (that's how it normally works). So if you ask for 4 cores for a job, that is 40 GB. If you then run five of those 4-core jobs, you'd be asking for 200 GB overall (although this might be split between five different execution nodes on the cluster).
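To make the arithmetic concrete, here is how it looks when the request is written straight to sbatch (the script name is a placeholder; which of the two memory flags your site's configuration effectively uses is exactly the thing to check):

    # Sketch only: SLURM multiplies --mem-per-cpu by the number of allocated CPUs.
    # 4 CPUs x 10G per CPU = 40G reserved for this one job.
    sbatch --cpus-per-task=4 --mem-per-cpu=10G my_job.sh   # my_job.sh is a placeholder

    # By contrast, --mem is a per-job (per-node) total, independent of CPU count:
    sbatch --cpus-per-task=4 --mem=10G my_job.sh           # 10G total for the job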

Secondly, I think that --cores tells Snakemake how many local cores to use. This is useful when Snakemake is executing all the jobs locally rather than submitting them to a job management system like SLURM; when using SLURM, it's unlikely that you need more than 2 cores to run Snakemake itself (I'm not absolutely certain of this, because I don't know the internal structure of Snakemake). If you want to limit how much your Snakemake master process submits to the cluster, use the -j (--jobs) parameter, which controls how many concurrent jobs run on the cluster (although be warned that each job might use multiple cores, as specified by its resource requirements).
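Putting that together, the invocation I'd aim for is something like the sketch below (flag names are from the Snakemake 5/6-era CLI; the sbatch string and the mem_mb mapping are assumptions based on the resources shown in your log, not something I've tested on your cluster):

    # Sketch only: cluster-mode invocation, not a drop-in replacement for your script.
    # --jobs        caps how many SLURM jobs Snakemake has queued/running at once
    # --local-cores caps what the snakemake process itself uses on the submit node
    # Whether mem_mb ends up per job or per core depends on which sbatch flag you
    # map it to: --mem is a per-job total, --mem-per-cpu is multiplied by CPU count.
    snakemake \
        --snakefile Snakefile \
        --jobs 5 \
        --local-cores 2 \
        --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}"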


Thank you for clearing up the memory-per-thread question. That makes sense to me and will help me in the future. Concerning the cores, Snakemake actually has a separate --local-cores flag. I think the problem might have been that I had both the --jobs and --cores options set. With just --cores 48, my Snakefile was only running three jobs at a time, which makes more sense given that I allocated 16 threads per job for this rule. However, each job still had 32 cores allocated to it, according to my SLURM outputs.
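For anyone who finds this later: sacct will show what SLURM actually granted a job, e.g. (the job ID is a placeholder):

    # Sketch: inspect what SLURM actually allocated to a given job.
    # Replace <jobid> with the ID from the slurm-<jobid>.out file or squeue.
    sacct -j <jobid> --format=JobID,JobName,AllocCPUS,ReqMem,MaxRSS,State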
