STAR genomeLoad overlap conflict

Hello, I am studying several SRRs and I submit a job for each SRP. In each job I run STAR with --genomeLoad LoadAndExit, then do the mapping with --genomeLoad LoadAndKeep, and end the script with --genomeLoad Remove. As I have a few jobs running at once, I sometimes get the following errors:

"Another job is still loading the genome, sleeping for 1 min"

followed by

"EXITING because of FATAL ERROR: waited too long for the other job to finish loading the genomeSuccess SOLUTION: remove the shared memory chunk by running STAR with --genomeLoad Remove, and restart STAR May 30 22:14:26 ...... FATAL ERROR, exiting"

Does anyone know how I can have several jobs running at once and still use the --genomeLoad option of STAR?
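For reference, each job script follows roughly this pattern (a minimal sketch with placeholder paths, sample names and read files, not my actual scripts):

    #!/bin/bash
    GENOME=/path/to/genomeDir   # placeholder location of the STAR index

    # 1) pre-load the genome into shared memory on this node, then exit
    STAR --genomeDir "$GENOME" --genomeLoad LoadAndExit

    # 2) map every SRR of this SRP against the shared copy
    for SRR in SRR0000001 SRR0000002; do
        STAR --genomeDir "$GENOME" --genomeLoad LoadAndKeep \
             --runThreadN 28 \
             --readFilesIn "${SRR}_1.fastq" "${SRR}_2.fastq" \
             --outFileNamePrefix "${SRR}_"
    done

    # 3) drop the genome from shared memory at the end of this job
    STAR --genomeDir "$GENOME" --genomeLoad Remove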


How big are the nodes you're running on, and how many threads are your jobs using? The general strategy is to break things up by node, running multiple samples per node with a single genome load/unload on each.


I submit every job (one SRP) to one node. Every node has 64 GB RAM and 28 cores. I run the mapping with --runThreadN 28, against the human genome.


That error should only happen if two jobs are loading the genome on the same node or if your nodes have memory shared between them.


Do you know how I could check whether memory is shared between the nodes? At the moment I do use parallel -j1, but I think that should not let jobs overlap?


Ask the cluster admin.
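If it turns out jobs can land on the same node, note that STAR's --genomeLoad keeps the genome in System V shared memory, which is local to each node. A rough way to check what is sitting in shared memory on a given node (the genome directory path below is a placeholder) is:

    # run on the compute node in question; very large segments are
    # typically a STAR genome kept in shared memory
    ipcs -m

    # a stale segment left behind by a crashed or killed job can be cleared with
    STAR --genomeDir /path/to/genomeDir --genomeLoad Remove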


The cores do share their memory, but the nodes do not.


If you are using a cluster with a real job scheduler, then you should not be using parallel (if that is the GNU parallel program).
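For example, assuming a SLURM-style scheduler (the script names and resource options here are placeholders; adapt them to whatever your cluster actually runs), each SRP would be its own submission rather than a task run under parallel:

    # one job per SRP, each on its own node with all 28 cores
    sbatch --nodes=1 --ntasks=1 --cpus-per-task=28 map_SRP000001.sh
    sbatch --nodes=1 --ntasks=1 --cpus-per-task=28 map_SRP000002.sh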


Judging from the error description, I would say that you can't do this in parallel. You'll need more memory to run the jobs in parallel, or you'll have to wait until the previous job has finished.


My guess is that you are somehow starting the mapping STAR run before --genomeLoad LoadAndExit has finished loading the genome.

Try the following:

1) Don't use --genomeLoad LoadAndExit; it is not necessary if you use --genomeLoad LoadAndKeep later.

2) Use only --genomeLoad LoadAndKeep for the mapping runs.

3) Run --genomeLoad Remove only AFTER you have finished all jobs. In any case, it will remove the genome from shared memory only after all STAR jobs accessing it have finished.
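In other words, something along these lines (a sketch with placeholder paths and file names; the first LoadAndKeep run loads the genome if it is not already in shared memory, later runs simply attach to it):

    # every mapping run: --genomeLoad LoadAndKeep, no separate LoadAndExit step
    STAR --genomeDir /path/to/genomeDir --genomeLoad LoadAndKeep \
         --runThreadN 28 \
         --readFilesIn sample_1.fastq sample_2.fastq \
         --outFileNamePrefix sample_

    # run once, only after ALL mapping jobs using this genome have finished
    STAR --genomeDir /path/to/genomeDir --genomeLoad Remove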


You should ask your cluster administrator or read the documentation (big clusters commonly have a web page with lots of documentation) to learn the details about the cluster you use. But in general, queue managers are configured to see shared memory nodes as one node, and non-shared memory nodes as distinct nodes.

Assuming non-shared nodes, you would need to:

1) submit a job with --genomeLoad LoadAndKeep to every node you plan to use,

2) after that, submit your mapping runs to exactly those nodes that have the index loaded, with --genomeDir but without --genomeLoad,

3) and then submit a job with --genomeLoad Remove to every node that has a genome loaded.

You can (and probably should) use job dependencies to ensure the correct order of execution of these scripts. You may need different settings on your --genomeLoad Remove jobs to ensure they run as soon as the mapping finishes and do not stay queued while other jobs run, as the loaded genome would otherwise eat up a lot of memory on those nodes.
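A rough sketch of that, assuming a SLURM cluster (node names, script names and the scripts themselves are placeholders; other schedulers have equivalent dependency mechanisms):

    # 1) load the genome on a specific node (script runs STAR --genomeLoad LoadAndKeep)
    LOAD=$(sbatch --parsable --nodelist=node01 load_genome.sh)

    # 2) mapping runs on that node, started only once the load job has succeeded
    MAP1=$(sbatch --parsable --nodelist=node01 --dependency=afterok:$LOAD map_SRP000001.sh)
    MAP2=$(sbatch --parsable --nodelist=node01 --dependency=afterok:$LOAD map_SRP000002.sh)

    # 3) unload the genome once the mapping jobs are done (afterany, so the
    #    memory is freed even if one of the mapping jobs fails)
    sbatch --nodelist=node01 --dependency=afterany:$MAP1:$MAP2 remove_genome.sh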
