Why is DESeq2 slower in parallel mode than in normal mode?
6.9 years ago

Hi everyone, I have 417 samples from 4 groups, and each sample contains expression values for 500 genes (so my data is a 500x417 matrix). I want to run a differential expression analysis on it.

When I run DESeq in normal mode (parallel=FALSE), it takes ~137 seconds to finish.

When I run DESeq in parallel mode (parallel=TRUE) after calling register(SnowParam()) with 28 workers using BiocParallel, it takes ~406 seconds to finish.

When I run DESeq in parallel mode (parallel=TRUE) after calling register(MulticoreParam()) with 28 workers using BiocParallel, it takes ~405 seconds to finish.

Why is DESeq slower in parallel mode?
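
For reference, the comparison described above could be set up roughly as follows. This is only a sketch under stated assumptions: simulated counts from `makeExampleDESeqDataSet()` stand in for the real 500x417 matrix (it simulates two conditions rather than four groups), the sample count is reduced so the sketch runs quickly, and `time_deseq()` and the worker counts are illustrative choices, not taken from the original analysis. Note that `MulticoreParam()` relies on forking, which is not available on Windows.

```r
library(DESeq2)
library(BiocParallel)

# Simulated data standing in for the real matrix (assumption: the question
# has 500 genes x 417 samples; n_samples is kept small here so the sketch
# finishes quickly. Set it to 417 to match the question's dimensions).
n_genes   <- 500
n_samples <- 40
set.seed(1)
dds <- makeExampleDESeqDataSet(n = n_genes, m = n_samples)

# Hypothetical helper: elapsed seconds for one DESeq() run.
# param = NULL uses the ordinary serial path (parallel = FALSE).
time_deseq <- function(dds, param = NULL) {
  timing <- if (is.null(param)) {
    system.time(DESeq(dds, quiet = TRUE))
  } else {
    system.time(DESeq(dds, parallel = TRUE, BPPARAM = param, quiet = TRUE))
  }
  unname(timing["elapsed"])
}

# Backends to compare; the worker counts are placeholders.
backends <- list(
  serial      = NULL,
  multicore_2 = MulticoreParam(workers = 2),
  multicore_4 = MulticoreParam(workers = 4),
  snow_4      = SnowParam(workers = 4)
)

timings <- sapply(backends, function(param) time_deseq(dds, param))
print(round(timings, 1))
```

Passing the backend through `BPPARAM` instead of calling `register()` keeps each run explicit about which backend it used, which makes it easier to compare several worker counts in one session.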

RNA-Seq DESeq Differential Expression parallel • 5.0k views
6.9 years ago
Michael Love ★ 2.6k

Can you test to see that your parallel setup is ok? For example:

 > register(SerialParam())
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.016   0.004  20.020
 > register(MulticoreParam(workers=4))
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.010   0.017   6.203

 > register(SerialParam())
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.076   0.060  20.031

 > register(MulticoreParam(workers=4))
 > system.time({ bplapply(1:4, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.176   0.552   9.608

 > register(SerialParam())
 > system.time({ bplapply(1:28, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.568   0.352 140.068

 > register(MulticoreParam(workers=28))
 > system.time({ bplapply(1:28, function(i) Sys.sleep(5)) })
    user  system elapsed
   0.316   3.784  17.433

I'm not sure. Does this look OK?


So the overhead of simply launching 28 workers keeps you from achieving a speedup of 28; instead you get a speedup of about 8 for the toy example of sleeping for five seconds. The overhead may matter less as the time per task increases, but with real data you also have to split up the data and send a piece to each worker. I'd try DESeq2 with a smaller number of workers, and if you are working on a cluster, make sure the cores are on the same node. The details of the backend make a difference.
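
The speedup figure can be checked directly from the elapsed times posted earlier in the thread. Here is a minimal sketch in base R. The column names and the "overhead" estimate (observed parallel time minus the 5 seconds an ideal run with one task per worker would take) are just one way to frame it, not something BiocParallel reports.

```r
# Elapsed times (seconds) reported earlier in this thread for
# bplapply(1:n, function(i) Sys.sleep(5)) with n tasks and n workers.
timings <- data.frame(
  workers  = c(4, 28),
  serial   = c(20.031, 140.068),  # register(SerialParam())
  parallel = c(9.608, 17.433)     # register(MulticoreParam(workers = n))
)

# Each task sleeps for 5 s, so with one task per worker the ideal
# parallel run would take about 5 s in total.
task_seconds <- 5

timings$speedup    <- timings$serial / timings$parallel   # observed speedup
timings$efficiency <- timings$speedup / timings$workers   # fraction of ideal
timings$overhead   <- timings$parallel - task_seconds     # extra seconds

print(round(timings, 2))
```

With these numbers the speedup comes out at about 2.1 for 4 workers and about 8.0 for 28 workers, while the estimated overhead grows from roughly 4.6 s to roughly 12.4 s.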


Thanks for your help.
