Question

Efficient way to run a DESeq2 analysis with many design formulae


6.6 years ago

cdiener • 0

Hi,

We have read counts from several hundred 16S rRNA sequencing samples on which we would like to run differential testing. The catch is that we have many response variables we would like to test, so we need to do the following:

1. Build a DESeqDataSet with one design formula (a few confounders plus the response).
2. Get the results.
3. Change the design and repeat.
4. Assemble all results and readjust the FDR (using IHW).
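The final step (pooling results across designs and readjusting the FDR) can be sketched in Python. This is only an illustration under stated assumptions: DESeq2 and IHW are R packages, so the per-design p-values are passed in as a hypothetical nested dict, and IHW is swapped for the plain Benjamini-Hochberg procedure, which uses no covariate. A taxon that DESeq2 leaves without a p-value (NA) is represented as None and dropped before adjustment.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.
    (Plain BH, used here as a stand-in for IHW.)"""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pool_and_adjust(results_by_design):
    """Flatten a hypothetical {design: {taxon: pvalue}} dict and adjust
    all tests from all designs jointly."""
    keys, pvals = [], []
    for design, per_taxon in results_by_design.items():
        for taxon, p in per_taxon.items():
            if p is None:  # stands in for an NA p-value from DESeq2
                continue
            keys.append((design, taxon))
            pvals.append(p)
    return dict(zip(keys, benjamini_hochberg(pvals)))


# Hypothetical example input: two designs, two taxa each.
results = {
    "~ age + bmi": {"taxon_1": 0.001, "taxon_2": 0.20},
    "~ age + diet": {"taxon_1": 0.04, "taxon_2": None},
}
padj = pool_and_adjust(results)
# padj[("~ age + diet", "taxon_1")] is about 0.06
```

Pooling before adjusting matters because adjusting each design separately would control the FDR only within each design, not across the whole set of tests.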

As far as I understand DESeq2, the size factor and dispersion estimates should not depend on the actual design formula, so it should be possible to run those calculations only once for all tests. However, we also have a lot of missing data for each response, so I would need to subset the count matrix and column data to only those samples that have non-NA entries for that response. Will that preserve the previous estimates for the size factors and dispersions?

If not, is there a way to achieve that behavior?

Thanks a lot! Chris

microbiome RNA-Seq DESeq2

written 6.6 years ago by cdiener • updated 11 months ago by Ram

score 3 · Answer 1 · 2017-09-07


6.6 years ago

Michael Love ★ 2.6k

hi cdiener!

Dispersion estimation does depend on the design, but size factors do not.


Thanks, Mike! That makes sense, since the dispersions depend on the grouping ^_^'. For now I'm building a new DESeqDataSet for each design.

6.6 years ago by cdiener


Also, from this preprint:

http://www.biorxiv.org/content/early/2017/06/30/157982

We found that estimateSizeFactors() with type="poscounts" works better than the default size factor estimation when there are many zeros. So just run it before DESeq(); DESeq() will then use those size factors instead of re-estimating them.

(That paper also has new software which improves on the NB methods when there are many zeros.)
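Why poscounts helps with sparse 16S data: the default median-of-ratios method needs a positive geometric mean for each taxon, and a single zero makes that geometric mean zero, so with many zeros almost no taxa (or none at all) are usable. The idea can be sketched in plain Python. This is a re-implementation written from the description in the DESeq2 documentation (a modified geometric mean that takes the n-th root of the product of the non-zero counts, with n the number of samples, followed by median-of-ratios), not DESeq2's own code; the final rescaling to a geometric mean of 1 and the list-of-rows input layout are our assumptions.

```python
import math
from statistics import median


def poscounts_geomean(row):
    """n-th root of the product of the non-zero counts, where n is the
    total number of samples (zeros simply add nothing to the log-sum)."""
    positive = [c for c in row if c > 0]
    if not positive:
        return 0.0
    return math.exp(sum(math.log(c) for c in positive) / len(row))


def poscounts_size_factors(counts):
    """counts: list of rows (taxa), each a list of per-sample counts.
    Assumes every sample has at least one positive count.
    Returns one size factor per sample, rescaled to geometric mean 1."""
    n_samples = len(counts[0])
    log_geomeans = []
    for row in counts:
        g = poscounts_geomean(row)
        log_geomeans.append(math.log(g) if g > 0 else None)

    size_factors = []
    for j in range(n_samples):
        # Median of log-ratios over taxa with a positive reference value
        # and a positive count in this sample.
        log_ratios = [math.log(row[j]) - lg
                      for row, lg in zip(counts, log_geomeans)
                      if lg is not None and row[j] > 0]
        size_factors.append(math.exp(median(log_ratios)))

    # Assumed final step: rescale so the size factors multiply to 1.
    log_mean = sum(math.log(s) for s in size_factors) / n_samples
    return [s / math.exp(log_mean) for s in size_factors]
```

With a matrix like [[0, 5, 3], [4, 0, 2], [1, 1, 0]], where every taxon has a zero, the ordinary geometric means are all zero, but this sketch still returns a positive size factor for each sample.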

6.6 years ago by Michael Love


Edit: found it, never mind :)

Great, will do. Is there a way to calculate the size factors once for the full count matrix and then only subset that matrix for the different designs? For instance, if I already have a DESeqDataSet with size factors estimated on the full matrix, can I get a smaller dataset containing only a subset of the samples without re-estimating the size factors?
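Following Michael Love's answer, only the size factors can be reused across designs; dispersions still have to be re-estimated for each design. The bookkeeping for reusing size factors (estimate once on all samples, then keep only the samples with a non-missing response) can be sketched in Python. The sample names, size factor values, response table, and helper names are all hypothetical. In DESeq2 the analogous step would be column-subsetting the DESeqDataSet, which, as far as we know, carries the per-sample sizeFactor column of colData along, so DESeq() should not re-estimate them.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (the analogue of NA)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def subsets_for_responses(sample_ids, size_factors, responses):
    """For each response, keep only samples with a non-missing value and
    carry over the size factors estimated once on all samples.

    sample_ids:   sample names, in the column order of the count matrix
    size_factors: one float per sample, same order as sample_ids
    responses:    {response_name: {sample_id: value}} (hypothetical layout)
    Returns {response_name: (kept_column_indices, kept_size_factors)}.
    """
    out = {}
    for name, values in responses.items():
        keep = [j for j, s in enumerate(sample_ids)
                if not is_missing(values.get(s))]
        out[name] = (keep, [size_factors[j] for j in keep])
    return out


def subset_counts(counts, keep):
    """Column-subset a taxa x samples count matrix given as a list of rows."""
    return [[row[j] for j in keep] for row in counts]


# Hypothetical example: size factors computed once on all four samples.
samples = ["s1", "s2", "s3", "s4"]
sf = [0.8, 1.1, 1.0, 1.25]
responses = {
    "bmi": {"s1": 22.0, "s2": None, "s3": 30.5, "s4": 27.1},
    "diet": {"s1": "vegan", "s2": "omni", "s3": float("nan"), "s4": "omni"},
}
subsets = subsets_for_responses(samples, sf, responses)
# subsets["bmi"] keeps columns [0, 2, 3] with size factors [0.8, 1.0, 1.25]
```

The size factors are carried over unchanged rather than recomputed on each subset, which is the behavior asked about; each subset then gets its own design and its own dispersion estimates.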