Hi,
We have read counts from several hundred 16S rRNA samples for which we would like to run differential testing. The thing is that we have a lot of responses we would like to try, so we need to run the following:
- build a DESeqDataSet with one design formula (few confounders + response)
- get results
- change design and repeat
- assemble results and readjust FDR (using IHW)
As far as I understand DESeq2, the size factor and dispersion estimates should not depend on the actual design formula, so it should be possible to run those calculations only once for all tests. However, we also have a lot of missing data for each response, so I would need to subset the count matrix and column data to only those samples that have non-NA entries in the response. Will that preserve the previous estimates for size factors and dispersions?
If not, is there a way to achieve that behavior?
Thanks a lot! Chris
Thanks Mike! Makes sense, since dispersions depend on the grouping ^_^'. So for now I'm building a new DESeqDataSet for each design.
Also, from this preprint:
http://www.biorxiv.org/content/early/2017/06/30/157982
We found that estimateSizeFactors() with type="poscounts" works better than the default size factor estimation when there are many zeros. So just run that before DESeq(), and DESeq() won't re-estimate the size factors.
(That paper also has new software which improves on the NB methods when there are many zeros.)
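As a rough illustration of that advice, here is a minimal sketch in R. The simulated data from makeExampleDESeqDataSet() is only a stand-in (an assumption, not the 16S data from the question), and the variable names are made up for the example.

```r
library(DESeq2)

# Simulated stand-in data (assumption: replace with your own DESeqDataSet)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Estimate size factors with the "poscounts" estimator before calling DESeq()
dds <- estimateSizeFactors(dds, type = "poscounts")
sf_before <- sizeFactors(dds)

# DESeq() picks up the size factors that are already stored in the object
dds <- DESeq(dds)
sf_after <- sizeFactors(dds)

res <- results(dds)
```

Comparing sf_before and sf_after is a quick way to check that the size factors were left alone.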
Edit: found it, nevermind :)
Great, will do. Is there a way to calculate the size factors once for the full count matrix and only subset that matrix for different designs? For instance, if I already have a DESeqDataSet with estimated size factors for the full matrix, can I get a smaller data set with only a subset of the samples without re-estimating the size factors?
Yes, you don't need to re-estimate them. The size factors stay with the samples when you subset.
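To make that concrete, here is a hedged sketch of the workflow discussed in this thread: size factors are estimated once on the full matrix, then each response gets its own subset of non-NA samples and its own design. The responses resp1 and resp2, the helper run_one(), and the simulated data are all hypothetical names chosen for illustration. Dispersions are still re-estimated for every design inside DESeq().

```r
library(DESeq2)

# Simulated stand-in data with two hypothetical responses containing NAs
set.seed(1)
dds_full <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds_full$resp1 <- factor(c("a", "b", NA, "a", "b", "a", NA, "b", "a", "b", "a", "b"))
dds_full$resp2 <- factor(c("x", NA, "y", "x", "y", "x", "y", "x", NA, "y", "x", "y"))

# Size factors are estimated once, on all samples
dds_full <- estimateSizeFactors(dds_full, type = "poscounts")

run_one <- function(resp) {
  # Keep only the samples with a non-NA value for this response
  keep <- !is.na(colData(dds_full)[[resp]])
  dds_sub <- dds_full[, keep]

  # The stored size factors are carried along with the subsetted samples
  stopifnot(isTRUE(all.equal(sizeFactors(dds_sub), sizeFactors(dds_full)[keep])))

  # New design for this response; DESeq() reuses the existing size factors
  # and re-estimates dispersions for this subset and design
  design(dds_sub) <- as.formula(paste("~", resp))
  dds_sub <- DESeq(dds_sub, quiet = TRUE)
  as.data.frame(results(dds_sub))
}

res_list <- lapply(c(resp1 = "resp1", resp2 = "resp2"), run_one)
```

In a real analysis the design would also include the confounders, and the per-response tables in res_list could then be combined for a joint FDR adjustment.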