Question

[DESeq2 multiple treatments vs. 1-by-1] Inconsistent padj

0

9.7 years ago

madkitty ▴ 690

We have RNA-seq data for 3 treatments (A, B and C) plus a control. Each treatment should be compared to control (A vs. CT, B vs. CT, C vs. CT), and from that we want the number of differentially expressed genes (padj < 0.1) and which genes are upregulated and downregulated.

When I ran the DESeq2 pipeline on a table containing control (CT) and the 3 treatments, I found about 1,000 differentially expressed genes (padj < 0.1), but the result CSV had only one log2 fold change column and one padj column, where I was expecting three of each, one per comparison (A vs. CT, B vs. CT and C vs. CT). Since I couldn't extract the log2 fold change and padj for each comparison, I re-ran the DESeq2 pipeline on each treatment vs. control separately, and now the number of differentially expressed genes is completely different. I obtained:

A vs. CT: 1000 genes
B vs. CT: 2000 genes
C vs. CT: 2500 genes

In total that's far beyond the original 1,000 genes I had when running DESeq2 with the 3 treatments vs Control in one spreadsheet.

What causes this difference?
And now I'm wondering which pipeline is the right one?
Should I run it independently or all treatments together on one spreadsheet?
If they should be run together, how can I extract the padj and log2 fold changes for each treatment?

RNA-Seq deseq2 • 4.1k views

updated 2.3 years ago by Ram 43k • written 9.7 years ago by madkitty ▴ 690

2

You'll need to show the code you used in the first vs. one of the second cases for us to help. My guess is that in the first instance you ended up comparing the full model against something like ~1, which isn't what you want. It's completely possible to extract fold-changes and adjusted p-values while keeping everything in. In fact, you'll get more reliable results that way, due to better variance estimation.
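To illustrate the guess above, here is a minimal sketch of what a "full model vs. ~1" comparison looks like in DESeq2. The original code was never posted, so this is an assumption about what may have happened, not a reconstruction. The data are simulated with `makeExampleDESeqDataSet()`, and the column name `condition` and its levels are placeholders.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real count table: 200 genes, 12 samples.
dds <- makeExampleDESeqDataSet(n = 200, m = 12)

# Relabel the samples as control plus three treatments (hypothetical layout),
# with CT as the reference level.
dds$condition <- factor(rep(c("CT", "A", "B", "C"), each = 3),
                        levels = c("CT", "A", "B", "C"))
design(dds) <- ~ condition

# Likelihood ratio test of the full model against an intercept-only model.
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_lrt <- results(dds_lrt)

# One p-value / padj per gene: it asks whether the gene differs between
# ANY of the four groups, not whether a specific treatment differs from CT.
print(mcols(res_lrt)$description)
```

A results table like this has a single padj column by design, so it cannot be split into per-treatment answers; per-comparison tables come from Wald tests with explicit contrasts.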

updated 2.3 years ago by Ram 43k • written 9.7 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-08-25

3

9.7 years ago

Michael Love ★ 2.6k

You want to use the "contrast" argument to the results() function, in order to build a results table for your three desired contrasts. You can run DESeq() on the dataset containing all the samples, which can improve variance estimation as Devon mentioned.

First read over the section on contrasts in the DESeq2 vignette and the help for the "contrast" argument in ?results.

Also, as Devon suggested, it helps to provide your code to get more precise answers to your questions.
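In the absence of the original code, here is a rough sketch of the approach described above: fit once on all samples, then call results() with a different "contrast" for each treatment. The counts are simulated, and the names `condition`, `coldata` and `combined` are placeholders I made up for illustration; substitute your own count matrix and sample table.

```r
library(DESeq2)

set.seed(1)
n_genes <- 500
# Hypothetical layout: 3 replicates each of control and treatments A, B, C.
condition <- factor(rep(c("CT", "A", "B", "C"), each = 3),
                    levels = c("CT", "A", "B", "C"))

# Simulated negative binomial counts standing in for the real data.
counts <- matrix(rnbinom(n_genes * length(condition), mu = 100, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_along(condition))))
coldata <- data.frame(condition = condition, row.names = colnames(counts))

# Fit the model once, using every sample for dispersion estimation.
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)

# One results table per comparison: c(factor, numerator, denominator).
treatments <- c("A", "B", "C")
res_list <- lapply(treatments, function(trt) {
  results(dds, contrast = c("condition", trt, "CT"), alpha = 0.1)
})
names(res_list) <- treatments

# Number of genes with padj < 0.1 in each comparison.
n_de <- sapply(res_list, function(r) sum(r$padj < 0.1, na.rm = TRUE))
print(n_de)

# Side-by-side table with log2FoldChange and padj for each comparison.
combined <- do.call(cbind, lapply(treatments, function(trt) {
  df <- as.data.frame(res_list[[trt]])[, c("log2FoldChange", "padj")]
  colnames(df) <- paste(colnames(df), trt, "vs_CT", sep = "_")
  df
}))
head(combined)
```

The `combined` data frame can then be written out with write.csv() to get the three log2 fold change and padj column pairs in a single spreadsheet.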

updated 2.3 years ago by Ram 43k • written 9.7 years ago by Michael Love ★ 2.6k

0

I have the same or a similar question. I'm comparing multiple cell types, each with its own control sample based on the cell hierarchy. For normalisation I use all the samples (for PCA, clustering, correlation, etc.), but for differential expression I have to run each comparison separately: stem cell vs. progenitor, common myeloid progenitor vs. granulocyte-monocyte progenitor (GMP), then GMP vs. monocyte.

It's a conceptual question, since my control is not always the same: in one case it is the stem cell and in another it is the progenitor cell. How can I set up multiple comparisons in this case, given that each comparison will have its own fold change and p-value, unless the control is the same for every test?

I have used contrasts when comparing stem cells with everything downstream, but how do I do it when the controls are different?

written 5.2 years ago by 1769mkc ★ 1.2k