Question

deseq2 foldchange issue

1

Entering edit mode

5.7 years ago

1769mkc ★ 1.2k

I have gone through the deseq2 paper where it tells that its not a simple logfold change of numerator by denominator ,its shrunken LFC which it calculates the maths behind it kind of im trying to understand in simple manner or layman term as i have to give some explanation why rna seq and RTqpcr are not similar...

I have 4 control and 4 test sample respectively all the counts are normalized used deseq2 default

gene baseMean log2FoldChange lfcSE stat pvalue padj nH4 nH3 nH2 nH1 C4  C3  C2  C1
ENSG00000112773.13  1156.5575015292 2.52671958320643    0.330914786026763   7.63555963619613    2.25E-14    5.31E-12    326.344363070055    316.430980291062    369.993520743799    241.523773415627    1626.54796885324    3322.42939303791    1321.51219870859    1727.67781411328

so how come the log2foldchange or logfoldchange is low.?even though there is a big difference between my sample labelled as H vs C which are stem cells and C's are progenitor cells ?

Another data point ,i have used lfcMLE for logfold change which says "“unshrunken” log2 fold changes (for a simple comparison, the ratio of the mean normalized counts in the two groups"

 gene baseMean log2FoldChange lfcMLE lfcSE stat pvalue padj SO_7660_BR_02 SO_7660_BR_03 SO_7660_BR_04 SO_7660_BR_06

797.414 2.972936 3.976479 0.1360428 21.85294 7.288395e-106 1.204553e-101  113.6162 75.41853 1320.512 1680.109

log2foldchange = 2.972936
lfcMLE   = 3.976479

there is difference log2foldchange and lfcMLE somehow its still less as compared to qPCR ...

rna-seq • 3.2k views

ADD COMMENT • link 5.7 years ago by 1769mkc ★ 1.2k

score 3 · Accepted Answer · 2018-08-21

3

Entering edit mode

5.7 years ago

Devon Ryan 104k

The fold-change is barely shrunken for these samples, the naive fold-change is ~6.4 and the shrunken fold-change is ~5.8. So I don't know why you think the log2FC is particularly low. The thing with the naive fold-change is that it will be innately unreliable due to the low number of samples and high variability, which is why it's shrunken toward 0 with a normally distributed prior.

The general idea behind shrinking p-values is that fold-changes will generally be overestimated given the actual variance in the samples and the typical miniscule number of samples. So lowly expressed genes are strongly pulled toward 0 while more highly expressed genes are less strongly pulled toward 0.

Regarding a discordance with your qPCR data, without at least knowing what the fold-change was it's impossible for us to judge how actually different the two values are. In general there's good (though not perfect) concordance between the two technologies, with qPCR generally being less sensitive/more variable.

ADD COMMENT • link 5.7 years ago by Devon Ryan 104k

0

Entering edit mode

"discordance with your qPCR data, without at least knowing what the fold-change was it's impossible for us to judge how actually different the two values are" with respect to this as the qPCR shows around 50 fold which is actually not done by me as part of data im handling so it seems there is a bit big "discordance" , may be you can suggest something ,

and is it like in the in vivo data there tend to be less drastic changes as compared to cell line data..as in cell line data it looks very much close to qPCR data but not with the mouse data

ADD REPLY • link 5.7 years ago by 1769mkc ★ 1.2k

1

Entering edit mode

Cell lines are going to be pretty homogenous, actual animals less so. So especially if you used a bunch of littermates or something like that for the RNA-seq then the fold-change would be artificially lower there. Having said that, a 50x change is pretty much "absurdly high" unless a gene is lowly expressed.

ADD REPLY • link 5.7 years ago by Devon Ryan 104k

0

Entering edit mode

i would post my code that might help if there is something im doing wrong i will update it...if at all in the code there is an issue

@Devon this is what i use for my deseq2 analysis do have a look and if there is any issue with the code it would be of great help for me to know

countdata <- read.table('mouse_bone_marrow.txt', header=TRUE, row.names=1)
head(countdata)
countdata <- countdata[ ,6:ncol(countdata)]
head(countdata)
dim(countdata)

countdata <- countdata[rowSums(countdata)>10,]


dim(countdata)
head(countdata)
colnames(countdata) <-  colnames(countdata)
names(countdata)
length(countdata[,1])
countdata <- as.matrix(countdata)
head(countdata)
colnames(countdata)


condition <- factor(c(rep("Control", 4),rep("Test", 4)),levels=c("Control", "Test"))

library(DESeq2)

(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dim(dds)
dds <- dds[rowSums(counts(dds)) > 1,]
dim(dds)
###################################
vsd <- varianceStabilizingTransformation(dds)
dists <- dist(t(assay(vsd)))
plot(hclust(dists),cex.axis= 2,cex.main=2, cex.lab=2)

####################################
dds <- estimateSizeFactors(dds)

dds <- estimateDispersions(dds)

plotDispEsts(dds, ymin = 0.01)





colSums(counts(dds))
plot(sizeFactors(dds), colSums(counts(dds)))
abline(lm(colSums(counts(dds)) ~ sizeFactors(dds) + 0))

dds <- DESeq(dds)
res <- results(dds)
resultsNames(dds)

#dds = nbinomWaldTest(dds)
#resDESeq2 = results(dds)
#resDESeq2
res
summary(res)
res1 = res[order(res$padj),]
summary(res1)
res1
mcols(res1, use.names=TRUE)
############################
#plotMA(res, main = "HSC_GRAN", ylim = c(-4, 4))



mcols(res)$description

result <- results(dds, contrast=c('condition','Control','Test'))
result <- result[complete.cases(result),]  #remove any rows with NA
head(result)
summary(result)
resultsNames(dds)

resOrdered <- res[order(res$padj),]
summary(resOrdered)
resdata <- merge(as.data.frame(resOrdered), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
names(resdata)[1] <- "gene"

head(resdata)
dim(resdata)

updated my question with one more observation

ADD REPLY • link 5.7 years ago by 1769mkc ★ 1.2k