Question

edgeR RNA-seq Results Across Conditions

0

Entering edit mode

6.7 years ago

asyndeton17 ▴ 40

Hi,

I recently performed RNA-seq differential expression analysis. I first did a comparison between Condition 2 vs. Control. Then I did Condition 3 vs. Control. However, the raw counts do not seem to be substantiating the output data from edgeR. For instance, the Krt17 gene supposedly has a log fold change of 7 in the comparison between Condition 2 vs. Control, and the Krt17 gene does not show up on the list of differentially expressed genes (at an FDR of 0.05 or lower) for Condition 3 vs. Control. Yet, when I go back to the raw counts for the Krt17 gene, this is what I have: Control replicates(0, 0, 0), Condition 2 (25, 27, 25), and Condition 3 (16, 14, 11). Clearly, it seems to be "overexpressed" in both, but it doesn't show up in the other list. How do I fix this or account for such an observation?

For those wondering, I ran my analysis following most of what the Griffith Lab tutorial suggests.

Thanks

RNA-Seq edgeR • 1.7k views

ADD COMMENT • link updated 6.7 years ago by shoujun.gu ▴ 380 • written 6.7 years ago by asyndeton17 ▴ 40

0

Entering edit mode

Try normalizing the data to a linear scale first then fit a model with limma.

#filter lowly expressed
y <- DGEList(counts = raw_counts)
y <- y[!rowSums(y$counts == 0) == ncol(raw_counts),] 
A <- aveLogCPM(y)
y2 <- y[A>1,]

#library size factor, quantile, and batch normalizations
design <- model.matrix(~0+group)
cont.matrix <- makeContrasts(group1 - group2, 
    group2 - group3, group1 - group3, levels = design)
y3 <- calcNormFactors(y2, method = "TMM")
dge <- voomWithQualityWeights(y3, design=design, 
    normalization="quantile", plot=FALSE)
rbe <- removeBatchEffect(dge, batch)

#fit to linear model
fit <- lmFit(dge, design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit3 <- eBayes(fit2, trend = TRUE, robust = TRUE)

#examine results
topTable(fit3, coef=1, num=Inf, sort.by = "P", adjust = "BH")
topTable(fit3, coef=2, num=Inf, sort.by = "P", adjust = "BH")
topTable(fit3, coef=3, num=Inf, sort.by = "P", adjust = "BH")

ADD REPLY • link 6.7 years ago by mforde84 ★ 1.4k

0

Entering edit mode

Would it work if I filtered lowly expressed counts before creating the edgeR object? If so, what sort of cutoff should I make?

ADD REPLY • link 6.7 years ago by asyndeton17 ▴ 40

0

Entering edit mode

It's already taken care of in the code. Specifically, the number of instances a gene / transcript feature has a sample count greater than 0 and the average CPM across all samples is atleast 1. Usually, if you used the same seq machine to run all of your samples, you will use the sequencing lane as factor for batch, and this can scale out to include multiple sequencing machines from different institutions, etc.

ADD REPLY • link 6.7 years ago by mforde84 ★ 1.4k

0

Entering edit mode

Can you post your code here?

ADD REPLY • link 6.7 years ago by moxu ▴ 510

0

Entering edit mode

For the most part, I am following the Griffith Lab Tutorial: https://github.com/griffithlab/rnaseq_tutorial/blob/master/scripts/Tutorial_Module4_Part4_edgeR.R

ADD REPLY • link 6.7 years ago by asyndeton17 ▴ 40

score 0 · Answer 1 · 2017-08-14

0

Entering edit mode

6.7 years ago

shoujun.gu ▴ 380

is its FDR too big? or you may check your filter criteria to see whether this gene is filtered by not enough reads ?

ADD COMMENT • link 6.7 years ago by shoujun.gu ▴ 380