Question

Building a biologically meaningful design matrix for limma/voom in a large RNASeq experiment

0

Entering edit mode

5.8 years ago

stefanos.bamopoulos ▴ 40

Hello guys,

I have a question regarding differential expression (DE) with limma/voom in an RNA-Seq experiment of ~250 samples. I have 4 different point mutations that are mutually exclusive and I wish to identify DE isoforms specific for each mutation. Additionally, I have reason to believe that the mutations cause similar differential expression for some isoforms and I also wish to identify those.

My questions are:

What is the best approach to identify the differences and the similarities of those mutations?
How do I adress the issue of confounding?

Specifically, I have 3, 4, 6 and 17 samples for each mutation respectively, as well as ~200 samples as a control group.

The approach I have come up with so far is to:

1) perform a DE analysis for each mutation comparing it to the control group (which does not include the other mutations), like this:

design.matrix <- model.matrix(~ factor(mut1), data)

design.matrix <- model.matrix(~ factor(mut2), data)... and so on

2) after doing this for each of the 4 mutations, combine them into a single binary variable and perform a DE analysis comparing all of them against the control group:

design.matrix <- model.matrix(~ factor(all.mutations), data

My way of thinking is that since the mutations are mutually exclusive, I can compare each mutation against the control group (which does not include the other mutations) in order to identify DE isoforms specific to each mutation. Afterwards by combing them, I hope to highlight isoforms that are similarly DE for all mutations. If I check the overlap of the last analysis with the preceding 4 I should be able to identify at least some isoforms that are affected in a common way. Is it maybe sufficient to just compare the overlap of the first 4 analyses without the 2nd step?

As to the 2nd question. Is there a rule of thumb as to how many confounders I can add in a limma analysis while avoiding overfitting? Since in a differential expression analysis we don't really have "events" I am unsure how to determine the number of confounders I can adjust for. Especially for the mutation with the smallest subset (only 3 mutations) I am unsure if the relatively large control group of ~200 samples permitts me to adjust for multiple confounders.

Any input is welcome, thanks in advance!

Stefan

limma voom RNA-Seq confounding • 3.3k views

ADD COMMENT • link updated 5.0 years ago by Gordon Smyth ★ 7.0k • written 5.8 years ago by stefanos.bamopoulos ▴ 40

1

Entering edit mode

Since the mutations are mutually-exclusive I'd create a vector Muts, in which the mutation status is specified for each subject, i.e.:

Muts <- c("Mut1", "Mut1", "Mut2", .... , "Control", ...)

I would create a design matrix as below:

Mutations <- factor(Muts, levels = c("Mut1", "Mut2", ... , "Control"))
design <- model.matrix(~0+Mutations)
colnames(design) <- levels(Mutations)

Then fit the linear model:

fit <- lmFit(eset, design)

Then extract the contrasts:

cont_mat <- makeContrasts(
"WT-mut1",
"WT-mut2", ..., 
levels=design)
fit2 <- contrasts.fit(fit, cont_mat)
fit2 <- eBayes(fit2)
topTableF(fit2)

ADD REPLY • link 5.8 years ago by egeulgen ★ 1.3k

1

Entering edit mode

You might also ask this question directly at the BioC. Gordon Smyth (as well as other authors and maintainers of popular packages) are outstandingly responsive, especially to well-written questions like yours.

ADD REPLY • link 5.8 years ago by ATpoint 81k

0

Entering edit mode

Thank you! I will try this as well

ADD REPLY • link 5.8 years ago by stefanos.bamopoulos ▴ 40

ATpoint · Answer 1 · 2019-04-08

0

Entering edit mode

5.0 years ago

Gordon Smyth ★ 7.0k

Cross-posted https://support.bioconductor.org/p/110309/

ADD COMMENT • link updated 5.0 years ago by ATpoint 81k • written 5.0 years ago by Gordon Smyth ★ 7.0k