7.7 years ago
tolgaturant
I am going to analyze a clinical RNA-seq study with 51 samples for differentially expressed genes. As described in the limma-voom vignette, I have created a DGEList object:
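library(edgeR)                 # DGEList(), calcNormFactors(); also loads limma
library(SummarizedExperiment)  # assays()
y1 <- DGEList(counts=assays(summarizedExperiment1)$counts, genes=annotations1)
y2 <- calcNormFactors(y1)      # TMM normalization factors (the default method)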
Then, to explore how the samples cluster, I created an MDS plot (with gene.selection="common" this is essentially a PCA plot):
plotMDS(y2, labels=resp, top=50, col=ifelse(resp=="N", "red", "blue"), gene.selection="common", prior.count=5)
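To turn the split along dimension 1 into a candidate batch_1 covariate, one option is to reuse the coordinates that plotMDS() returns invisibly (its `x` component holds the dimension-1 positions). The sketch below is self-contained: it simulates a hypothetical 51-sample count matrix with a hidden two-group structure (the names `counts`, `hidden` and the 4x effect are made up for illustration). It also assumes that cutting dimension 1 at zero matches the gap seen in the plot; the sign and position of that gap depend on the data, so check the cut against the actual plot.

```r
library(edgeR)  # DGEList(), calcNormFactors(); also attaches limma for plotMDS()

set.seed(1)
# Hypothetical stand-in for the study: 2000 genes x 51 samples
n_genes <- 2000
n_samples <- 51
counts <- matrix(rnbinom(n_genes * n_samples, mu = 50, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("g", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
resp <- factor(sample(c("N", "R"), n_samples, replace = TRUE))

# Simulated hidden factor: 200 genes are 4x higher in every other sample
hidden <- rep(c(TRUE, FALSE), length.out = n_samples)
counts[1:200, hidden] <- counts[1:200, hidden] * 4

y2 <- calcNormFactors(DGEList(counts = counts))

# Same settings as the MDS plot, but only return the coordinates
mds <- plotMDS(y2, top = 50, gene.selection = "common",
               prior.count = 5, plot = FALSE)

# Hypothetical covariate: which side of zero each sample falls on dimension 1
batch_1 <- factor(ifelse(mds$x < 0, "left", "right"))
table(batch_1, resp)
```

Cross-tabulating batch_1 against resp is a quick check that the two are not confounded before either one goes into a design matrix.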
There is a clear separation of the samples along PC1, but I don't know which sample attribute correlates with it. Should I create a covariate, batch_1, that labels the two groups on either side of PC1, and build the design matrix as:
mod1<-model.matrix(~batch_1+resp)
or should I just model the comparison I am interested in:
mod2<-model.matrix(~resp)
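To see what the two designs do in practice, both can be run through the usual voom, lmFit, eBayes, topTable pipeline and the results for the resp coefficient compared. This is a minimal sketch on simulated, hypothetical data (the helper `fit_design()` is made up here); resp and batch_1 are simulated so that they are not confounded, which mod1 needs in order to be estimable.

```r
library(edgeR)  # attaches limma: voom(), lmFit(), eBayes(), topTable()

set.seed(2)
n_genes <- 1000
n_samples <- 51
counts <- matrix(rnbinom(n_genes * n_samples, mu = 50, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("g", seq_len(n_genes)), NULL))
# Hypothetical sample labels; batch_1 is deliberately not confounded with resp
resp    <- factor(rep(c("N", "R"), length.out = n_samples))
batch_1 <- factor(rep(c("A", "A", "B", "B"), length.out = n_samples))
y2 <- calcNormFactors(DGEList(counts = counts))

# Hypothetical helper: voom weights, gene-wise linear models, empirical Bayes
fit_design <- function(y, design) {
  v <- voom(y, design)
  eBayes(lmFit(v, design))
}

mod1 <- model.matrix(~ batch_1 + resp)  # columns: (Intercept), batch_1B, respR
mod2 <- model.matrix(~ resp)            # columns: (Intercept), respR
fit1 <- fit_design(y2, mod1)
fit2 <- fit_design(y2, mod2)

# Same coefficient of interest in both designs, genes kept in input order
tt1 <- topTable(fit1, coef = "respR", number = Inf, sort.by = "none")
tt2 <- topTable(fit2, coef = "respR", number = Inf, sort.by = "none")
c(with_batch = sum(tt1$adj.P.Val < 0.05),
  without_batch = sum(tt2$adj.P.Val < 0.05))

# Residual degrees of freedom: the batch term costs one (48 vs 49)
c(fit1$df.residual[1], fit2$df.residual[1])
```

The trade-off: mod1 spends one residual degree of freedom and removes the batch_1 signal from the test for resp. If batch_1 is defined from the data itself and that signal is actually biological, or if it lines up with resp, the term can absorb the very effect of interest; with perfect confounding mod1 is not of full rank and the resp effect cannot be estimated at all.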
Any suggestion would be appreciated.
Tolga
Mmmh, in principle adding a batch term would be the way to go.
But are you sure it's a batch effect (i.e. something technical) and not something biological that you would want to look at and understand rather than discard? I'm just asking because you say you don't actually know where the separation comes from, and I would want to understand what I am about to throw out.
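Before discarding the PC1 signal, one way to follow this advice is to check how strongly it tracks each known sample attribute. The sketch below uses base R only, with a hypothetical sample sheet (`pheno`, with made-up columns sex, lane and rin) and a simulated stand-in for the dimension-1 coordinates; in a real analysis `dim1` would be the `x` component returned by plotMDS() and `pheno` the study's actual clinical and technical annotations.

```r
set.seed(3)
n <- 51
# Hypothetical sample sheet; replace with the study's real annotations
pheno <- data.frame(
  resp = factor(rep(c("N", "R"), length.out = n)),
  sex  = factor(sample(c("F", "M"), n, replace = TRUE)),
  lane = factor(rep(c("L1", "L2"), each = 26)[1:n]),
  rin  = round(runif(n, 6, 9), 1)
)
# Simulated stand-in for the dimension-1 coordinates; tracks lane by construction
dim1 <- ifelse(pheno$lane == "L1", -2, 2) + rnorm(n, sd = 0.3)

# One single-covariate linear model per attribute: variance explained and p-value
assoc <- do.call(rbind, lapply(names(pheno), function(v) {
  fit <- lm(dim1 ~ pheno[[v]])
  data.frame(covariate = v,
             r.squared = summary(fit)$r.squared,
             p.value   = anova(fit)[1, "Pr(>F)"])
}))
assoc[order(assoc$p.value), ]
```

An attribute that explains most of the variance along dimension 1 is a candidate explanation; whether it is technical (such as a sequencing lane) or biological decides whether it belongs in the design as a nuisance term or deserves a closer look.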
Thank you for your answer. I agree that the separation along the PCs may well be biological, but it could also be a technical effect that we don't know about. I guess one cannot tell without additional information, so I ended up analyzing the study as is.