Question

Estimating individual batch factors

5.3 years ago

vmax • 0

I’m planning to analyse RNA-seq data obtained from a few different batches. I’ll include batch as a covariate in my model, but I would still like to estimate the impact of known, individual batch factors (e.g. read length, read depth, number of mapped reads, prep technique). Initially I thought of running an ANOVA, but a colleague mentioned PVCA (principal variance component analysis). Is there a standard or preferred method for this?

rna-seq • 789 views

ADD COMMENT • link updated 5.3 years ago by Kevin Blighe 87k • written 5.3 years ago by vmax • 0

Why not just test the batch factors in your model? Then you can see how many genes are affected by each factor. You might also want to throw GC content into the mix; it is well known to vary between batches.
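As a rough illustration of the "how many genes are affected by each factor" idea, here is a minimal pure-Python sketch. It is not what DESeq2 or limma do internally: it computes the one-way ANOVA variance fraction (eta squared) per gene for each batch factor and counts genes above a cutoff. The gene names, factor levels, expression values, and the 0.5 cutoff are all made up for illustration; in a real analysis you would more likely use a proper per-gene test from your modelling package.

```python
from collections import defaultdict


def eta_squared(values, groups):
    """Fraction of variance explained by a grouping: SS_between / SS_total."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant gene: nothing to explain
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total


# Toy, invented data: log-scale expression for 3 genes across 6 samples.
expression = {
    "geneA": [5.0, 5.2, 5.1, 7.0, 7.1, 6.9],
    "geneB": [3.0, 4.0, 3.0, 4.0, 3.0, 4.0],
    "geneC": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
}

# Hypothetical known batch factors, one level per sample.
batch_factors = {
    "read_length": ["50bp", "50bp", "50bp", "100bp", "100bp", "100bp"],
    "prep_kit": ["kitX", "kitY", "kitX", "kitY", "kitX", "kitY"],
}

threshold = 0.5  # arbitrary cutoff, chosen only for this example

# For each factor, list the genes whose variance it mostly explains.
affected = {
    factor: [
        gene
        for gene, values in expression.items()
        if eta_squared(values, levels) > threshold
    ]
    for factor, levels in batch_factors.items()
}

for factor, genes in affected.items():
    print(factor, len(genes), genes)
```

Note that a factor which is constant within each batch cannot be separated from batch itself this way, so the numbers are only meaningful for factors that vary independently of batch.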

ADD REPLY • link 5.3 years ago by Kristoffer Vitting-Seerup ★ 4.0k

score 1 · Answer 1 · 2018-12-22

I would start by processing the data as normal, with batch as a covariate, as you mention, and then visually check for differences based on read length, depth, etc. via PCA bi-plots. DESeq2's plotPCA() function makes it easy to generate these plots and colour the samples by factors of interest (through its intgroup argument). If you see a difference in any case, you can then decide what to do about it. In that regard, you could look at sva (surrogate variable analysis).
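To make the "check the PCA against each factor" step a bit more concrete, here is a small pure-Python sketch that puts a number on what you would otherwise eyeball in the plot: it extracts PC1 sample scores by power iteration and correlates them with a continuous covariate such as read depth. This is a stand-in for illustration only, not DESeq2's plotPCA(); the function names, the toy matrix, and the read depths are all invented.

```python
import math


def center_columns(matrix):
    """Subtract each gene's (column's) mean across samples (rows)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def pc1_scores(matrix, iterations=200):
    """Unit-length PC1 sample scores via power iteration on the Gram matrix."""
    x = center_columns(matrix)
    n = len(x)
    # Sample-by-sample Gram matrix; its top eigenvector is proportional
    # to the PC1 scores of the samples.
    gram = [
        [sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        for i in range(n)
    ]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Toy, invented data: 6 samples (rows) x 4 genes (columns), log scale.
samples = [
    [1.0, 2.0, 0.5, 3.0],
    [1.2, 2.1, 0.4, 3.1],
    [1.1, 1.9, 0.6, 2.9],
    [3.0, 4.0, 0.5, 5.0],
    [3.1, 4.2, 0.6, 5.1],
    [2.9, 3.9, 0.4, 4.8],
]
read_depth_millions = [10, 11, 12, 30, 31, 29]  # hypothetical covariate

scores = pc1_scores(samples)
# The sign of a principal component is arbitrary, so use the absolute value.
r = abs(pearson(scores, read_depth_millions))
print("abs correlation of PC1 with read depth:", round(r, 3))
```

A strong correlation between an early PC and a technical covariate is a hint that the covariate is driving variation; it does not by itself tell you whether that covariate is separable from batch.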

One thing you would not want to do is include too many covariates in your model formula, as each one uses up degrees of freedom. It is generally your responsibility to ensure that sources of bias / variation are controlled via the experimental design.

Kevin