Question

Quick way to Identify batch effect from covariates?


7.9 years ago

James Ashmore ★ 3.4k

Given a SummarizedExperiment container, what is the quickest way to identify a batch effect from one of the covariates in the DataFrame in the colData slot? Right now I plot the principal components, check the first few for any separation, and colour the samples by each covariate to see which one is responsible. I have a large number of libraries to check and was wondering whether there is a Bioconductor package that performs this step. I've looked at svaseq and RUVSeq, but I can't see that they produce any QC plots that tell me whether an effect is present and which covariate is responsible.
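A minimal sketch of automating that check, assuming the assay has already been normalised and log-transformed and exported as a samples × genes NumPy array, with the colData covariates passed as a plain dict of categorical labels. It is written in Python/NumPy rather than R, and the function name `pc_covariate_r2` is hypothetical. For each of the first few PCs it reports the fraction of that PC's variance explained by each covariate's group means (a one-way R²), so one table stands in for the colour-by-each-covariate plots. The data in the usage lines are simulated.

```python
import numpy as np

def pc_covariate_r2(expr, covariates, n_pcs=5):
    """One-way R^2 of each covariate on each of the first n_pcs PCs.

    expr: samples x genes array of normalised, log-scale values
          (hypothetical export of the SummarizedExperiment assay, transposed).
    covariates: dict of covariate name -> one categorical label per sample
          (a stand-in for the colData DataFrame).
    Returns (names, r2) where r2 has shape (n_pcs, n_covariates).
    """
    # Centre each gene, then take PC scores from the SVD
    centred = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_pcs] * s[:n_pcs]

    names = list(covariates)
    r2 = np.zeros((scores.shape[1], len(names)))
    for j, name in enumerate(names):
        labels = np.asarray(covariates[name])
        for i, pc in enumerate(scores.T):
            total = np.sum((pc - pc.mean()) ** 2)
            # Between-group sum of squares: variance of this PC explained by the labels
            between = sum(np.sum(labels == g) * (pc[labels == g].mean() - pc.mean()) ** 2
                          for g in np.unique(labels))
            r2[i, j] = between / total if total > 0 else 0.0
    return names, r2

# Usage on simulated data: a batch shift on half the genes, no treatment effect
rng = np.random.default_rng(0)
batch = np.array(["b1"] * 4 + ["b2"] * 4)
treatment = np.array(["ctrl", "drug"] * 4)
expr = rng.normal(0.0, 0.5, size=(8, 200))
expr[batch == "b2", :100] += 3.0
names, r2 = pc_covariate_r2(expr, {"batch": batch, "treatment": treatment}, n_pcs=3)
for name, col in zip(names, r2.T):
    print(name, np.round(col, 2))
```

A covariate that explains most of PC1 or PC2 is the one to look at more closely. Note that this only ranks covariates within a single data set; it does not measure the absolute size of a batch effect.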

bioconductor batch-effect

Written 7.9 years ago by James Ashmore ★ 3.4k; last updated 6 days ago by Ram 43k


I'm sure it can be done, but it's tricky with PCA because PCA doesn't tell you the size of an effect in absolute terms. For example, if you give it 4 samples (2x control, 2x drug) and they cluster not by control/drug but by batch1/batch2, it might simply be that the drug has no effect and the batch effect is very small.

So my point is that if you have a large number of libraries and you automate looking at a large number of PCAs, you can't say that experiment A had more or less batch effect than experiment B, so you can't quantify the batch effect of A in a meaningful way. All you can say is that it has a larger or smaller effect than the treatment did. Conversely, if your treatment has a very strong effect, you can have a lot more batch effect before it becomes apparent on the PCA.

The problem basically boils down to the fact that we can see batch effects but don't understand the dynamics causing them, so we can't quantify them or normalise them away in a meaningful way. tl;dr: you're probably better off looking at the PCAs by eye and judging for yourself whether there's a meaningful batch effect, given what you know about the treatment, control and batches.
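To illustrate the point about relative effect sizes, here is a small simulation sketch in Python/NumPy (the function `pc1_driver` and all effect sizes are hypothetical). It builds the 4-sample design described above, adds additive shifts to disjoint gene sets for batch and drug plus small Gaussian noise, and reports which factor PC1 lines up with.

```python
import numpy as np

def pc1_driver(batch_shift, drug_shift, seed=1):
    """Simulate 4 samples and report whether PC1 tracks batch or drug."""
    rng = np.random.default_rng(seed)
    batch = np.array([0, 0, 1, 1])  # batch1, batch1, batch2, batch2
    drug = np.array([0, 1, 0, 1])   # control, drug, control, drug
    expr = rng.normal(0.0, 0.1, size=(4, 100))  # small gene-level noise
    expr[batch == 1, :50] += batch_shift   # batch shifts genes 0-49
    expr[drug == 1, 50:] += drug_shift     # drug shifts genes 50-99
    centred = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    # Absolute correlation of PC1 scores with each 0/1 design indicator
    r_batch = abs(np.corrcoef(pc1, batch)[0, 1])
    r_drug = abs(np.corrcoef(pc1, drug)[0, 1])
    return "batch" if r_batch > r_drug else "drug"

print(pc1_driver(0.5, 0.0))  # no drug effect, small batch effect -> batch
print(pc1_driver(0.5, 5.0))  # same batch effect, strong drug effect -> drug
print(pc1_driver(3.0, 5.0))  # 6x larger batch effect, still not on PC1 -> drug
```

The same batch shift that dominates PC1 when the drug does nothing disappears from PC1 once the drug effect is large, even after being made six times bigger.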

7.9 years ago by John 13k

score 2 · Accepted Answer · 2016-06-09


7.9 years ago

chris86 ▴ 400

Two methods work best for analysing batch effects:

PCA with annotation: what you are already doing, but it relies on manual visual inspection.
PVCA (principal variance component analysis), https://www.bioconductor.org/packages/release/bioc/html/pvca.html: this fits a mixed-effects model to the principal components and then quantifies the contribution of each covariate.
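The pvca package does this with a mixed model in R. As a rough illustration of the idea only, here is a simplified fixed-effects stand-in in Python/NumPy; it is not the PVCA algorithm and will not reproduce pvca's numbers. The function name `weighted_covariate_share`, the 0.6 cumulative-variance cut-off and the simulated data are all assumptions. It keeps the leading PCs up to the cut-off, computes each covariate's one-way R² on each kept PC, and averages those values weighted by each PC's share of the kept variance.

```python
import numpy as np

def weighted_covariate_share(expr, covariates, threshold=0.6):
    """Simplified fixed-effects stand-in for PVCA (NOT the pvca algorithm).

    Keeps the leading PCs whose cumulative variance first reaches `threshold`,
    computes each covariate's one-way R^2 on each kept PC, and averages those
    R^2 values weighted by each PC's share of the kept variance.
    """
    centred = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    var_frac = s ** 2 / np.sum(s ** 2)
    # Number of PCs needed for the cumulative variance to reach the threshold
    k = min(int(np.searchsorted(np.cumsum(var_frac), threshold)) + 1, len(s))
    scores = u[:, :k] * s[:k]
    weights = var_frac[:k] / var_frac[:k].sum()

    shares = {}
    for name, labels in covariates.items():
        labels = np.asarray(labels)
        r2 = []
        for pc in scores.T:
            total = np.sum((pc - pc.mean()) ** 2)
            between = sum(np.sum(labels == g) * (pc[labels == g].mean() - pc.mean()) ** 2
                          for g in np.unique(labels))
            r2.append(between / total if total > 0 else 0.0)
        shares[name] = float(np.dot(weights, r2))
    return shares

# Usage on simulated data: a moderate batch shift plus a stronger treatment shift
rng = np.random.default_rng(0)
batch = np.array(["b1"] * 4 + ["b2"] * 4)
treatment = np.array(["ctrl", "drug"] * 4)
expr = rng.normal(0.0, 0.3, size=(8, 200))
expr[batch == "b2", :100] += 1.0             # batch shifts genes 0-99
expr[treatment == "drug", 100:150] += 2.0    # treatment shifts genes 100-149
shares = weighted_covariate_share(expr, {"batch": batch, "treatment": treatment})
print(shares)
```

Unlike the variance components PVCA estimates, these shares are computed one covariate at a time, so with confounded or unbalanced covariates they can overlap and need not sum to one, and there is no residual term.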

Update: I highly recommend the plot_drivers function at https://github.com/dswatson/bioplotr/blob/master/R/plot_drivers.R, which makes a great plot for examining batch effects and more.