Question

Surrogate variable analysis: ask for a practical example

6.0 years ago

Mozart ▴ 330

Hello there, first of all I want to thank the whole community for putting up with my silly questions for months, and in particular one of the "big guys", i.e. Kevin, who has helped me a lot with many, many issues (I think he deserves a statue!)...

I am back here after months because I have been trying to read as much as possible to avoid duplicating posts on the same topic. There are many (probably too many) posts asking for clarification about surrogate variable analysis. Honestly, I am not a bioinformatician, and I didn't fully understand the original article (especially the formulas) that defines the theory behind SVA. With Kevin's help I basically understood how to work out the variables, but it is not fully clear to me "what", "how much" and "why" the software corrects this noise. How can it guess the right gene to correct?

I will 'try' to give an example (though I am sure the experts here won't like it!):

I have the same gene (a) measured in 3 (wt) + 3 (ko) different samples, so I have:

gene a1, gene a2, gene a3 (wt);
gene a4, gene a5, gene a6 (ko);

Now, let's say that we have the following TPM values:

gene a1=0.23
gene a2=0.29
gene a3=0.98
gene a4=0.95
gene a5=1.2
gene a6=0.99

Clearly there's something unexpected (?) that SVA can try to correct (?)

Thanks for your help.

Mozart

RNA-Seq sva • 3.2k views

ADD COMMENT • link updated 6.0 years ago by Ram 43k • written 6.0 years ago by Mozart ▴ 330

Hi Mozart, thank you for your comments

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k

score 4 · Accepted Answer · 2018-04-26

6.0 years ago

andrew.j.skelton73 6.5k

Surrogate variables, batch effects, and technical effects in general are better seen in an experiment-wide visualisation, such as a PCA; one gene is not indicative of unexpected variation. The example you outline could be anything: sample 3 may be an outlier, or the pattern could be completely expected if gene a's wt expression naturally has a high spread. I'd also suggest reading my answer here to get a broader overview of how to correct technical effects with design matrices, and of what you can and can't do.
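To put numbers on the single-gene point, here is a minimal sketch that only summarises the six TPM values from the question. Nothing here is SVA; it just shows how much the wt group summary depends on one sample.

```python
from statistics import mean, stdev

# TPM values for gene a, copied from the question.
wt = [0.23, 0.29, 0.98]   # a1, a2, a3
ko = [0.95, 1.20, 0.99]   # a4, a5, a6

wt_mean, ko_mean = mean(wt), mean(ko)
wt_sd, ko_sd = stdev(wt), stdev(ko)   # sample standard deviation (n - 1)

print(f"wt: mean={wt_mean:.2f} sd={wt_sd:.2f}")
print(f"ko: mean={ko_mean:.2f} sd={ko_sd:.2f}")

# How much does the wt mean depend on sample a3 alone?
wt_mean_without_a3 = mean(wt[:2])
print(f"wt mean without a3: {wt_mean_without_a3:.2f}")
```

With three values per group, these numbers alone cannot tell you whether a3 is an outlier, a technical effect, or ordinary biological spread. That is why the pattern has to be looked for across many genes at once.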

ADD COMMENT • link 6.0 years ago by andrew.j.skelton73 6.5k

Hi Andrew, thank you for your comments and explanations. I have limited experience using the SVA package and am not confident in my results. I have also been looking for comments about the use of the BEclear package. Have you used it? The Akulenko et al., 2016 paper "BEclear: Batch Effect Detection and Adjustment in DNA Methylation Data" was very interesting, yet I haven't found any comments on the forum. Any input would be appreciated.

ADD REPLY • link 6.0 years ago by jonellevillar • 0

Thanks for your reply. After having read your post, I am still not sure what I am doing, sorry. I think a more practical example, with numbers, would help me understand a bit better. I have no idea how the 'numbers are corrected'.

ADD REPLY • link 6.0 years ago by Mozart ▴ 330

1

While not applicable to SVA specifically, additive models for the correction of nuisance variables were something that piqued my interest a while ago. I got a fantastic example from Aaron Lun on Bioconductor support that might help you understand here.
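As a rough illustration of what "additive" means here (this is not the Bioconductor example referred to above, and the numbers are made up), the sketch below models one gene as baseline + ko effect + batch effect, fits the three coefficients by ordinary least squares, and subtracts the fitted batch term. The `batch` labels, the values in `y` and the helper `solve` are all hypothetical.

```python
# One gene, six samples. Values are invented so that
# y = 1.0 + 0.5 * ko + 0.7 * batch holds exactly (no noise).
ko    = [0, 0, 0, 1, 1, 1]   # 0 = wt, 1 = ko
batch = [0, 0, 1, 0, 1, 1]   # hypothetical known nuisance variable
y     = [1.0, 1.0, 1.7, 1.5, 2.2, 2.2]

def solve(a, b):
    """Solve a small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Design matrix with columns: intercept, ko, batch.
X = [[1.0, k, b] for k, b in zip(ko, batch)]

# Normal equations: (X^T X) beta = X^T y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(3)]
intercept, ko_effect, batch_effect = solve(xtx, xty)

# Naive comparison that ignores batch: difference of group means.
naive_diff = sum(y[3:]) / 3 - sum(y[:3]) / 3

# "Corrected" values: subtract only the fitted batch contribution.
corrected = [v - batch_effect * b for v, b in zip(y, batch)]

print("naive ko - wt difference:", round(naive_diff, 3))
print("fitted ko effect:", round(ko_effect, 3))
print("corrected values:", [round(v, 3) for v in corrected])
```

The naive difference is inflated because two of the three ko samples sit in the second batch. For differential expression it is generally preferable to keep the nuisance variable in the design matrix rather than to test on subtracted values; the subtraction is shown only to make the arithmetic visible.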

ADD REPLY • link 6.0 years ago by andrew.j.skelton73 6.5k

1

I guess you could say that programs like SVA look for 'patterns' or 'signatures' of differences across your samples, patterns / signatures that could be reflective of bias for which adjustments should be made. Things like sequencing depth, for example, affect samples in a consistent manner by increasing/decreasing the number of raw counts. Other things that would leave consistent 'footprints' in your data could include:

a few RNA samples were left too long in a truck on a hot day (degrading these samples slightly)
a few RNA sample dilutions were miscalculated, as one technician was off sick one day and his/her replacement had less experience
the RNA -> cDNA step was left too short or too long for only a few samples
et cetera
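
The 'footprint' idea can be sketched with simulated numbers. This is a deliberately simplified illustration and not the algorithm implemented in the sva package, which does considerably more. It assumes a hypothetical unrecorded factor (`hidden`) that shifts many genes in the same samples, removes the known wt/ko effect gene by gene, and then takes the dominant pattern shared across the residuals as a single surrogate variable. All names and values are invented.

```python
import math
import random

random.seed(1)

group  = [0, 0, 0, 0, 1, 1, 1, 1]   # known: 0 = wt, 1 = ko
hidden = [0, 1, 0, 1, 0, 1, 0, 1]   # hypothetical unrecorded factor
n_genes = 200

# Simulate expression: baseline + ko effect + hidden effect + noise.
expr = []
for _ in range(n_genes):
    baseline = random.uniform(2.0, 8.0)
    ko_effect = random.choice([0.0, 0.0, 0.0, 1.0])   # most genes unchanged
    hidden_effect = random.uniform(0.5, 1.5)
    expr.append([baseline + ko_effect * g + hidden_effect * h
                 + random.gauss(0.0, 0.1) for g, h in zip(group, hidden)])

def remove_group_means(row):
    """Subtract each sample's group mean, i.e. remove the known wt/ko signal."""
    means = {}
    for level in set(group):
        vals = [x for x, g in zip(row, group) if g == level]
        means[level] = sum(vals) / len(vals)
    return [x - means[g] for x, g in zip(row, group)]

def leading_pattern(rows, n_iter=100):
    """Dominant across-sample pattern of the rows, found by power iteration."""
    n = len(rows[0])
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(n_iter):
        scores = [sum(r[j] * v[j] for j in range(n)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

residuals = [remove_group_means(row) for row in expr]
surrogate = leading_pattern(residuals)

print("surrogate variable:", [round(x, 2) for x in surrogate])
print("|correlation| with hidden factor:", round(abs(pearson(surrogate, hidden)), 3))
```

No single gene is 'guessed' here. The pattern is recovered because the same samples are shifted in many genes at once, and the resulting per-sample values can then be added as a covariate to the model.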

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k

Thanks so much for the incredible support, guys! Have a great weekend...

Mozart

ADD REPLY • link 6.0 years ago by Mozart ▴ 330