Biostar Beta
Question: Surrogate variable analysis: asking for a practical example

Hello there, first of all I want to thank the whole community for supporting my silly questions for months, and in particular one of the "big guys", i.e. Kevin, who helped me a lot with many, many issues (I think he deserves a statue!)...

I am back after months because I have been trying to read as much as possible to avoid duplicating posts on the same topic. There are many (probably too many) posts asking for clarification about surrogate variable analysis (SVA). Honestly, I am not a bioinformatician, and I didn't fully understand the original article (especially the formulas) that defines the theory behind SVA. With Kevin's help I basically understood how to work out the variables, but it is still not clear to me "what", "how much" and "why" the software corrects this noise. How can it guess the right gene to correct?

I'll 'try' to give an example (but I am sure the experts here won't like it!):

I have the same gene (a) in 3 (wt) + 3 (ko) different samples, so I have:

```
gene a1, gene a2, gene a3 (wt);
gene a4, gene a5, gene a6 (ko);
```

Now, let's say that we have the following TPM values:

```
gene a1=0.23
gene a2=0.29
gene a3=0.98
gene a4=0.95
gene a5=1.2
gene a6=0.99
```

Clearly there's something unexpected (?) here that SVA could try to correct (?)

Mozart


Kevin Blighe

Surrogate variables, batch effects, and technical effects in general are better seen through an experiment-wide visualisation, such as a PCA; one gene on its own is not indicative of unexpected variation. The example you outline could be anything: `sample 3` may be an outlier, or it could be completely expected, in that the `wt` expression of `gene a` naturally has a high spread. I'd also suggest reading my answer here to get a broader overview of how to correct technical effects with design matrices, i.e. what you can and can't do.
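
To illustrate why one gene says little while an experiment-wide view says a lot, here is a small simulated sketch in Python (NumPy only). Everything in it is made up for illustration: 500 genes, the wt/ko layout from the question, and a hypothetical `batch` assignment that shifts 300 genes. The PCA is done with a plain SVD as a stand-in for whatever PCA function you normally use.

```python
import numpy as np

rng = np.random.default_rng(0)

n_genes = 500
condition = np.array([0, 0, 0, 1, 1, 1])  # wt = 0, ko = 1 (layout from the question)
batch = np.array([0, 0, 1, 0, 1, 1])      # HYPOTHETICAL processing batch, not in the question

# Simulated baseline log-expression per gene plus small per-sample noise
log_expr = rng.normal(5.0, 1.0, size=(n_genes, 1)) + rng.normal(0.0, 0.3, size=(n_genes, 6))

# A real biological effect: genes 0-49 respond to the knockout
log_expr[:50] += 1.0 * condition

# A technical effect: genes 100-399 are shifted in the hypothetical batch-1 samples
log_expr[100:400] += 2.0 * batch


def pca_scores(x, n_components=2):
    """PCA of samples via SVD; rows of x are genes, columns are samples."""
    centred = x.T - x.T.mean(axis=0)  # centre each gene across samples
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # sample scores on the top PCs


scores = pca_scores(log_expr)
pc1 = scores[:, 0]
print("PC1 scores:", np.round(pc1, 2))
print("|corr(PC1, batch)|:    ", round(abs(np.corrcoef(pc1, batch)[0, 1]), 2))
print("|corr(PC1, condition)|:", round(abs(np.corrcoef(pc1, condition)[0, 1]), 2))
```

In this simulated setting PC1 follows the hypothetical batch rather than wt/ko, which is the kind of experiment-wide pattern that no single gene, such as `gene a` in the question, can reveal.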


Hi Andrew, thank you for your comments and explanations. I have limited experience with the SVA package and am not confident in my results. Separately, I have been looking for comments about the use of the BEclear package. Have you used it? The Akulenko et al., 2016 paper "BEclear: Batch Effect Detection and Adjustment in DNA Methylation Data" was very interesting, yet I haven't found any comments about it on the forum. Any input would be appreciated.

jonellevillar

Thanks for your reply. Even after reading your post, I am still not sure what I am doing, sorry. A more practical example, with numbers, would help me understand a bit better, I think. I have no idea how the 'numbers are corrected'.

Mozart

While not specific to SVA, additive models for the correction of nuisance variables were something that piqued my interest a while ago. I got a fantastic example from Aaron Lun on Bioconductor support that might help you understand here.
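
As a numeric illustration of the additive-model idea (not SVA itself, and not necessarily what the linked Bioconductor example does), here is a minimal NumPy sketch using the TPM values of `gene a` from the question. The `batch` labels are entirely hypothetical: we pretend a3 and a6 were processed on a different day. A linear model with an intercept, the wt/ko condition and the batch is fitted, and then only the fitted batch term is subtracted, roughly the idea behind limma's `removeBatchEffect`. For simplicity the values are used as-is; a log scale would be more usual for expression data, and six samples with three coefficients is only a toy.

```python
import numpy as np

# TPM values for gene a from the question (samples a1..a6)
y = np.array([0.23, 0.29, 0.98, 0.95, 1.20, 0.99])
condition = np.array([0, 0, 0, 1, 1, 1])  # wt = 0, ko = 1
batch = np.array([0, 0, 1, 0, 0, 1])      # HYPOTHETICAL: a3 and a6 processed separately

# Additive design matrix: intercept + condition + batch
design = np.column_stack([np.ones(6), condition, batch])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, cond_effect, batch_effect = coef

# Subtract only the fitted batch term; the wt/ko effect stays in the data
corrected = y - batch_effect * batch

print("estimated batch effect:", round(batch_effect, 4))
print("estimated ko effect:   ", round(cond_effect, 4))
print("corrected values:      ", np.round(corrected, 3))
```

Only the two hypothetical batch samples change, each by the same fitted amount (about 0.32 here), while the wt/ko difference is left in place.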

andrew.j.skelton73

I guess you could say that programs like SVA look for 'patterns' or 'signatures' of differences across your samples, patterns/signatures that could reflect bias for which adjustments should be made. Things like sequencing depth, for example, affect samples in a consistent manner by increasing/decreasing the number of raw counts. Other things that would leave consistent 'footprints' in your data could include:

• a few RNA samples were left too long in a truck on a hot day (degrading these samples slightly)
• a few RNA sample dilutions were miscalculated, because one technician was off sick one day and his/her replacement had less experience
• the RNA -> cDNA step was left too short or too long for only a few samples
• et cetera
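
To put rough numbers on the 'patterns/signatures' idea, here is a heavily simplified NumPy sketch of the core intuition behind surrogate variable analysis. It is NOT the algorithm implemented in the `sva` Bioconductor package, which does considerably more (e.g. choosing how many surrogate variables to estimate and deciding which genes to use when building them). Assumptions, all made up for illustration: 1,000 genes, 6 wt + 6 ko samples (more than in the question, for a clearer result), and a hypothetical unmeasured sample-level factor (think of the 'hot truck' degradation) that shifts 200 genes by gene-specific amounts.

```python
import numpy as np

rng = np.random.default_rng(1)

n_genes, n_samples = 1000, 12
condition = np.repeat([0, 1], 6)  # 6 wt, 6 ko

# HYPOTHETICAL hidden factor per sample (e.g. degradation); the analysis never sees it
hidden = np.array([0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.9, 0.1, 0.6, 0.4, 0.8, 0.2])

expr = rng.normal(5.0, 1.0, size=(n_genes, 1)) + rng.normal(0.0, 0.3, size=(n_genes, n_samples))
expr[:50] += 1.0 * condition                 # genes 0-49: real ko effect
loadings = rng.normal(0.0, 3.0, size=200)
expr[500:700] += np.outer(loadings, hidden)  # genes 500-699: hit by the hidden factor

# Step 1: fit the model of interest (intercept + condition) to every gene at once
design = np.column_stack([np.ones(n_samples), condition])
coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
residuals = expr - (design @ coef).T         # genes x samples

# Step 2: the dominant pattern left in the residuals, shared by many genes,
# is taken as a candidate surrogate variable (one value per sample)
_, _, vt = np.linalg.svd(residuals, full_matrices=False)
sv = vt[0]

# Step 3: refit every gene with the surrogate variable as an extra covariate;
# each gene gets its OWN coefficient for it
design_sv = np.column_stack([design, sv])
coef_sv, *_ = np.linalg.lstsq(design_sv, expr.T, rcond=None)
sv_coef = coef_sv[2]

print("|corr(surrogate, hidden)|:", round(abs(np.corrcoef(sv, hidden)[0, 1]), 3))
print("mean |sv coef|, affected genes:  ", round(np.abs(sv_coef[500:700]).mean(), 2))
print("mean |sv coef|, unaffected genes:", round(np.abs(sv_coef[700:]).mean(), 2))
```

In this toy version nothing 'guesses' individual genes: a per-sample variable is estimated from the pattern shared by many genes, and the regression then gives each gene its own coefficient for it, which comes out much smaller for genes the hidden factor did not touch.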
Kevin Blighe

Thanks so much for the incredible support, guys! Have a great weekend...

Mozart