First of all, I want to thank the whole community for supporting my silly questions for months, and in particular one of the "big guys", Kevin, who has helped me a lot with many, many issues (I think he deserves a statue!).
I haven't posted for months because I have been trying to read as much as possible, to avoid duplicating posts on the same topic. There are many (probably too many) posts asking for clarification about surrogate variable analysis (SVA).
Honestly, I am not a bioinformatician, and I didn't fully understand the original article (especially the formulas) that lays out the theory behind SVA. With Kevin's help I understood how to compute the surrogate variables, but it is still not clear to me what the software corrects, by how much, and why. How can it tell which genes need to be corrected?
I'll try to give an example (I'm sure the experts here won't like it!):
I have the same gene (a) in 3 (wt) + 3 (ko) different samples (), so I have:
gene a1, gene a2, gene a3 (wt);
gene a4, gene a5, gene a6 (ko);
Now, let's say that we have the following TPM values:
Clearly there's something unexpected (?) here. Is this something SVA can try to correct?
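To make the question more concrete, this is how I currently picture the basic idea, written as a simplified Python sketch on simulated data. Everything in it is invented for illustration (the number of genes, the effect sizes and the `hidden_factor` variable are hypothetical, not my real data). It is not the algorithm implemented in the `sva()` function of the Bioconductor `sva` package, which, as far as I understand, also singles out the genes most associated with the residual patterns and reweights them iteratively. The sketch only shows the core intuition: regress the known variable (wt vs ko) out of every gene, then look for a pattern across samples that many genes share in their residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 2000

# Known design: 3 wt (0) + 3 ko (1) samples, as in my example
group = np.array([0, 0, 0, 1, 1, 1])
# Hypothetical hidden factor (e.g. an unrecorded processing effect), NOT given to the analysis
hidden_factor = np.array([-1.0, 0.0, 1.0, -1.0, 0.0, 1.0])

# Simulated log-scale expression: baseline + group effect in 200 genes
# + hidden-factor effect in 600 other genes + small noise
baseline = rng.normal(5, 1, size=(n_genes, 1))
group_effect = np.zeros((n_genes, 1))
group_effect[:200] = rng.normal(0, 2, size=(200, 1))
hidden_effect = np.zeros((n_genes, 1))
hidden_effect[200:800] = rng.normal(0, 2, size=(600, 1))
Y = (baseline + group_effect * group + hidden_effect * hidden_factor
     + rng.normal(0, 0.3, size=(n_genes, 6)))

# Step 1: fit intercept + group for every gene and keep the residuals
X = np.column_stack([np.ones(6), group])          # 6 samples x 2 parameters
beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)    # 2 x n_genes
residuals = Y - (X @ beta).T                      # n_genes x 6

# Step 2: SVD of the residual matrix; the top right singular vector is the
# per-sample pattern shared by the most residual variation across genes
_, s, vt = np.linalg.svd(residuals, full_matrices=False)
sv1 = vt[0]                                       # one value per sample

corr = abs(np.corrcoef(sv1, hidden_factor)[0, 1])
print(f"|corr(sv1, hidden_factor)| = {corr:.3f}")
```

If this picture is right, SVA never decides about one gene in isolation: the surrogate variable is a per-sample pattern estimated from thousands of genes, and it is then added as an extra covariate in the differential expression model. With only 6 samples and 2 known parameters, though, the residuals span just 4 dimensions, so I wonder how reliable this can be for a design like mine.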
Thanks for your help.