What is the relationship between library size and normalization factor?
7.0 years ago
Deepak Tanwar ★ 4.2k

I saw this plot

doi.org/10.3389/fgene.2016.00164

Normalization factors for the fruit set RNA-Seq data depending on corresponding library sizes. All three studied normalization methods are carried out with default settings. For all three methods, regression (dashed) lines are estimated from a simple linear regression modeling the relationship between default normalization factors and library sizes. Color key: TMM, RLE, and MRN are respectively colored in green, blue, and red. Key to symbols: Bud, Ant, and Pos stages are respectively drawn with circles, squares, and triangles.


Question: What is the relationship between library size and normalization factor? What does it mean if the regression line has an R^2 of 0.9?

Normalization library size normalization factor • 4.3k views
7.0 years ago

Q: What is the relationship between library size and normalization factor?

The answer is right there if you read a bit further:

"Indeed, it is known that TMM normalization factors do not take into account library sizes. This fact is illustrated in Figure 1 by an almost horizontal regression line. On the contrary, RLE and MRN factors are closer to each other, and share a positive correlation with the library size."

Q: What does it mean if the regression line has an R^2 of 0.9?

The R^2 of a regression (here, a linear regression) tells you how well the fitted curve (here, a line) fits your data. If all the data points lie exactly on the line, R^2 = 1 (i.e. 100%). You can also think of this in terms of correlation, which measures how well one variable can be predicted from another. In fact, for a simple linear regression, the goodness of fit R^2 is numerically equal to the square of the Pearson correlation (rho).

R^2 = 0.9 => |rho| (Pearson correlation) = sqrt(0.9) ≈ 0.95

From either number (R^2 or rho), you can conclude that there is a strong linear correlation between the two variables, so one can be predicted quite well from the other. By looking at the line (the red or blue line, say), you can easily see that when one variable increases, so does the other (in mathematical terms, the slope of the line is positive). This information is also conveyed by the sign of rho, which is positive here; R^2 itself is never negative, so it cannot tell you the direction.
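To make the link between R^2, rho, and the slope concrete, here is a minimal sketch in Python with NumPy. The data are simulated and purely illustrative (they are not the values from the figure), and all variable names are my own.

```python
import numpy as np

# Simulated, illustrative data: library sizes (in millions of reads)
# and normalization factors that increase with library size plus noise.
rng = np.random.default_rng(0)
lib_size_millions = rng.uniform(10, 40, size=30)
norm_factor = 0.5 + 0.03 * lib_size_millions + rng.normal(0, 0.05, size=30)

# Simple linear regression: norm_factor ~ intercept + slope * lib_size
slope, intercept = np.polyfit(lib_size_millions, norm_factor, 1)
fitted = slope * lib_size_millions + intercept

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((norm_factor - fitted) ** 2)
ss_tot = np.sum((norm_factor - norm_factor.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Pearson correlation between the two variables
rho = np.corrcoef(lib_size_millions, norm_factor)[0, 1]

print(f"slope = {slope:.4f}, R^2 = {r_squared:.4f}, rho = {rho:.4f}")
print(f"rho^2 = {rho ** 2:.4f}")  # matches R^2 for a simple linear regression
```

R^2 and rho^2 agree up to floating-point error, and the direction of the relationship shows up in the sign of the slope and of rho, not in R^2.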

Thank you Santosh Anand for your reply. I do understand what you wrote. But, what I intended to ask is, what does this mean?

I understand that there is a very good (linear) correlation between the two variables and that one variable can be predicted from the other. What is the biological interpretation?

5.8 years ago
elie.maza • 0

Some normalization methods take the library size into account when calculating their normalization factors, and other methods do not. That is the difference between the RLE and MRN methods on the one side, and TMM on the other side. Nevertheless, the edgeR package (which uses TMM) also takes the library size into account when normalizing, but this does not appear in its "normalization factors".
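As a rough illustration of this point, the sketch below simulates counts in Python with NumPy and computes a DESeq-style median-of-ratios size factor, which tracks the library size. It then divides out the library size and rescales the result to a geometric mean of 1, which is my understanding of how edgeR expresses its "normalization factors"; treat that step as an assumption rather than a transcription of the package. The simulation has no composition differences between samples, and all names are hypothetical.

```python
import numpy as np

# Simulated counts: 2000 genes x 12 samples, where samples differ
# only in sequencing depth (illustrative data, not from the paper).
rng = np.random.default_rng(1)
n_genes, n_samples = 2000, 12
base_expr = rng.gamma(shape=2.0, scale=200.0, size=n_genes)
depth = rng.uniform(0.5, 2.0, size=n_samples)
counts = rng.poisson(np.outer(base_expr, depth))

lib_size = counts.sum(axis=0)

# Median-of-ratios size factors: for each sample, the median over genes
# of the ratio of its count to the gene's geometric mean across samples.
expressed = (counts > 0).all(axis=1)  # keep genes with no zero counts
log_counts = np.log(counts[expressed])
log_geo_mean = log_counts.mean(axis=1, keepdims=True)
size_factor = np.exp(np.median(log_counts - log_geo_mean, axis=0))

# These size factors absorb the library size, hence the strong correlation.
print("corr(size_factor, lib_size):", np.corrcoef(size_factor, lib_size)[0, 1])

# Assumed edgeR-style convention: divide out the library size and rescale
# so the factors have geometric mean 1 (their product is 1).
per_read_factor = size_factor / lib_size
per_read_factor = per_read_factor / np.exp(np.mean(np.log(per_read_factor)))

# The library size re-enters through the effective library size.
effective_lib_size = lib_size * per_read_factor
print("per-read factors:", np.round(per_read_factor, 3))
```

In this simulation the size factors are almost perfectly correlated with library size, while the per-read factors all sit close to 1. The effective library size is proportional to the size factor, so the same depth information is used either way; it is just stored in a different place.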

Finally, the correlation coefficient does not really have a "biological" meaning, only a "statistical" one. It simply shows that some normalization factors are linked to the library size and others are not.
