Question

What is the relationship between library size and normalization factor?



7.0 years ago

Deepak Tanwar ★ 4.2k

I saw this plot

doi.org/10.3389/fgene.2016.00164

Normalization factors for the fruit set RNA-Seq data depending on corresponding library sizes. All three studied normalization methods are carried out with default settings. For all three methods, regression (dashed) lines are estimated from a simple linear regression modeling the relationship between default normalization factors and library sizes. Color key: TMM, RLE, and MRN are respectively colored in green, blue, and red. Key to symbols: Bud, Ant, and Pos stages are respectively drawn with circles, squares, and triangles.

Question: What is the relationship between library size and normalization factor? What does it mean if the regression line has an R^2 of 0.9?

Normalization library size normalization factor • 4.3k views

updated 5.8 years ago by elie.maza • written 7.0 years ago by Deepak Tanwar ★ 4.2k

score 1 · Answer 1 · 2017-04-03

Q: What is the relationship between library size and normalization factor?

The answer is right there if you read a bit further:

"Indeed, it is known that TMM normalization factors do not take into account library sizes. This fact is illustrated in Figure 1 by an almost horizontal regression line. On the contrary, RLE and MRN factors are closer to each other, and share a positive correlation with the library size."

Q: What does it mean if the regression line has an R^2 of 0.9?

The R^2 of a regression (here, a simple linear regression) tells you how well the fitted curve (here, a line) fits your data: it is the fraction of the variance in the response that the fit explains. If all the data points lie exactly on the line, R^2 = 1 (that is, 100%). You can also think of this in terms of correlation, which measures how well one variable can be linearly predicted from the other. In fact, for a simple linear regression, the goodness of fit R^2 is numerically equal to the square of the Pearson correlation (rho).

R^2 = 0.9 => |rho| (Pearson correlation) = sqrt(0.9) ≈ 0.95

Looking at either number (R^2 or rho), you can conclude that there is a strong linear correlation between the two variables, and that one can be predicted quite well from the other. Looking at the line (the red or blue one, say), you can easily see that when one variable increases, so does the other (in mathematical terms, the slope of the line is positive). Note that R^2 itself cannot convey this direction, since it is never negative; the direction is given by the sign of the slope, or equivalently by the sign of rho.
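To make the link between R^2 and the Pearson correlation concrete, here is a minimal Python sketch using NumPy. The library sizes and normalization factors are simulated (they are not the values from the paper's figure), and all variable names are made up for illustration.

```python
import numpy as np

# Simulated data, NOT the values from the paper: hypothetical library
# sizes (millions of reads) and hypothetical normalization factors.
rng = np.random.default_rng(0)
lib_size = rng.uniform(10, 40, size=9)
norm_factor = 0.5 + 0.02 * lib_size + rng.normal(0, 0.03, size=9)

# Simple linear regression: norm_factor ~ intercept + slope * lib_size
slope, intercept = np.polyfit(lib_size, norm_factor, 1)
fitted = slope * lib_size + intercept

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((norm_factor - fitted) ** 2)
ss_tot = np.sum((norm_factor - norm_factor.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Pearson correlation between the two variables
rho = np.corrcoef(lib_size, norm_factor)[0, 1]

print(f"slope = {slope:.4f}")
print(f"R^2   = {r_squared:.4f}")
print(f"rho^2 = {rho ** 2:.4f}")  # matches R^2 for a simple linear regression
print(f"rho   = {rho:.4f}")       # carries the sign; R^2 does not
```

The two printed values R^2 and rho^2 agree up to floating-point error, while only the slope and rho tell you whether the relationship is increasing or decreasing.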

score 0 · Answer 2 · 2018-07-12

Some normalization methods take the library size into account when calculating their normalization factors, and other methods do not. That is the difference between the RLE and MRN methods on the one side, and TMM on the other. Nevertheless, the edgeR package (which uses TMM) also takes the library size into account when normalizing, but this does not appear in its "normalization factors": the library size is applied separately, through the effective library size (library size × normalization factor).
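The distinction can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not edgeR's or DESeq's actual code: the function names are hypothetical, the size factors follow the median-of-ratios idea, and the toy counts describe three samples with identical composition sequenced at different depths.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Hypothetical helper: per-sample median of the ratios between each
    gene's count and that gene's geometric mean across samples."""
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)  # keep genes without zero counts
    log_counts = np.log(counts[expressed])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def library_size_free_factors(size_factors, lib_sizes):
    """Hypothetical helper: divide out the library size, then rescale so
    that the factors have a geometric mean of 1."""
    f = size_factors / lib_sizes
    return f / np.exp(np.mean(np.log(f)))

# Toy data (rows = genes, columns = samples): same composition,
# sequencing depth multiplied by 1, 2 and 4.
base = np.array([10, 50, 200, 1000, 5])
depth = np.array([1, 2, 4])
counts = base[:, None] * depth[None, :]

lib_sizes = counts.sum(axis=0)
size_factors = median_of_ratios_size_factors(counts)
norm_factors = library_size_free_factors(size_factors, lib_sizes)
effective_lib_sizes = lib_sizes * norm_factors

print("library sizes:          ", lib_sizes)
print("size factors:           ", size_factors)         # grow with library size
print("normalization factors:  ", norm_factors)         # library size divided out
print("effective library sizes:", effective_lib_sizes)  # depth comes back here
```

In this toy case the size factors are proportional to the library sizes, whereas the library-size-free factors are all equal to 1 because the samples differ only in depth. The depth is not lost: it re-enters through the effective library size.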

Finally, the correlation coefficient does not really have a "biological" meaning, only a "statistical" one. It simply shows that some normalization factors are linked to the library size and others are not.