Question

Normalization scaling factors: formula for applying them to raw counts

1

5.9 years ago

sovrappensiero ▴ 90

Hello,

I am using both edgeR and DESeq2 to normalize raw counts (the data are amplicon sequencing data, but neither RNA-seq nor 16S). I just need to normalize them before creating a visualization. This is preliminary work, so the parts of these packages that calculate differential expression are not useful to me.

I have two sets of scaling factors (from edgeR using the TMM and RLE methods). My question is what is the correct approach for applying these scaling factors to my raw counts. Is it:

raw count / scaling factor

or

raw count / (library size * scaling factor)

I've been researching these methods and so far I have seen it done both ways. I'm still not sure how to get normalization factors from DESeq2, as I only installed that package yesterday evening. I've kept the DESeq2 tag because the question applies to both packages, and any advice regarding DESeq2 could be helpful to me and others.

Rookie question: the dispersion calculation would make sense for evaluating DE, not as part of the normalization, right?

Thanks for the help.

normalization edgeR DESeq2 R • 9.8k views

ADD COMMENT • link 5.9 years ago by sovrappensiero ▴ 90

score 3 · Answer 1 · 2018-05-26

3

5.9 years ago

Kevin Blighe 87k

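With DESeq2-style size factors, you normalise by dividing each sample's raw counts by that sample's size factor (assuming that you have arrived at your size factors in the correct way). This is illustrated in a good example here: Normalization. Note that edgeR's TMM and RLE normalisation factors are defined differently: they scale the library size, so for those the second formula applies, raw count / (library size * normalisation factor). That is what edgeR's cpm() computes, multiplied by one million.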
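The two formulas from the question can be worked through on toy numbers. The following is a minimal sketch in plain Python rather than R, so that it runs without Bioconductor; the counts, the factor values and the function names are all made up for illustration and are not the DESeq2 or edgeR implementations.

```python
# Toy raw counts (made-up numbers): one list of per-feature counts per sample.
counts = {
    "sampleA": [10, 20, 30, 40],
    "sampleB": [40, 80, 120, 160],
}

# Hypothetical factors for illustration only; real ones come from DESeq2 or edgeR.
size_factors = {"sampleA": 0.5, "sampleB": 2.0}  # DESeq2-style size factors
norm_factors = {"sampleA": 1.0, "sampleB": 1.0}  # edgeR-style normalisation factors


def divide_by_size_factor(raw, size_factor):
    """First formula: raw count / size factor."""
    return [c / size_factor for c in raw]


def counts_per_million(raw, norm_factor):
    """Second formula, scaled to one million:
    raw count / (library size * normalisation factor) * 1e6."""
    effective_library_size = sum(raw) * norm_factor
    return [c * 1e6 / effective_library_size for c in raw]


deseq_style = {s: divide_by_size_factor(counts[s], size_factors[s]) for s in counts}
edger_style = {s: counts_per_million(counts[s], norm_factors[s]) for s in counts}

print(deseq_style)
print(edger_style)
```

In this toy example both conventions put the two samples on the same scale, and the results differ only by a constant multiplier. Which formula is appropriate depends on how the factor was defined, so a size factor and an edgeR normalisation factor should not be swapped between the two functions.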

To obtain the DESeq2 size factors in the first place, run estimateSizeFactors() on your DESeqDataSet and then extract them with sizeFactors(dds). After that, counts(dds, normalized=TRUE) returns the raw counts divided by those size factors. This is described in the vignette: Analyzing RNA-seq data with DESeq2

For dispersion, take a look at my answer here: A: Clarification on how DSEeq2 Dispersion Curve is Generated. Dispersion is indeed used for the DE analysis; it plays no part in calculating the size factors.

ADD COMMENT • link 5.9 years ago by Kevin Blighe 87k

1

Thank you! That was very helpful.

ADD REPLY • link 5.9 years ago by sovrappensiero ▴ 90

0

@Kevin: Is this method still valid for scale factors generated by upper quartile or scaled median normalization? Are RLE and median of ratios described in your link the same calculation? Same question for median and scaled median methods?

ADD REPLY • link 5.9 years ago by user31888 ▴ 130

0

I cannot say that each normalisation method just involves a division by a particular size factor - each has a different formula that may or may not involve a 'size factor'.

From what I understand, RLE in edgeR and the median of ratios method currently used by DESeq2 (as per the link that I gave) are the same underlying calculation. The difference is that edgeR divides the resulting factors by library size and rescales them so that they multiply to 1. For 100% clarification, I would suggest re-posting your question on the Bioconductor forum, where the DESeq2 developers are more likely to respond.

A good practice would be to calculate the size factors manually and then via DESeq2, and then you'll have empirical evidence of how exactly it works.
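As a starting point for the manual calculation, here is a minimal sketch of the median of ratios method in plain Python (standard library only), under my understanding of how DESeq2 computes it: features with a zero count in any sample are left out. The function names and the toy counts are hypothetical, and the second function reflects my reading of how edgeR turns the same ratios into RLE normalisation factors, so check the output against estimateSizeFactors() and calcNormFactors() on your own data.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Size factors from a dict of sample -> raw counts (same feature order)."""
    samples = list(counts)
    n_features = len(counts[samples[0]])

    # Log geometric mean of each feature across samples.
    # Features with a zero in any sample are marked None and skipped.
    log_geo_means = []
    for i in range(n_features):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median ratio of a sample's counts to the geometric means.
    size_factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][i]) - log_geo_means[i]
            for i in range(n_features)
            if log_geo_means[i] is not None
        ]
        size_factors[s] = math.exp(statistics.median(log_ratios))
    return size_factors


def rle_norm_factors(counts, size_factors):
    """edgeR-style factors: size factor / library size, rescaled to multiply to 1."""
    per_library = {s: size_factors[s] / sum(counts[s]) for s in counts}
    log_mean = sum(math.log(v) for v in per_library.values()) / len(per_library)
    return {s: v / math.exp(log_mean) for s, v in per_library.items()}


# Toy raw counts (made-up numbers); the last feature has a zero in s1.
counts = {
    "s1": [10, 20, 30, 0],
    "s2": [20, 40, 60, 5],
    "s3": [40, 80, 120, 10],
}

size_factors = median_of_ratios_size_factors(counts)
norm_factors = rle_norm_factors(counts, size_factors)
print(size_factors)
print(norm_factors)
```

For these toy counts the size factors come out at approximately 0.5, 1 and 2. Multiplying each edgeR-style factor by its library size gives values proportional to the size factors, which is why dividing by the size factor and dividing by library size times normalisation factor agree up to a constant for this method.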

ADD REPLY • link 5.9 years ago by Kevin Blighe 87k