SizeFactors for BAM normalization
3.7 years ago
Anand Rao ▴ 630

My goal is to make PCA and correlation plots from my RNA-Seq BAM files. Some useful discussions on BioStars, such as this one, have helped guide my steps.

In another BioStars post, answering a question on library-size normalization, user ATpoint indicates that size factors must be calculated as follows:

library(edgeR)

## TMM normalization factors via edgeR::calcNormFactors
tmp.NormFactors <- calcNormFactors(object = raw.counts, method = "TMM", doWeighting = FALSE)

## raw library size:
tmp.LibSize <- colSums(raw.counts)

## calculate size factors:
SizeFactors <- tmp.NormFactors * tmp.LibSize / 1000000

In my analyses, I used DESeq2 instead of edgeR, after importing Salmon quantifications with tximport, following the instructions at Bioconductor, as follows:

library(DESeq2)
Design <- DataFrame((cbind(BiolRep, Genotype, TimePoints)))
dim(Design)
#[1] 144   3
rownames(Design) <- colnames(txi.salmon$counts)
design_formula <- ~ TimePoints * Genotype

## pass the Design object defined above as colData
dds <- DESeqDataSetFromTximport(txi.salmon, colData = Design, design = design_formula)
NormValues <- estimateSizeFactorsForMatrix(counts(dds))

So my 1st question is this:

To use DESeq2-based size factors for converting BAM to BigWig with bamCoverage from deepTools, I would still need to calculate SizeFactors as follows, rather than use just the (inverse of the) NormValues, am I right?

SizeFactors <- NormValues * LibSize / 1000000

And my 2nd question is:

With SizeFactors calculated as above, I'd then have to pass the inverse of those values to bamCoverage to obtain my final normalized BigWig files from the BAM inputs, with the following syntax, am I right?

bamCoverage -b $BAM_IN -o $BigWig_OUT --normalizeUsing None --scaleFactor $(1/Size_factor) --effectiveGenomeSize $ACGTtotalCount

Could you please confirm or correct the approach I have indicated above? Thanks in advance!

3.7 years ago

Please use 1/calcNormFactors(object = raw.count) as the scaling factor. Whether you use TMM or the default RLE is largely immaterial to me. Your bamCoverage command looks fine.
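
For anyone scripting this, here is a minimal sketch of how the reciprocal might be computed before calling bamCoverage, since the shell treats `$(1/Size_factor)` as a command substitution rather than as arithmetic. The size factor value and the file names are hypothetical placeholders, not values from this thread, and `--effectiveGenomeSize` is left out on the assumption that it only matters for RPGC normalization.

```shell
# Hypothetical DESeq2 size factor for one sample (placeholder value).
size_factor="1.25"

# bamCoverage multiplies the coverage by --scaleFactor, so pass the reciprocal.
# awk is used because POSIX shell arithmetic is integer-only.
scale_factor=$(awk -v sf="$size_factor" 'BEGIN { printf "%.6f", 1 / sf }')

# Placeholder file names.
bam_in="sample1.bam"
bigwig_out="sample1.bw"

# Dry run: build and print the command for inspection before running it for real.
cmd="bamCoverage -b $bam_in -o $bigwig_out --normalizeUsing None --scaleFactor $scale_factor"
echo "$cmd"
```

Printing the command first makes it easy to check the scale factor per sample before committing to a long run over many BAM files.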

Thanks, Devon. Just to be doubly sure I understood you correctly: LibSize is not factored into the sizeFactor value, only the calcNormFactors values are, yes? (i.e. before its inverse is used with bamCoverage)

Correct, you don't need to account for library size.

Yes, that is true, as Devon says. The DESeq2 size factors already incorporate the library size, whereas in edgeR you have to factor it in manually.
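
As a worked illustration of the edgeR route, the sketch below applies the formula quoted in the question (SizeFactors = NormFactors * LibSize / 1000000) to two made-up samples and prints the reciprocal that would be handed to `--scaleFactor`. The sample names, norm factors and library sizes are invented for the example.

```shell
# Hypothetical per-sample values: name, TMM norm factor, raw library size.
samples="sampleA 0.95 20000000
sampleB 1.05 30000000"

# For each sample: size factor = norm factor * library size / 1e6,
# then the reciprocal, which is what bamCoverage --scaleFactor expects.
table=$(printf '%s\n' "$samples" | awk '{
    sf = $2 * $3 / 1e6
    printf "%s\t%.4f\t%.6f\n", $1, sf, 1 / sf
}')

# Columns: sample, size factor, scale factor (1 / size factor).
printf '%s\n' "$table"
```

With DESeq2 size factors this multiplication step is skipped and only the reciprocal is taken.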

Thank you for confirming _/\_

On a related topic - for multiBigwigSummary, is it possible to specify --bwfiles and --labels as 2 text files containing the respective lists, rather than explicitly at the command line? I have ~ 150 input BW files, so syntax clarity may become an issue, hence this query. This is a very minor issue though, if I can even call it that :) TIA!

No, there's no way to feed the file names in via a file, since we kind of assume that anyone handling that many files is using something like Snakemake to automatically generate the command. As an aside, it's tough to interpret any plots with that many samples.
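
That said, the shell itself can expand a list file into arguments, so a workaround along these lines may be enough for a one-off. This is only a sketch: the list file names, the demo entries and `summary.npz` are hypothetical, and it assumes one path or label per line, in matching order, with no whitespace or glob characters inside any entry.

```shell
# Demo inputs in a scratch directory (hypothetical file names and labels).
workdir=$(mktemp -d)
printf '%s\n' sampleA.bw sampleB.bw sampleC.bw > "$workdir/bw_files.txt"
printf '%s\n' A B C > "$workdir/labels.txt"

# Check that both lists have the same number of lines before building the command.
n_files=$(grep -c '' "$workdir/bw_files.txt")
n_labels=$(grep -c '' "$workdir/labels.txt")
if [ "$n_files" -ne "$n_labels" ]; then
    echo "bw_files.txt and labels.txt differ in length" >&2
    exit 1
fi

# Read each list into a variable, one entry per line.
bw_args=$(cat "$workdir/bw_files.txt")
label_args=$(cat "$workdir/labels.txt")

# $bw_args and $label_args are left unquoted on purpose: word splitting turns
# every line into a separate argument. Dry run: drop "echo" to run the tool.
echo multiBigwigSummary bins --bwfiles $bw_args --labels $label_args -o summary.npz
```

For anything beyond a one-off, a workflow manager that generates the command remains the more robust option.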

I agree - the plotPCA and heatmap images generated were hard to interpret, I had to use much smaller and meaningful subsets to be able to 'see' anything. Thanks very much for your help.
