Question

Normalisation before edgeR for RNA-Seq

3.1 years ago

ZheFrench ▴ 570

I am more of a DESeq2 user and switched to edgeR recently. I received scripts from another developer. With DESeq2 I used to feed in the raw counts directly. Here the author pre-normalises the counts first (see the code below). Is that OK?

Is this double normalization? I think edgeR normalizes the reads internally, right? So I was wondering whether I should remove this part of the code before calling edgeR. What do you think?

Roughly :

  ###### Useless section ? ######
  q <- apply(counts,2,function(x) quantile(x[x>0],prob=0.75))
  ncounts <- sweep(counts,2,q/median(q),"/")
  ################################# Should I just use counts ?

  dge <- DGEList(ncounts,genes=rownames(ncounts))
  design <- model.matrix(~0+dge$group) # no intercept: one coefficient per group
  # fixed=TRUE so that "$" is matched literally instead of as a regex end anchor
  colnames(design) <- gsub("dge$group","",colnames(design),fixed=TRUE)
  cm <- makeContrasts(contrasts=comp,levels=dge$group)

  y     <- estimateDisp(dge,design,robust=TRUE)
  fit.y <- glmFit(y,design)
  lrt   <- glmLRT(fit.y,contrast=cm)

rna-seq edgeR • 829 views

updated 3.1 years ago by ATpoint 82k • written 3.1 years ago by ZheFrench ▴ 570

score 1 · Answer 1 · 2021-04-15

The manual instructs you to use the raw counts: https://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Normalization only happens if you call calcNormFactors. Otherwise a plain per-million (library size) scaling is performed, which does not correct for library composition. If in doubt, I would stick strictly to the manual. The custom pre-normalization code from your colleague should probably be dropped. There is a "quick start" section in the manual that you can use for a simple analysis. Be sure to call calcNormFactors on the raw counts and do not use pre-normalized counts.
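For illustration, here is a minimal sketch of that workflow on raw counts. The count matrix is simulated and the group labels "A" and "B" are made up, so substitute your own counts, grouping, and contrast. Note also that edgeR has a built-in upper-quartile option, calcNormFactors(dge, method="upperquartile"), in case that scaling is what your colleague was after; whether it suits your data is a separate question.

```r
library(edgeR)

# Simulated raw integer counts: 1000 genes x 6 samples (hypothetical data)
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))  # hypothetical two-group design

# Build the DGEList directly from the raw counts, with the grouping attached
dge <- DGEList(counts = counts, group = group)

# Optional low-count filtering; library sizes are recomputed afterwards
keep <- filterByExpr(dge)
dge <- dge[keep, , keep.lib.sizes = FALSE]

# Composition normalization (TMM is the default method); the factors are
# stored in dge$samples$norm.factors, the counts themselves stay untouched
dge <- calcNormFactors(dge)

# No-intercept design: one coefficient per group
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
cm <- makeContrasts(B - A, levels = design)

y   <- estimateDisp(dge, design, robust = TRUE)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, contrast = cm)
topTags(lrt)
```

The normalization factors enter the model as part of the offsets, which is why the counts passed to DGEList should remain the original integers.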