Normalizing for RNA abundance across replicates from a time course
7.1 years ago
Chloe • 0

Hi all,

I am trying to normalize my read counts for differential gene expression (DGE) analysis with edgeR.

I have a set of 21 bam files from aligning my reads to a genome, corresponding to 3 replicates at each of my 7 time points.
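
A minimal base-R sketch of a sample table that records which time point and replicate each of the 21 libraries belongs to. The file names (`T1_rep1.bam`, ...) and variable names are hypothetical placeholders. What matters is that the grouping factor has one level per time point, with three replicates each, in the same order as the columns of the count matrix.

```r
# Hypothetical sample sheet for 7 time points x 3 replicates (21 libraries).
# File names are placeholders; replace them with the real BAM/count file names.
n_time <- 7
n_rep  <- 3

samples <- data.frame(
  time      = rep(seq_len(n_time), each = n_rep),   # 1,1,1,2,2,2,...
  replicate = rep(seq_len(n_rep), times = n_time)   # 1,2,3,1,2,3,...
)
samples$file <- sprintf("T%d_rep%d.bam", samples$time, samples$replicate)

# Grouping factor for DE testing: one level per time point
samples$group <- factor(paste0("T", samples$time),
                        levels = paste0("T", seq_len(n_time)))

print(head(samples, 4))
```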

I would like to do DGE using edgeR, but first I need to normalize for RNA abundance between replicates.

I was told I might be able to use RSEM or edgeR to produce a normalized count matrix. The issue is that my reads were generated with the QuantSeq library prep kit, so only one fragment is produced per transcript (and therefore the read count should directly reflect the number of transcripts). For this reason, the QuantSeq documentation recommends using HTSeq to produce a count matrix.

Is there a way to produce a count matrix with HTSeq and then normalize across the replicates, without interfering with the fact that the read counts should directly reflect the transcript counts? Can edgeR normalize the count matrix?
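
A base-R sketch of combining per-sample `htseq-count` output into a single gene x sample matrix. It assumes the default `htseq-count` output format (two tab-separated columns, gene ID and count, no header) and that the summary rows such as `__no_feature` and `__ambiguous` start with `__`. The function name `read_htseq_counts` is hypothetical.

```r
# Combine per-sample htseq-count files into one gene x sample count matrix.
# Assumes: no header, columns = gene ID and count, summary rows start with "__".
read_htseq_counts <- function(files, sample_names = basename(files)) {
  per_sample <- lapply(files, function(f) {
    tab <- read.delim(f, header = FALSE, col.names = c("gene", "count"),
                      stringsAsFactors = FALSE)
    tab <- tab[!startsWith(tab$gene, "__"), ]   # drop __no_feature, __ambiguous, ...
    setNames(as.integer(tab$count), tab$gene)   # named vector: gene -> count
  })

  genes <- names(per_sample[[1]])
  for (x in per_sample) {
    if (!setequal(names(x), genes)) stop("count files do not list the same genes")
  }

  # Align every sample to the same gene order, then bind into a matrix
  counts <- do.call(cbind, lapply(per_sample, function(x) x[genes]))
  rownames(counts) <- genes
  colnames(counts) <- sample_names
  counts
}
```

The resulting matrix can be passed as `counts` to `DGEList()`. edgeR's own `readDGE()` is another option for reading per-sample count files.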

I think I have to avoid using FPKM (part of RSEM?), but I am not sure whether it is appropriate to use RPKM, TMM, upper quartile, etc. I don't know much about these normalization methods other than that they exist.
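
With QuantSeq, each transcript contributes roughly one read from its 3' end, so dividing by gene length (as RPKM/FPKM do) is generally not needed. What remains is correcting for sequencing depth and library composition. TMM and upper-quartile normalization in edgeR compute one scaling factor per library and leave the raw counts untouched. The base-R sketch below (hypothetical function name `cpm_effective`) shows how such factors enter a counts-per-million calculation through the effective library size. It is intended to mirror what edgeR's `cpm()` computes with `normalized.lib.sizes=TRUE` and `log=FALSE`, not to replace it.

```r
# Counts per million using effective library sizes:
# effective library size = library size * normalization factor.
# The raw counts are never modified; only the per-sample denominator changes.
cpm_effective <- function(counts, norm_factors = rep(1, ncol(counts))) {
  lib_size <- colSums(counts)
  eff_lib  <- lib_size * norm_factors
  sweep(counts, 2, eff_lib, "/") * 1e6
}

# Toy example: 3 genes x 2 samples; sample 2 was sequenced twice as deeply
counts <- matrix(c(100, 200, 700,
                   200, 400, 1400), ncol = 2,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
cpm_effective(counts)               # identical CPM in both samples
cpm_effective(counts, c(1, 0.5))    # smaller factor -> larger CPM for s2
```

With factors `c(1, 0.5)`, sample 2's effective library size halves and its CPM values double. TMM estimates these factors from the data rather than having them set by hand.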

I was trying to work it out with RSEM, but it doesn't seem to accept my BAM files, as they were produced by aligning to a genome, not a transcriptome.

Thanks, Chloe

RNA-Seq normalization RNA abundance edgeR RSEM • 2.1k views
7.1 years ago
Jake Warner ▴ 830

Hi Chloe, you can use HTSeq to generate a count table and then pass it to edgeR. In edgeR, you can group the replicates by time point, normalize (TMM), perform DE tests, etc. I assume you would compare each time point to the preceding one or to T0.

For example:

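#edgeR workflow:
#counts = gene x sample matrix from HTSeq, columns ordered T1 rep1-3, T2 rep1-3, ..., T7 rep1-3
library(edgeR)
group <- factor(rep(1:7, each=3)) #group samples: 7 time points x 3 replicates
y <- DGEList(counts=counts, group=group)
mean(y$samples$lib.size) #mean library size
y <- calcNormFactors(y) #TMM normalization (the default method)
z <- cpm(y, normalized.lib.sizes=TRUE) #normalized counts per million
y <- estimateDisp(y) #estimate dispersions (needed before exactTest)
de_T1_T2 <- exactTest(y, pair=c(1,2)) #DE testing: time point 1 vs 2
topTags(de_T1_T2) #top DE genes for this comparison
#etc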
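
For a time course, a GLM with a no-intercept design is a common alternative to running `exactTest` pair by pair. The base-R sketch below builds the design matrix and two sets of contrasts: each time point against the previous one, and each against the first. The variable names are hypothetical. The commented-out lines show how one contrast could be passed to edgeR's `glmQLFit()`/`glmQLFTest()`, assuming `y` is the normalized `DGEList` from the workflow above with samples in the same order; they are not run here.

```r
# No-intercept design: one coefficient per time point
group  <- factor(rep(paste0("T", 1:7), each = 3), levels = paste0("T", 1:7))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
n <- nlevels(group)

# Consecutive comparisons: T2-T1, T3-T2, ..., T7-T6
consecutive <- matrix(0, nrow = n, ncol = n - 1,
                      dimnames = list(levels(group),
                                      paste0(levels(group)[-1], "-", levels(group)[-n])))
for (j in seq_len(n - 1)) {
  consecutive[j + 1, j] <-  1
  consecutive[j, j]     <- -1
}

# Comparisons against the first time point: T2-T1, T3-T1, ..., T7-T1
vs_first <- consecutive
vs_first[] <- 0                       # keep dimensions and row names, reset values
vs_first[1, ] <- -1
for (j in seq_len(n - 1)) vs_first[j + 1, j] <- 1
colnames(vs_first) <- paste0(levels(group)[-1], "-", levels(group)[1])

# Not run (requires edgeR and the DGEList y from the workflow above):
# y   <- estimateDisp(y, design)
# fit <- glmQLFit(y, design)
# res <- glmQLFTest(fit, contrast = consecutive[, "T2-T1"])
# topTags(res)
```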

There's a lot of good info in the edgeR vignette: https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf


Awesome, thanks! I'll give this a go.
