Question

Normalization of RNAseq data and the use of de novo transcriptome

1

7.2 years ago

BioBing ▴ 150

Hi all,

I'm hoping some of the RNAseq experts here can offer advice on normalization and on how to proceed with the following data analysis:

The study is about how a stressor affects differential gene expression in a non-model species (no reference genome/transcriptome available). We sequenced deeply to build a reference transcriptome and sequenced the samples at a "lower depth":

Reference transcriptome: de novo assembled (Trinity) from reads with a sequencing depth of 300 M (PE 2x150 nt). The statistics (TrinityStats), E50N90, BUSCO analysis, Blast2Go and Detonate (comparison of 3 assemblies, from which we chose the best one) look good. The reference was made from a non-stressed individual of the non-model organism.
Samples: triplicates of "negative stress control", "positive stress control" and the "treatment", with a sequencing depth of 25 M (PE 2x75 nt).

What is the best way to use the reference transcriptome to determine differential gene expression in the samples? Any tips/tricks on how to normalize?

Thank you!

RNA-Seq rna-seq DGE denovo transcriptome • 2.4k views

updated 7.1 years ago by theobroma22 ★ 1.2k • written 7.2 years ago by BioBing ▴ 150

2

I'm no expert, but you could map the reads from your treatments to your reference transcriptome and obtain counts (with kallisto, or I think you can take the uniquely mapping reads as counts). However, you will miss any isoforms specific to the treatments, since the reference was assembled from a non-stressed individual. You can normalize using edgeR's TMM method (an explanation here), and I am pretty sure that determining differential expression from there is fairly standard (maybe look at edgeR's vignettes?).
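To make the TMM idea concrete, here is a minimal pure-Python sketch of a TMM-style scaling factor for one sample against a reference sample. It is a simplified illustration, not edgeR's implementation: the function names (`tmm_factor`, `normalized_cpm`) are made up for this example, the trimming is a plain rank-based cut (30% of log-ratios and 5% of average abundances from each tail), and it skips edgeR's final step of rescaling the factors so that they multiply to 1. In practice you would simply call edgeR.

```python
import math


def tmm_factor(sample, ref, m_trim=0.30, a_trim=0.05):
    """Simplified TMM-style scaling factor of `sample` relative to `ref`.

    `sample` and `ref` are lists of raw counts for the same transcripts.
    """
    n_s, n_r = sum(sample), sum(ref)
    genes = []  # (M, A, weight) for transcripts expressed in both samples
    for y_s, y_r in zip(sample, ref):
        if y_s > 0 and y_r > 0:
            p_s, p_r = y_s / n_s, y_r / n_r
            m = math.log2(p_s / p_r)          # log fold change (M value)
            a = 0.5 * math.log2(p_s * p_r)    # average log abundance (A value)
            # Inverse of the approximate variance of M (precision weight).
            w = 1.0 / ((n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r))
            genes.append((m, a, w))

    n = len(genes)

    def kept(values, trim):
        # Indices that survive trimming `trim` of the values from each tail.
        order = sorted(range(n), key=lambda i: values[i])
        cut = int(n * trim)
        return set(order[cut:n - cut])

    keep = kept([g[0] for g in genes], m_trim) & kept([g[1] for g in genes], a_trim)
    if not keep:
        return 1.0  # nothing left to estimate from; fall back to no scaling
    num = sum(genes[i][2] * genes[i][0] for i in keep)
    den = sum(genes[i][2] for i in keep)
    return 2 ** (num / den)


def normalized_cpm(counts, factor):
    """Counts per million using the effective library size (size * factor)."""
    eff = sum(counts) * factor
    return [c / eff * 1e6 for c in counts]


# Toy data: nine unchanged transcripts plus one strongly induced transcript.
ref = [100] * 10
sample = [100] * 9 + [1000]
factor = tmm_factor(sample, ref)
print(factor, normalized_cpm(sample, factor)[0], normalized_cpm(ref, 1.0)[0])
```

In the toy data the induced transcript inflates the sample's library size, so plain CPM would make the nine unchanged transcripts look down-regulated. The trimmed mean ignores the outlier, and the unchanged transcripts end up with the same normalized CPM in both samples.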

PS- Is it E50N90 or E90N50?

7.2 years ago by biofalconch ★ 1.1k

1

Oops, I meant E90N50 :-)

Thank you! I have considered kallisto as well.

7.2 years ago by BioBing ▴ 150

1

The Trinity wiki provides a lot of guidance for exactly what you want to do: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Post-Transcriptome-Assembly-Downstream-Analyses

Since you will be aligning to your transcriptome, where many reads match several isoforms equally well, you will want to rescue multi-mapped reads. The Trinity developers recommend Kallisto, Salmon or RSEM.
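As a rough illustration of what "rescuing" multi-mapped reads means, here is a toy expectation-maximization (EM) sketch in Python. It only shows the general idea of splitting ambiguous reads in proportion to the current abundance estimates. It is not how any of the tools above are actually implemented: it assumes all transcripts have the same length and ignores fragment length, bias models and pseudoalignment. The function name `em_counts` and the transcript IDs are invented for the example.

```python
def em_counts(reads, transcripts, n_iter=50):
    """Toy EM: split each read among its compatible transcripts.

    `reads` is a list of sets; each set holds the transcript IDs a read maps to.
    Returns estimated (fractional) read counts per transcript.
    """
    # Start from equal relative abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: assign each read fractionally, weighted by current abundance.
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            for t in compatible:
                counts[t] += theta[t] / total
        # M-step: update abundances from the fractional counts.
        theta = {t: counts[t] / len(reads) for t in transcripts}
    return counts


# 6 reads unique to isoform A, 2 unique to isoform B, 4 that map to both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
est = em_counts(reads, ["A", "B"])
print(est)
```

Counting only unique reads would give A:B = 6:2 and throw away a third of the data. The EM assigns the four shared reads in that same 3:1 proportion, ending near 9 and 3.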

As a personal note, my workflow is to map to the assembly using bowtie, estimate abundance using RSEM, and then do normalization (TMM) and differential testing in edgeR.

7.1 years ago by Jake Warner ▴ 830

score 2 · Answer 1 · 2017-03-09

2

7.1 years ago

theobroma22 ★ 1.2k

You can use Rsubread, which ties into limma/edgeR, so you can normalize using voom or limma-trend, depending on the library sizes. Then test for differential expression. I am also no expert, but I was able to use this pipeline successfully to do exactly what you are trying to do. Hope this helps.
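For reference, the log-CPM transformation that limma's voom applies to counts is log2((count + 0.5) / (library size + 1) * 1e6). The Python sketch below shows only that transform plus a helper for comparing library sizes. The function names are made up for illustration, and the sketch leaves out what makes voom useful, namely the mean-variance trend and the resulting precision weights. As far as I recall, the limma user's guide suggests limma-trend when the largest library is no more than about 3-fold bigger than the smallest and voom otherwise, so treat that threshold as a rule of thumb.

```python
import math


def voom_log_cpm(counts, lib_size=None):
    """log2 counts per million with voom's offsets (0.5 on counts, 1 on size)."""
    if lib_size is None:
        lib_size = sum(counts)
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]


def lib_size_ratio(lib_sizes):
    """Ratio of the largest to the smallest library size."""
    return max(lib_sizes) / min(lib_sizes)


# Hypothetical library sizes for nine samples sequenced to roughly 25 M.
sizes = [24.1e6, 25.3e6, 26.0e6, 23.8e6, 25.9e6, 24.7e6, 25.0e6, 26.4e6, 24.2e6]
print(lib_size_ratio(sizes))                      # well below 3 for these values
print(voom_log_cpm([0, 10, 250], lib_size=999_999))
```

A zero count maps to a finite value (log2 of 0.5 per million here) instead of minus infinity, which is the point of the offsets.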

7.1 years ago by theobroma22 ★ 1.2k