Proper design for DESeq2 and other general RNA-seq questions
Asked by Stef, 5.6 years ago

Hello, I am new to RNA-seq. I sequenced 20 libraries from Eucalyptus, all from different tissues: early flower, late flower pollinated, late flower unpollinated, early seed capsule, late seed capsule, mature pollen, and mature leaf. I have three biological replicates for every tissue except pollen, for which I only have two.

I started my analysis using HISAT2, StringTie, and ballgown. However, I have not yet figured out how to do multiple pairwise comparisons in ballgown. Any suggestions?

Now I have moved from ballgown to DESeq2. However, I am not sure how to write the design formula for DESeq2, since I only have two replicates for pollen. The design ~ tissue*tree fails with the error "the model matrix is not full rank". I read in a blog post that I should combine tree and tissue into a single factor. However, that makes it seem as though there are no replicates, and I am not sure how to proceed without replicates.

Also, I am concerned that my counts are not normalized well enough by calculating FPKM values. My reads are single-end. Is it possible to generate RPKM values using ballgown or DESeq2?

Last, is STAR considered a better aligner than Hisat2?

Thanks! Stef

Tags: RNA-Seq, DESeq2, Hisat2, Stringtie

Comment:

StringTie has a prepDE.py script that takes the .gtf files and converts them into read counts. I am not entirely sure of the algorithm it uses, but I use those counts for my DESeq2 analysis.
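For reference, the StringTie manual describes prepDE.py as deriving approximate read counts from the per-transcript coverage that StringTie reports, roughly reads = coverage × transcript length / read length. A minimal Python sketch of just that arithmetic (the function name and the example numbers are hypothetical; the real script also parses the GTF files and produces integer counts itself):

```python
def estimated_reads(coverage: float, transcript_len: int, read_len: int) -> float:
    """Approximate read count for one transcript from StringTie's average per-base coverage.

    coverage * transcript_len is the total number of aligned bases on the transcript;
    dividing by the read length converts that into an approximate number of reads.
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    return coverage * transcript_len / read_len


# Hypothetical transcript: 2,000 bp long, average coverage 12.5, 100 bp single-end reads
print(estimated_reads(12.5, 2000, 100))  # 250.0
```

Because the estimate scales with the assumed read length, that value should match the actual read length of the single-end library.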

Comment:

Could you paste your design matrix? From your post, it is unclear whether the samples came from the same 3 (or 2) trees or from 20 different trees. If they come from different trees then, as h.mon suggested, you should include only the "tissue" factor in your formula.

Reply:

So they come from the same trees (tree one, tree two, and tree three). I apologize for not being clear:

id      tree   tissue              age     type        group
eg1MT1  one    early_capsule       early   flowering   one.early_capsule
eg1MT2  two    early_capsule       early   flowering   two.early_capsule
eg1MT3  three  early_capsule       early   flowering   three.early_capsule
eg1WT1  one    mature_flower       mature  flowering   one.mature_flower
eg1WT2  two    mature_flower       mature  flowering   two.mature_flower
eg1WT3  three  mature_flower       mature  flowering   three.mature_flower
eg3MT1  one    mature_capsule      mature  flowering   one.mature_capsule
eg3MT2  two    mature_capsule      mature  flowering   two.mature_capsule
eg3MT3  three  mature_capsule      mature  flowering   three.mature_capsule
egAT1   one    early_flower_pol    early   flowering   one.early_flower_pol
egAT2   two    early_flower_pol    early   flowering   two.early_flower_pol
egAT3   three  early_flower_pol    early   flowering   three.early_flower_pol
egL1    one    mature_leaf         mature  vegetative  one.mature_leaf
egL2    two    mature_leaf         mature  vegetative  two.mature_leaf
egL3    three  mature_leaf         mature  vegetative  three.mature_leaf
egNPT1  one    early_flower_unpol  early   flowering   one.early_flower_unpol
egNPT2  two    early_flower_unpol  early   flowering   two.early_flower_unpol
egNPT3  three  early_flower_unpol  early   flowering   three.early_flower_unpol
egP1    one    mature_pollen       mature  flowering   one.mature_pollen
egP2    two    mature_pollen       mature  flowering   two.mature_pollen

Reply:

OK, now I understand why tissue*tree is not working. tissue*tree is equivalent to tissue + tree + tissue:tree (the interaction). To estimate the interaction term, however, you would need at least two replicates of each tissue-tree combination, and your data have only one per combination (and none for pollen from tree three). With your data, the best you can do is a ~ tissue + tree model, without the interaction term.
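One way to see the rank problem is to build both model matrices by hand and compare their rank with their number of columns. The sketch below is a Python/NumPy stand-in for R's model.matrix, not DESeq2 code. It assumes treatment coding with the first listed level as the reference (R orders levels alphabetically by default, but the rank does not depend on which level is the reference) and uses the 20 samples from the table above:

```python
import numpy as np

# Levels taken from the sample table; using the first level as reference is an assumption.
TISSUES = ["early_capsule", "mature_flower", "mature_capsule", "early_flower_pol",
           "mature_leaf", "early_flower_unpol", "mature_pollen"]
TREES = ["one", "two", "three"]

# 7 tissues x 3 trees, except that there is no mature_pollen sample from tree "three".
SAMPLES = [(tissue, tree) for tissue in TISSUES for tree in TREES
           if not (tissue == "mature_pollen" and tree == "three")]


def dummies(values, levels):
    """Treatment coding: one 0/1 column per non-reference level."""
    return np.array([[1.0 if v == lvl else 0.0 for lvl in levels[1:]] for v in values])


def model_matrix(samples, interaction):
    """Design matrix for ~ tissue + tree, or ~ tissue * tree if interaction is True."""
    tis = dummies([s[0] for s in samples], TISSUES)
    tre = dummies([s[1] for s in samples], TREES)
    blocks = [np.ones((len(samples), 1)), tis, tre]
    if interaction:
        # tissue:tree columns are products of each tissue dummy with each tree dummy.
        blocks.append(np.column_stack([tis[:, i] * tre[:, j]
                                       for i in range(tis.shape[1])
                                       for j in range(tre.shape[1])]))
    return np.hstack(blocks)


for formula, interaction in [("~ tissue + tree", False), ("~ tissue * tree", True)]:
    X = model_matrix(SAMPLES, interaction)
    print(f"{formula}: {X.shape[0]} samples, {X.shape[1]} coefficients, "
          f"rank {np.linalg.matrix_rank(X)}")
```

With these samples the interaction model asks for 21 coefficients from 20 samples, and the mature_pollen:three column is all zeros because that sample does not exist, so the matrix has rank 20 and is not full rank. Even a full-rank interaction model would leave no residual degrees of freedom for estimating dispersion. The additive model has 9 coefficients and rank 9.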

Answer by h.mon, 5.6 years ago:

From the description of your experiment, I think your design should include only tissue. Why are you including tissue*tree? Having only two replicates for pollen is not ideal, but it shouldn't cause errors.

> Also, I am concerned that my counts are not normalized well enough by calculating FPKM values. My reads are single-end. Is it possible to generate RPKM values using ballgown or DESeq2?

Neither FPKM nor RPKM will normalize your counts properly (and, for single-end reads, RPKM is the same as FPKM). TPM is a better within-sample normalization, and ballgown calculates it. DESeq2 does not need any of these normalizations: it expects raw counts as input.
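To make the RPKM/TPM difference concrete, here is a small Python sketch computing both from raw counts and gene lengths. The counts and lengths are made-up toy values. The key difference is the order of operations: TPM divides by length first and then rescales, so every sample's TPM values sum to one million, whereas RPKM totals vary from sample to sample. Neither is the input DESeq2 expects.

```python
import numpy as np


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads (FPKM for single-end data)."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    millions_mapped = counts.sum() / 1e6
    return counts / lengths_kb / millions_mapped


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then scale the sample to 1e6."""
    rate = np.asarray(counts, dtype=float) / (np.asarray(lengths_bp, dtype=float) / 1e3)
    return rate / rate.sum() * 1e6


lengths = [1000, 2000, 3000]   # toy gene lengths in bp
sample_a = [100, 300, 600]     # toy raw counts
sample_b = [600, 300, 100]

for name, counts in [("A", sample_a), ("B", sample_b)]:
    print(name, "RPKM sum:", rpkm(counts, lengths).sum().round(1),
          "TPM sum:", tpm(counts, lengths).sum().round(1))
```

Here sample A's RPKM values sum to 450,000 and sample B's to about 783,333, while both TPM totals are 1,000,000.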

> Last, is STAR considered a better aligner than Hisat2?

I think so, but not so much better that it is worth realigning your reads.

Reply:

OK. I will stick with DESeq2, then, because I am not sure whether ballgown does RPKM. I think it only normalizes with FPKM or gene coverage.

Answer, 5.6 years ago:

> 20 libraries

So that's 7 trees, six tissues from each, with one of the pollen samples dropping out for some reason, right?

> However, I am not sure how to design the formula for DESeq2 since I only have two replicates for pollen.

That's not the problem. Your design is just "tissue". It's all you can do. You can't model differences between trees with only a single tissue sample per tree. If you'd taken 4 flowers of each type from each tree, then you could.

And pretty much no one uses FPKM anymore. DESeq2 takes raw gene counts. It will do its own normalization.
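For the curious, DESeq2's default normalization (estimateSizeFactors) uses the median-of-ratios method: each gene's geometric mean across samples acts as a pseudo-reference, and a sample's size factor is the median of its count-to-reference ratios over genes. The NumPy sketch below re-implements that idea for illustration only; it is not DESeq2's code and skips options such as control genes or zero-tolerant variants. The toy count matrix is made up.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample have a finite geometric mean.
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Per sample: median over genes of log(count / geometric mean), back-transformed.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
raw = np.array([[10, 20],
                [100, 200],
                [50, 100],
                [0, 5]])        # skipped when estimating: zero count in sample 1
sf = size_factors(raw)
normalized = raw / sf           # divide each column by its sample's size factor
print(sf)
print(normalized)
```

The two size factors come out as about 0.71 and 1.41 (a ratio of 2, matching the depth difference), and after dividing, the first three genes have identical normalized counts in both samples.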

ADD COMMENT

Login before adding your answer.

Traffic: 3131 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6