Hello, I am new to RNA-seq. I sequenced 20 libraries from Eucalyptus. The libraries come from seven tissues: early flower, late flower pollinated, late flower unpollinated, early seed capsule, late seed capsule, mature pollen, and mature leaf. I have three biological replicates for every tissue except pollen, for which I only have two.
I started my analysis using HISAT2, StringTie, and Ballgown. However, I have not figured out how to do multiple pairwise comparisons in Ballgown yet. Any suggestions?
Now, I have moved to using DESeq2 instead of Ballgown. However, I am not sure how to write the design formula for DESeq2 since I only have two replicates for pollen. The design tissue*tree fails with the error "the model matrix is not full rank". I read in a blog post that I should combine tree and tissue into one factor. However, that makes it seem like there are no replicates, and I am not sure how to go forward without replicates.
Also, I am concerned that my counts are not normalized well enough by calculating FPKM values. My reads are single-end. Is it possible to generate RPKM values using Ballgown or DESeq2?
Lastly, is STAR considered a better aligner than HISAT2?
Thanks! Stef
StringTie has a prepDE.py script that takes the .gtf files and converts them into read counts. I am not entirely sure of the algorithm used, but I use those counts for DESeq2 analysis.
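On the RPKM question: with single-end reads each fragment is a single read, so FPKM and RPKM are the same number. DESeq2 itself works from raw counts, which is why counts rather than FPKM values are passed to it; it also has an fpkm() helper if length-normalized values are wanted for reporting. As a rough illustration of what the value means, here is a minimal Python sketch of the calculation. The function name and the example numbers are made up for illustration and do not come from any of the tools mentioned.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions_mapped


# Hypothetical gene: 500 reads, 2 kb long, in a library of 10 million mapped reads.
example = rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=10_000_000)
print(example)  # 25.0
```

Values like this are handy for comparing genes within a sample, but they should not replace the raw counts as input to the differential expression test.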
Could you paste your design matrix? From your post, it is unclear whether the samples came from the same 3 trees (2 for pollen) or from 20 different trees. If they come from different trees, then, as h.mon suggested, you should only consider the "tissue" factor in your formula.
They come from the same trees (tree one, tree two, and tree three). I apologize for not being clear:
id tree tissue age type group
eg1MT1 one early_capsule early flowering one.early_capsule
eg1MT2 two early_capsule early flowering two.early_capsule
eg1MT3 three early_capsule early flowering three.early_capsule
eg1WT1 one mature_flower mature flowering one.mature_flower
eg1WT2 two mature_flower mature flowering two.mature_flower
eg1WT3 three mature_flower mature flowering three.mature_flower
eg3MT1 one mature_capsule mature flowering one.mature_capsule
eg3MT2 two mature_capsule mature flowering two.mature_capsule
eg3MT3 three mature_capsule mature flowering three.mature_capsule
egAT1 one early_flower_pol early flowering one.early_flower_pol
egAT2 two early_flower_pol early flowering two.early_flower_pol
egAT3 three early_flower_pol early flowering three.early_flower_pol
egL1 one mature_leaf mature vegetative one.mature_leaf
egL2 two mature_leaf mature vegetative two.mature_leaf
egL3 three mature_leaf mature vegetative three.mature_leaf
egNPT1 one early_flower_unpol early flowering one.early_flower_unpol
egNPT2 two early_flower_unpol early flowering two.early_flower_unpol
egNPT3 three early_flower_unpol early flowering three.early_flower_unpol
egP1 one mature_pollen mature flowering one.mature_pollen
egP2 two mature_pollen mature flowering two.mature_pollen
Ok, now I understand why tissue*tree is not working. tissue*tree is equivalent to tissue + tree + tissue:tree, where the last term is the interaction. However, to be able to estimate the interaction term, you would need at least two replicates of each tissue-tree combination. With your data, the best you can do is use a tissue + tree model, without the interaction term.
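To make the rank problem concrete, here is a small Python sketch (not R, and not DESeq2's own code) that rebuilds both design matrices from the sample table above and computes their rank exactly. It assumes treatment (dummy) coding with the first level of each factor as the reference, which is similar in spirit to what R does by default; all names in it are made up for illustration.

```python
from fractions import Fraction

TISSUES = ["early_capsule", "mature_flower", "mature_capsule", "early_flower_pol",
           "mature_leaf", "early_flower_unpol", "mature_pollen"]
TREES = ["one", "two", "three"]

# The 20 samples from the table: every tissue on every tree,
# except that there is no mature_pollen sample from tree three.
SAMPLES = [(tissue, tree) for tissue in TISSUES for tree in TREES
           if (tissue, tree) != ("mature_pollen", "three")]


def design_row(tissue, tree, interaction):
    """Dummy-coded row: intercept, tissue dummies, tree dummies, optional products."""
    tissue_dummies = [int(tissue == t) for t in TISSUES[1:]]
    tree_dummies = [int(tree == t) for t in TREES[1:]]
    row = [1] + tissue_dummies + tree_dummies
    if interaction:
        row += [a * b for a in tissue_dummies for b in tree_dummies]
    return row


def matrix_rank(rows):
    """Rank via Gaussian elimination on exact fractions (no rounding issues)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def summarize(interaction):
    """Return (number of coefficients, rank) for the chosen model."""
    rows = [design_row(tissue, tree, interaction) for tissue, tree in SAMPLES]
    return len(rows[0]), matrix_rank(rows)


interaction_cols, interaction_rank = summarize(interaction=True)
additive_cols, additive_rank = summarize(interaction=False)
print("tissue*tree  :", interaction_cols, "coefficients, rank", interaction_rank)
print("tissue + tree:", additive_cols, "coefficients, rank", additive_rank)
```

The interaction model asks for 21 coefficients but the matrix only has rank 20, because the pollen-by-tree-three column is all zeros; that is the "not full rank" error. Even with that sample present there would be one coefficient per sample and nothing left to estimate variability. The additive model needs 9 coefficients, has rank 9, and leaves 11 residual degrees of freedom from the 20 samples.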