DESeq2 to do differential expression analysis for multiple groups
1
0
Entering edit mode
5.2 years ago
backmoons ▴ 10

Dear all,

I got a matrix from the last step(htseq count), and I need to do differential expression using that. My species is a fish, and the aim is to find the genes related to the air-breathing organ. I have 7 time-point for development stages with two replicates, here is my code for DESeq2, but it is just for 2 time-point (two replicates), I need to do differential expression analysis with there 7 time-point, I have read the manual and it still pretty confused. I'd be really appreciated if you can give me some suggestions on how to correct the script in order to do the differential analysis for the 7-timepoint.

**

library("DESeq2")

count_tab <- read.table("deseq2_matrix",header = T,row.names = 1)

colData <- read.table("sample_info.txt",header = T) 

colData$condition = factor(colData$condition, c("A","B"))  

dds <- DESeqDataSetFromMatrix(countData = count_tab, 
                              colData=colData,
                              design = ~ condition)

dds <- DESeq(dds) 
resultsNames(dds) 
res <- results(dds, name = "condition_B_vs_A") 
res <- res[order(res$padj),]

resDF = as.data.frame(res)

resDF$gene_id = row.names(resDF)


resDF <- resDF[,c(7,1,2,3,4,5,6)] 
write.table(resDF, file = "Chan21_chan51_DESeq2_DEG", sep = "\t", quote = FALSE, row.names = FALSE)
**

And the sample_info.txt is like this:

sample_id    condition

chan2_1    A

chan2_2   A

chan5_1    B

chan5_2    B

so, if I use the 7 time-point, the sample_info2.txt will be like this: sample_id condition

chan2_1    A

chan2_2   A

chan5_1    B

chan5_2    B

chan6_1    C

chan6_2   C

chan7_1    D

chan7_2    D

chan8_1    E

chan8_2   E

chan9_1    F

chan9_2    F

chan10_1    G

chan10_2    G
R RNA-Seq sequencing • 2.9k views
ADD COMMENT
1
Entering edit mode
5.2 years ago

I think what you're looking for is a time series analysis - there are some older biostars threads around this (for example Calling differential gene expression in Time series ), here's a review paper which compares a bunch of MATLAB/R packages and finds that ImpulseDE2 works best for them: https://academic.oup.com/bib/article/20/1/288/4364840#130283249 https://bioconductor.org/packages/release/bioc/html/ImpulseDE2.html

If I understand the paper correctly, they also used DESeq2 in a way similar to you, unlike you they ran it once per pairing,so I guess in your case that would mean making 21 sample_info.txt files of each possible pair, not all possible pairs together (I may be wrong)

ADD COMMENT
0
Entering edit mode

Hi, Thanks for your kind reply! so it means I need to use "time" as another "condition"? (I read some manual but still a little confused, this will help us find the genes waved with times or..? if compared this way with normal differential expression analysis without time course.) And I also have another question is that if I can use the first day and its replicates (chan2-1.chan2-2) as the reference, then compare the left days to it? this only needs 6 compare..(I am not sure if this will work...)

Thanks again for your generous help! :)

ADD REPLY
0
Entering edit mode

If you really want to use time as the condition, yes it would be a bit like your table above - you'd have one table with time points A and B only, one table with time points A and C only, one with A and D, etc., B and C, B and D, etc. pp.,

Sure you can compare the first day with the other 6 days, you'll probably get significant DEGs, but there's so much variety in days 2-6 that any conclusions you draw will be questionable. Best to plot TPMs/FPKMs per gene over the 6 days and have a look?

ADD REPLY

Login before adding your answer.

Traffic: 2619 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6