Hi I am working on RNA-seq differential expression analysis using DESeq2 in R. My code looks like:
countMatrix2 = as.matrix(read.table("countMatrixName.txt", header = TRUE, row.names = 1))
condition2 <- factor(rep(c("Treatment", "Control"), each=4))
colData2 <- data.frame(condition2)
rownames(colData2)<- colnames(countMatrix2)
dds2<-DESeqDataSetFromMatrix(countMatrix2, colData2, ~ condition2)
# Pre-filtering
dds2 <- dds2[rowSums(counts(dds2))>=10,]
# Setting the factor levels
dds2$condition2 <- relevel(dds2$condition2, ref = "Control")
# Differential Expression Analysis
dds_des2 <- DESeq(dds2)
# Getting result table with alpha = 0.05
res0052 <- results(dds_des2, alpha = 0.05)
From here, I want to know how many genes are expressed (for example simply count > 0) separately in Control group and Treatment group. I am confused because the result table only shows Control vs Treatment information like log2foldchange, p-value, etc. How can I separately observe each condition: Control group and Treatment group? This is my first time to use deseq2, and any thoughts are greatly helpful. Thank you!
Thanks for the information! But, when I use
counts(dds2, normalized=TRUE)
then it might still contain the data that will be removed after I apply the function DESeq() and remove NAs. Is it still a valid result? Or, do I need to remove the NAs only for the differentially expressed genes? And the number of expressed genes for each group: Condition and Treatment, should I just docounts(dds2[,1:4], normalized=TRUE)
andcounts(dds2[,5:8], normalized=TRUE)
??What exactly do you want to do? The counts are based on all genes that were in your reference that you quantified against. DESeq2 will remove genes during its independent filtering. What do you mean by valid? Considering every gene that has > 1 count as expressed in inaccurate since there is background transcription that is not biologically meaningful, alignment errors, sequencing noise etc. Which question do you want to answer?
As I addressed in my question above, I literally want to know how many genes are expressed separately in Control group and Treatment group from my code above. I also need to define the threshold for "expressed genes", for example, any gene with count > 0 or any gene with count > 0 in more than one sample. When I did
results(dds_des2, alpha = 0.05)
, I was still able to see there are a few rows with adjusted p-value = 0, which means, even after DESeq2 applies the independent filtering, it still keeps the rows having a low mean normalized count by setting its adjusted p-value = 0.cutoff for expressed genes in DESeq2 plus many other threads via the search function.