Differential gene expression by DESeq2
1
0
Entering edit mode
5.2 years ago
umeshtanwar2 ▴ 30

Hi, I am working on RNA-Seq data for Arabidopsis plant. I have samples from 4 genotypes (WT, KO1, KO2, OE) with 3 replicates for each. I used STAR for alignment with reference genome and annotation file in gff3 format. Then I obtained count.txt files for each sample by using featureCounts. Now I am doing the DGE analysis by DESeq2 as:

countdata <- read.table("counts.txt", header=TRUE, row.names=1)
countdata <- countdata[ ,6:ncol(countdata)]
colnames(countdata) <- gsub("\\.[sb]am$", "", colnames(countdata))
countdata <- as.matrix(countdata)
head(countdata)
(condition <- factor(c(rep("WT", 3), rep("KO1", 3), rep("KO2", 3), rep("OE", 3))))
library(DESeq2)
(coldata <- data.frame(row.names=colnames(countdata), condition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
dds
dds <- DESeq(dds)

I have following questions: 1. I used annotation file gff3 for alignment and for featureCounts, I converted gff3 to gtf format. Is it the right way or it would make any impact for DGE analysis? 2. I used SAM files in featureCounts. Is it okay or I should use BAM files? 3. In DESeq2 I am getting the results for each replicate individually. How can I merge the replicates for one genotype to compare with the other genotypes. I am not sure if the code I am using is right?

Any guidance from you will be very helpful.

Thank you

RNA-Seq • 1.7k views
ADD COMMENT
1
Entering edit mode
5.2 years ago
  1. This should be fine.
  2. As should this.
  3. You need a final step in your analysis, which it to get the statistical results from your analysis. This is achieved using the "results" function. As in:

    deseq_results <- results(dds, alpha=0.05, threshold=....., .....)
    

    where threshold is the minimum fold change you are interested in . You will also need to specify the contrast you are intersted in. This will depend on whether you are interested in asking if either: a) genotype has an effect on expression overall, or b) if each genotype is different from a control. See the DESeq2 manual for details.

ADD COMMENT
0
Entering edit mode

Thank you @i.sudbery. It worked. Its showing:

out of 25686 with nonzero total read count adjusted p-value < 0.05 LFC > 1.00 (up) : 627, 2.4% LFC < -1.00 (down) : 2473, 9.6% outliers [1] : 0, 0% low counts [2] : 2944, 11% (mean count < 1) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

Can you please tell me how can I extract the upregulated and downregulated genes?

ADD REPLY
0
Entering edit mode

the deseq_results object should have two columns: one for adjusted pvalues (I can't remember if its called padj or adjustedPVal off the top of my head) and one for log2FoldChange. The up genes will be those with adjustedPVal < 0.05 and log2FoldChange > 1, and the down genes will be those with adjustedPVal < 0,05 and log2FoldChange < -1.

ADD REPLY
0
Entering edit mode

Thank you @i.sudberry. It was really helpful for me.

ADD REPLY

Login before adding your answer.

Traffic: 2052 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6