Question about sva + edgeR to identify differentially expressed genes
0
1
Entering edit mode
5.1 years ago
tujuchuanli ▴ 100

Hi,

I want to identify differential genes (DEG) in TCGA dataset (cancer samples vs normal samples) by combination of edgeR and sva (get rid of batch effect by sva). The following is my code and I am not sure if it is ok or not. Could you please help me to check the code and give me some suggestions?

Thanks.

I mainly referred to the following three posts and the manual of these two packages.

  1. https://support.bioconductor.org/p/76381/

  2. Batch effect in DESeq2 - multiple factor or SVA?

  3. sva + egdeR - differential expression analysis - RNA-seq data

Below is my code:

the first step is constructing file and filtering out lowly expressed genes. expres.filter is the matrix of reads counts for each gene in each sample. In this matrix, each row is a gene and each column is a sample.

y=DGEList(counts=exprs, genes=rownames(exprs))
keep <- rowSums(cpm(y)>0.1) >= 20
y <- y[keep, , keep.lib.sizes=FALSE]
y$samples$lib.size <- colSums(y$counts)
y=calcNormFactors(y)
y.cpm.values <- cpm(y$counts, log = F)

the second step is batch effect detection by sva, there are 120 cancer samples and 20 normal samples in my dataset.

Tissue <- factor(c(rep("T",120),rep("N",20))
mod <- model.matrix(~Tissue, data=Tissue)
mod0 <- model.matrix(~1, data=Tissue)
svseq = svaseq(y.cpm.values, mod, mod0)
design <- cbind(mod, svseq$sv)

the third step is identifing DEG by edgeR

y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design)
lrt <- glmTreat(fit,lfc=log2(1.5))

Finally, I get my DEG list.

sva edgeR • 2.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 1568 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6