Strange distribution of logFC
0
0
Entering edit mode
6.4 years ago
SMILE ▴ 170

Dear all,

I have RNAseq data of two groups. After doing differential gene expression using edgeR and DESeq2(the results are similar). I found the distribution of logFC is a little strange, having two peaks in the distribution.

My guess would be that most genes are equally enriched in both goups, i.e. the mode of the logFC distribution (probably Gaussian-like) is around 0.

Perhaps someone can tell you why this is so.

distribution of logFC volcanno plot

Codes can be seen bolow:

P1<-read.table("P1.STAR-counts-stranded.txt",header = TRUE)
row.names(P1) <- P1[,1]
P2<-read.table("P2.STAR-counts-stranded.txt",header = TRUE)
row.names(P2) <- P2[,1]
H3<-read.table("H3.STAR-counts-stranded.txt",header = TRUE)
row.names(H3) <- H3[,1]
H4<-read.table("H4.STAR-counts-stranded.txt",header = TRUE)
row.names(H4) <- H4[,1]
LIST<-list(P1,P2,H3,H4)
COL<-unique(unlist(lapply(LIST, colnames)))
ROW<-unique(unlist(lapply(LIST, rownames)))
TOTAL<-matrix(data=NA,nrow=length(ROW), ncol = length(COL),dimnames = list(ROW,COL))
for (ls in LIST){
  TOTAL[rownames(ls),colnames(ls)]<-as.matrix(ls)
}
write.csv(TOTAL,file="/home/Claire/Desktop/data/STAR_6/TOTAL-PH-STAR-counts-stranded.csv")

edgeR

setwd("/home/Claire/Desktop/data/STAR_6/")
library(edgeR)
HPcount<-read.csv("TOTAL-PH-STAR-counts-stranded.csv",header = TRUE,sep = ",")
HPcounts<-HPcount[,8:11]
samplenames<-c("P1","P2","H3","H4")
HPgeneid<-HPcount[,2]
rownames(HPcounts)<-HPgeneid
colnames(HPcounts) <- samplenames
HPgenes<-HPcount[,2:7]
group <- as.factor(c("P", "P", "H","H"))
DGEList <- DGEList(counts=HPcounts, group=group,genes=HPgenes)
keep <- rowSums(cpm(DGEList)>0.5) >= 2
DGEList_keep <- DGEList[keep , keep.lib.sizes=FALSE]
DGEList_keep_norm <- calcNormFactors(DGEList_keep)

design matrix

design<-model.matrix(~0+group)
colnames(design)<-levels(group)
PvsH<-makeContrasts(P-H, levels = design)

Estimating the dispersion

y<-estimateDisp(DGEList_keep_norm,design)
fit<-glmFit(y,design)

to perform likelihood ration tests

lrt<-glmLRT(fit,contrast = PvsH)

hist and volcanno plot

volcanoData <- cbind(lrt$table$logFC, -log10(lrt$table$PValue))
colnames(volcanoData) <- c("logFC", "negLogPval")
plot(volcanoData, pch=19)
hist(lrt$table$logFC,breaks=100)
rna-seq differential expressed genes edgeR • 2.7k views
ADD COMMENT
1
Entering edit mode

I'm also missing the problem here -- it's quite normal to see changes in both directions, i.e. up- as well as down-regulated genes. Is there a reason why you expect only changes into one direction?

ADD REPLY
0
Entering edit mode

How do histograms of your raw, normalised, and logged-normalised counts look? I expect to see a bimodal distribution there, too

ADD REPLY
0
Entering edit mode

What do you expect from your experiment? Is it something that is likely to change expression on thousands of genes, or just a few tens or hundreds? Do yu expect large fold-changes, or small?

ADD REPLY
0
Entering edit mode

Cross-posting is discouraged as you are asking for duplicate effort from the volunteers who answer:

https://support.bioconductor.org/p/102905/

ADD REPLY

Login before adding your answer.

Traffic: 2395 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6