Enrichment analysis with a greater number of genes than differential expression analysis
0
0
Entering edit mode
3.3 years ago
rnaka09 ▴ 30

Hello guys, I'm analyzing the differential expression genes in 4 groups, with data from RNAseq. I made this analysis from the "Deseq2" package and this is an result example of 1 group:

#out of 20172 with nonzero total read count
#adjusted p-value < 0.1
#LFC > 0 (up)       : 20, 0.099%
#LFC < 0 (down)     : 2, 0.0099%
#outliers [1]       : 15, 0.074%
#low counts [2]     : 9575, 47%
#(mean count < 41)

After that, I went to do the enrichment analysis with the "goseq" package, with the following commands:

#Nelore - atividade
all_genes_N_atv=row.names(res_N_atv)
de_genes_N_atv=all_genes_N_atv[which(res_N_atv$padj<=0.1)]
de_N_atv_vector=c(t(de_genes_N_atv))
all_genes_N_vector_atv=c(t(all_genes_N_atv))
gene.vector.N.atv=as.integer(all_genes_N_vector_atv%in%de_N_atv_vector)
names(gene.vector.N.atv)=all_genes_N_vector_atv
table(gene.vector.N.atv)

pwf_N_atv=nullp(gene.vector.N.atv, "bosTau4",'ensGene')
go_N_atv=goseq(pwf_N_atv,'bosTau4','ensGene')
head(go_N_atv)
enriched_N_atv=go_N_atv$category[go_N_atv$over_represented_pvalue<=0.1]

capture.output(for(go in enriched_N_atv) {
  print(GOTERM[[go]])
  cat("--------------------------\n")
}, file = "enriched_N_atv.txt")

First, the organisms supported by this package list 5 genomes of the organism I study (beef cattle). The reference genome I used was more current, ARS-UCD1.2, obtained from Ensembl. However, I don't know if I really chose the right Genome for the analysis, because I didn't find which of the goseq options is the one I need, so I used "bosTau4" and "ensGene".

Second, enrichment analysis shows me more genes than what I got from differential expression analysis:

> length(enriched_N_atv)
[1] 257

> table(gene.vector.N.atv)
gene.vector.N.atv
    0     1 
27585    22

What could have happened?

goseq R RNA-Seq • 866 views
ADD COMMENT
1
Entering edit mode

The goseq output should give you the gene sets, which are significantly enriched in your differntially expressed genes, so length(enriched_N_atv) should give you the number of significant gene sets, not genes? Can you show the content of enriched_N_atv?

The advantage of GOseq is, that it is able to correct for the bias in the different length of genes of the different GO-terms. So since you already use GOseq, you could include this bias correction in the analysis during execution of the nullp function.

ADD REPLY
0
Entering edit mode

Hello!

Thanks for the info!

enriched_N_atv has:

> summary(enriched_N_atv)
   Length     Class      Mode 
      257 character character

> str(enriched_N_atv)
 chr [1:257] "GO:0001802" "GO:0001803" "GO:0001805" "GO:0001810" "GO:0001812" "GO:0001820" ...
ADD REPLY
1
Entering edit mode

Hi, so as you see, the result object enriched_N_atv holds not gene names, but gene ontology term IDs, like "GO:0001803". A gene ontology term is basically a list of genes, which belong for example to a certain biological process. In case of "GO:0001803", its longer name is "regulation of type III hypersensitivity", so it holds genes, which were in the past described to play a role in this process. Using the GOseq package correctly, this result would tell you, that genes e.g. from the gene list "GO:0001803" were significantly enriched in your list of significantly expressed genes. It seems weird, though, that you get 257 differential GO terms with only 22 differentially expressed genes.

ADD REPLY
0
Entering edit mode

Can someone help please?

ADD REPLY

Login before adding your answer.

Traffic: 1422 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6