Mapping mouse gene symbols to Entrez IDs in GAGE
8.4 years ago
bojingjia ▴ 10

I've come across many posts about common errors using GAGE, and many of these common pitfalls relate to mismatching ID systems (Entrez gene ID, gene symbol, etc). I've read the "Gene set and data preparation" vignette, but still get errors when I try to convert my gene symbols to Entrez IDs.

I have two questions:

  1. Is there a more effective way to map gene symbols to Entrez IDs? For example, of 38720 unique input IDs, 8850 remain unmapped. I am using the mouse data set and trying to map the gene symbols in my featureCounts output.
  2. What does it really mean when I fail to download xml/png files for my GAGE analysis? I get errors like:
Info: Downloading xml files for hsammu04060, 1/1 pathways..
Warning: Download of hsammu04060 xml file failed!
This pathway may not exist!

Thanks in advance

RNA-Seq DESeq2 GSEA GAGE pathview • 5.9k views
## Load required libraries
library("DESeq2")
library("gage")
library("pathview")

## Combine count files into dataframe
# Import data from featureCounts
countdata <- read.table("wt_CEvsRT.txt", header=TRUE, row.names=1)

# Convert to matrix
countdata <- as.matrix(countdata)
head(countdata)

# Assign condition
sampleCondition <- c("RT", "RT", "RT", "CE", "CE", "CE")

# Analysis with DESeq2 ----------------------------------------------------
# Create a coldata frame and instantiate the DESeqDataSet. See ?DESeqDataSetFromMatrix
(coldata <- data.frame(row.names=colnames(countdata), sampleCondition))
dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~sampleCondition)

## Run DESeq normalization
dds<-DESeq(dds)

##from GAGE

deseq2.res <- results(dds)
deseq2.fc=deseq2.res$log2FoldChange
names(deseq2.fc)=rownames(deseq2.res)
exp.fc=deseq2.fc
out.suffix="deseq2"

require(gage)
data(kegg.gs)

#get the annotation files for mouse

kg.mouse<- kegg.gsets("mouse")
kegg.gs<- kg.mouse$kg.sets[kg.mouse$sigmet.idx]

#convert gene symbol to entrez ID

gene.symbol.eg<- id2eg(ids=names(exp.fc), category='SYMBOL', org='Mm')

names(exp.fc)<- gene.symbol.eg[,2]

fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref = NULL, samp = NULL)
sel <- fc.kegg.p$greater[, "q.val"] < 0.2 & !is.na(fc.kegg.p$greater[, "q.val"])
path.ids <- rownames(fc.kegg.p$greater)[sel]
sel.l <- fc.kegg.p$less[, "q.val"] < 0.2 & !is.na(fc.kegg.p$less[,"q.val"])
path.ids.l <- rownames(fc.kegg.p$less)[sel.l]
path.ids2 <- substr(c(path.ids, path.ids.l), 1, 8)
require(pathview)
#view first 3 pathways as demo
pv.out.list <- sapply(path.ids2[1:3], function(pid) pathview(gene.data = exp.fc, pathway.id = pid,species = "hsa", out.suffix=out.suffix))
I don't know if it is the cause of all your problems, but you should be using species = "mmu" on your pathview() call.

Thanks! That solved the errors. I am still unable to map all of the gene symbols; do you have any suggestions?

No, I do not have any (easy) suggestions. In fact, the situation is probably worse if you use org.Mm.eg.db and do:

library(org.Mm.eg.db)
gene.symbol.eg <- select(org.Mm.eg.db, keys=names(exp.fc), columns="ENTREZID", keytype="SYMBOL")

you will probably get a "1:many mapping" message, indicating that some gene symbols map to multiple Entrez IDs. See here and here for discussions and suggestions.
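
One way to deal with such a 1:many result is sketched below in base R. This is only an illustration: the data frame is a made-up stand-in for what select() returns (the symbols and IDs are invented), and keeping the first ID per symbol is an arbitrary choice, not a recommendation from the annotation package.

```r
# Toy stand-in for the data frame returned by select(); symbols and IDs are invented.
gene.symbol.eg <- data.frame(
  SYMBOL   = c("GeneA", "GeneB", "GeneB", "GeneC"),
  ENTREZID = c("101", "202", "203", NA),
  stringsAsFactors = FALSE
)

# Drop rows where no Entrez ID was found
mapped <- gene.symbol.eg[!is.na(gene.symbol.eg$ENTREZID), ]

# List the symbols that map to more than one Entrez ID
n.ids <- table(mapped$SYMBOL)
multi <- names(n.ids)[as.vector(n.ids) > 1]

# Keep only the first Entrez ID listed for each symbol (arbitrary tie-break)
mapped <- mapped[!duplicated(mapped$SYMBOL), ]
sym2eg <- setNames(mapped$ENTREZID, mapped$SYMBOL)
```

Here multi holds the ambiguous symbols so they can be inspected by hand, and sym2eg is a named vector with one ID per symbol. AnnotationDbi's mapIds() offers a multiVals argument that applies a similar "first match" rule directly.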

8.4 years ago
bigmawen ▴ 430

id2eg uses the comprehensive gene annotation packages in Bioconductor. Almost all (if not all) official gene symbols can be mapped to Entrez Gene IDs this way. You should check that the unmapped gene symbols are "official", as they might be synonyms, other types of gene IDs, or transcript IDs. That said, there are ~30000 genes mapped in your data. Pathway analysis with those should still be very informative.
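
A rough sketch of that check in base R follows. The matrix is a toy stand-in for the id2eg() result, assuming column 1 holds the input ID and column 2 the Entrez ID (NA when unmapped); the IDs, fold changes, and the Ensembl-style pattern are illustrative assumptions only.

```r
# Toy stand-in for id2eg() output: column 1 = input ID, column 2 = Entrez ID (NA if unmapped).
gene.symbol.eg <- cbind(
  SYMBOL   = c("GeneA", "ENSMUSG00000000001", "Gm00000", "GeneB"),
  ENTREZID = c("101", NA, NA, "202")
)
exp.fc <- c(1.2, -0.4, 0.1, 2.3)        # invented fold changes
names(exp.fc) <- gene.symbol.eg[, 1]

# Which input IDs failed to map, and do any of them look like Ensembl mouse IDs?
unmapped <- gene.symbol.eg[is.na(gene.symbol.eg[, 2]), 1]
looks.ensembl <- grepl("^ENSMUS[GT][0-9]+", unmapped)

# Keep only mapped genes so that no fold change ends up with an NA name
keep <- !is.na(gene.symbol.eg[, 2])
exp.fc.eg <- exp.fc[keep]
names(exp.fc.eg) <- gene.symbol.eg[keep, 2]
```

Inspecting unmapped (for example with head() or table(looks.ensembl)) shows whether the failures are really symbols at all, and exp.fc.eg is the vector that would be passed on to gage().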

BTW, for your error message, species = "mmu" is the solution. When species is not set, the default (hsa, i.e. human) will be used. Hence you get funny pathway names like hsammu04060, and of course you are not able to download anything for these "pathways".
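
To see how a name like hsammu04060 can arise, here is a simplified illustration in base R. The helper make.pathway.name is hypothetical and is not pathview's actual source; it only assumes that the species code is stripped from the ID when present and then prepended again.

```r
# Hypothetical helper, for illustration only (not pathview's real code):
# remove the species prefix if the ID starts with it, then prepend the species code.
make.pathway.name <- function(pathway.id, species) {
  paste0(species, sub(paste0("^", species), "", pathway.id))
}

make.pathway.name("mmu04060", species = "hsa")  # "hsammu04060"
make.pathway.name("mmu04060", species = "mmu")  # "mmu04060"
```

With mouse pathway IDs such as mmu04060, only species = "mmu" yields a name that exists in KEGG, which is why changing that argument in the pathview() call makes the download work.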
