How to combine expression values of multiple probes for one gene?
8.8 years ago
ayanava18 ▴ 40

I am a bit new to R Bioconductor and microarray analysis.

I have loaded a GEO series matrix file (GSE2990) from the GEO database into R/Bioconductor. This dataset contains expression values for 22283 probes, and I would like gene-level expression values instead. Since many genes are represented by multiple probes, is there a package or R code that can combine the expression values of multiple probes for the same gene? Also, does oneChannelGUI have this feature? [Please note that I wish to work with a processed GEO dataset.]

Bioconductor oneChannelGUI R • 5.4k views

What array platform is this? Typically, if it's an Illumina BeadArray, the different probes that represent the same gene target different parts of that gene.

8.8 years ago
poisonAlien ★ 3.2k

I see that it's an Affymetrix chip. Here is a snippet that calculates the mean expression of all probesets mapping to the same gene.

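# Download and install the packages (BiocManager replaces the retired biocLite script).
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("affy", "gcrma", "hgu133a.db"))

library(affy)
library(gcrma)
library(hgu133a.db)

# Assuming you have CEL files in the working directory.
# read.affybatch() does not expand "*.CEL", so list the files explicitly.
aBatch = read.affybatch(filenames = list.celfiles())

# Normalizing with gcrma
gset = gcrma(aBatch)

# Fetch the Entrez ID for all probesets and drop probesets with no gene annotation.
tab = select(hgu133a.db, keys = keys(hgu133a.db), columns = c("ENTREZID"))
tab = tab[!is.na(tab$ENTREZID), ]

e = exprs(gset)
# Merge probesets to genes (by mean expression); rows of geneExpr are Entrez IDs.
geneExpr = t(sapply(split(tab$PROBEID, tab$ENTREZID), function(ids){
                    colMeans(e[ids,,drop=FALSE])
                }))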

P.S: It's not recommended to do this for many reasons.
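Since the question is about an already processed series matrix rather than CEL files, the same averaging can be applied directly to an expression matrix. The following is only a minimal sketch on toy data: the probeset names, Entrez IDs, and values are made up, and the `getGEO()` call mentioned in the comment is an assumption about how the real matrix might be obtained, not something run here.

```r
# Toy stand-in for a processed expression matrix (probesets x samples).
# With real data, `e` might come from exprs(getGEO("GSE2990")[[1]]) (GEOquery);
# that call is an assumption here and is not run.
e <- matrix(c(2, 4, 6,
              4, 6, 8,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("p1_at", "p2_at", "p3_at"), c("s1", "s2", "s3")))

# Hypothetical probeset-to-gene table, shaped like the output of select().
tab <- data.frame(PROBEID  = c("p1_at", "p2_at", "p3_at"),
                  ENTREZID = c("100", "100", "200"),
                  stringsAsFactors = FALSE)

# Keep only annotated probesets that are present in the matrix.
tab <- tab[!is.na(tab$ENTREZID) & tab$PROBEID %in% rownames(e), ]

# rowsum() adds up rows per gene; dividing by the probeset count gives the mean.
sums <- rowsum(e[tab$PROBEID, , drop = FALSE], group = tab$ENTREZID)
n    <- as.vector(table(tab$ENTREZID)[rownames(sums)])
geneExpr <- sums / n
```

Here `geneExpr` has one row per Entrez ID. Using `rowsum()` avoids the per-gene `sapply()` loop, which can matter with tens of thousands of probesets.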


Can you explain why it's not recommended to do this?


Because multiple probesets from a single gene could represent multiple isoforms, and by merging them you lose this information.
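A small illustration of that point, using entirely hypothetical numbers: if two probesets for one gene track isoforms that change in opposite directions, the gene-level mean can hide the change completely.

```r
# Hypothetical values: two probesets for one gene, each tracking a different isoform.
iso <- rbind(probeset_A = c(control = 10, treated = 4),
             probeset_B = c(control = 4,  treated = 10))

# Per-probeset change between conditions: A drops by 6, B rises by 6.
change_per_probeset <- iso[, "treated"] - iso[, "control"]

# The gene-level mean is 7 in both conditions, so the change disappears.
gene_mean   <- colMeans(iso)
change_gene <- unname(gene_mean["treated"] - gene_mean["control"])
```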

8.8 years ago

Take a look at the findLargest() function in the Bioconductor genefilter package.


Hi Sean,

Can you please help me with the exact code I need to try?

I loaded the genefilter library and tried the calls below, but I am getting these warnings and errors:

> findLargest()
Warning message:
In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
  cannot open compressed file 'C:/Users/Ayanabha/Documents/R/win-library/3.2/survival/DESCRIPTION', probable reason 'No such file or directory'
Error in mget(gN, getAnnMap(map, data)) :
  error in evaluating the argument 'x' in selecting a method for function 'mget': Error: argument "gN" is missing, with no default
> findLargest(gN,testStat,data="hgu133plus2")
Warning message:
In read.dcf(file.path(p, "DESCRIPTION"), c("Package", "Version")) :
  cannot open compressed file 'C:/Users/Ayanabha/Documents/R/win-library/3.2/survival/DESCRIPTION', probable reason 'No such file or directory'
Error in mget(gN, getAnnMap(map, data)) :
  error in evaluating the argument 'x' in selecting a method for function 'mget': Error: object 'gN' not found
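
Both errors above say that `gN` was never supplied: `findLargest()` expects a vector of probeset IDs (`gN`) and a statistic of the same length (`testStat`), and for each gene it returns the probeset with the largest statistic. The sketch below reproduces that selection in base R on toy data so that it runs without any annotation package. The probeset names, gene IDs, and values are hypothetical, and the choice of mean expression as the statistic is an assumption.

```r
# Toy matrix (probesets x samples) and a hypothetical probeset-to-gene mapping.
e <- matrix(c(5, 6, 7,
              9, 9, 9,
              3, 2, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("p1_at", "p2_at", "p3_at"), c("s1", "s2", "s3")))
gene <- c(p1_at = "100", p2_at = "100", p3_at = "200")

# One statistic per probeset, named by probeset ID (mean expression here;
# variance or a t-statistic are other common choices).
testStat <- rowMeans(e)

# For each gene, keep the probeset with the largest statistic.
keep <- sapply(split(testStat, gene[names(testStat)]),
               function(x) names(which.max(x)))

geneExpr <- e[keep, , drop = FALSE]
rownames(geneExpr) <- names(keep)

# With genefilter and an annotation package installed, the equivalent call
# would look roughly like this (not run here):
#   keep <- findLargest(rownames(e), testStat, data = "hgu133a")
```

Note that `data` should name the annotation for the chip actually used; the answer above uses `hgu133a.db`, whereas the failing call passed `"hgu133plus2"`.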