Biostar Beta. Not for public use.
Converting Affymetrix Probes To Gene Ids
6.2 years ago
Josh • 80

I downloaded the CGP cell line project expression data and would like to convert the Affymetrix probe IDs to official gene symbols. It's the HG U133A v2 platform, and the dataset has around 22,000 probes in total. What's the best way to do this? I tried IDconverter, but it froze after around 100 genes. When I used DAVID to convert to official gene symbols, the results contained only about 9,800 genes. Converting to Entrez IDs with DAVID returned about 24,000 IDs, because some probes map to multiple Entrez Gene IDs. How should I deal with these duplicated Entrez IDs, or is there a better way to do the conversion altogether? Thanks!

1. You state first that you want official gene symbols (presumably HUGO?), but then talk about Entrez IDs.
2. The brief answer to your question is "BioMart". Please search this site for that term; there are many answers to questions virtually identical to this one.

Analogous mapping questions have been asked continuously (here and elsewhere) for at least a decade, because no one (?) ever made a decent 3' UTR probe set, which would have had much cleaner gene mappings, including paralogue resolution.


Four years on, so this will never happen?


I just tried to figure this out today. The code provided by Diwan is for rat; which annotation package you need depends on the species of your samples (human, rat, mouse, etc.) and on your R and Bioconductor versions. I am using R 3.3.2 and Bioconductor 3.4. The following code works for me, but not all probe IDs (keytype = "PROBEID") returned results; I got results for only a few genes.

However, Affymetrix probe annotation is also available from the Thermo Fisher annotation files: https://www.thermofisher.com/us/en/home/life-science/microarray-analysis/microarray-data-analysis/genechip-array-annotation-files.html

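## Converting PROBEIDs to gene names and symbols
## The package depends on the organism and array (hgu95av2.db = human HG-U95Av2 chip)
## BiocManager replaced biocLite() from Bioconductor 3.8 (R >= 3.5)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("hgu95av2.db")
library("AnnotationDbi")
library("hgu95av2.db")    ## for human HG-U95Av2
keytypes(hgu95av2.db)     ## list valid keytypes before querying
## Quick test with two probe IDs
AnnotationDbi::select(hgu95av2.db, keys = c("1007_s_at", "1053_at"),
                      columns = c("SYMBOL", "ENTREZID", "GENENAME"), keytype = "PROBEID")
## GSE22483 is assumed to be a data frame whose ID_REF column holds the probe IDs
PROBES <- as.character(GSE22483$ID_REF)
## Probe IDs must be queried with keytype = "PROBEID", not "UNIGENE"
OUT <- AnnotationDbi::select(hgu95av2.db, keys = PROBES,
                             columns = c("SYMBOL", "ENTREZID", "GENENAME"), keytype = "PROBEID")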
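If you use the Thermo Fisher annotation files instead, keep in mind that one cell of the "Gene Symbol" column can hold several symbols separated by " /// ", and "---" marks probes with no assignment (this is assumed from the usual NetAffx CSV layout; check your file). The base-R sketch below uses a small hypothetical data frame in place of the real file, which you would load with read.csv() after skipping its "#%" header lines, expands it to one row per probe-symbol pair and lists the probes that map to exactly one symbol.

```r
# Hypothetical stand-in for the "Probe Set ID" / "Gene Symbol" columns
# of an Affymetrix annotation CSV (values are made up).
annot <- data.frame(
  probe  = c("p1", "p2", "p3", "p4"),
  symbol = c("GENEA", "GENEB /// GENEC", "---", "GENEA"),
  stringsAsFactors = FALSE
)

# Split multi-gene entries on the " /// " separator.
parts <- strsplit(annot$symbol, " /// ", fixed = TRUE)

# Long format: one row per probe-symbol pair.
long <- data.frame(
  probe  = rep(annot$probe, lengths(parts)),
  symbol = unlist(parts),
  stringsAsFactors = FALSE
)

# "---" marks probes without a gene assignment; drop them.
long <- long[long$symbol != "---", ]

# Probes that map to exactly one symbol.
unique_probes <- names(which(table(long$probe) == 1))
print(unique_probes)
```

Probes such as p2 that hit several genes are exactly the ones that inflate the Entrez ID count from DAVID; whether to drop them or keep every pair depends on the downstream analysis.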

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

23 months ago
Diwan • 560
United States

In R, for example, if I want to convert the Affymetrix IDs "1368587_at" and "1385248_a_at" (rat2302 chip) to their gene IDs, I use the following:

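library("AnnotationDbi")
library("rat2302.db")    # here use your chip's package, e.g. hgu133a2.db for HG-U133A 2.0

## Calling AnnotationDbi::select explicitly avoids clashes with other select() functions (e.g. MASS::select)
AnnotationDbi::select(rat2302.db, keys = c("1368587_at", "1385248_a_at"),
                      columns = c("SYMBOL", "ENTREZID", "GENENAME"), keytype = "PROBEID")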

For all probes, create a vector of probe IDs and then call select:

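PROBES <- as.character(FCMATRIX$probe)   # FCMATRIX: your data frame with a "probe" column

OUT <- AnnotationDbi::select(rat2302.db, keys = PROBES, columns = c("SYMBOL", "ENTREZID", "GENENAME"), keytype = "PROBEID")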
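Because select() returns one row per probe-gene pair (it prints a "returned 1:many mapping" message when this happens), OUT can have more rows than PROBES. One way to get back to exactly one entry per probe is to paste the symbols together. The base-R sketch below runs on a hypothetical stand-in for OUT; the "|" separator is an arbitrary choice.

```r
# Hypothetical data frame shaped like select(..., keytype = "PROBEID") output:
# one row per probe-gene pair, NA where no symbol was found.
OUT <- data.frame(
  PROBEID = c("1368587_at", "1385248_a_at", "1385248_a_at", "9999_at"),
  SYMBOL  = c("GeneA", "GeneB", "GeneC", NA),
  stringsAsFactors = FALSE
)

# Collapse to one value per probe: unique non-NA symbols joined by "|",
# or NA if the probe had no symbol at all.
collapsed <- tapply(OUT$SYMBOL, OUT$PROBEID, function(s) {
  s <- unique(s[!is.na(s)])
  if (length(s) == 0) NA_character_ else paste(s, collapse = "|")
})
print(collapsed)
```

tapply() returns the probes in sorted order, so index with collapsed[PROBES] to put the symbols back in the order of your own probe vector.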

Install your chip's .db package from Bioconductor:

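if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("hgu133a2.db")   # BiocManager replaced biocLite() from Bioconductor 3.8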

HTH

Diwan


For anyone swaying between this and biomaRt: I've worked with biomaRt in the past, and although it is very useful and programmatically accessible, in practice the database goes down often and you frequently find yourself waiting around between queries. Downloading an annotation database and selecting against it locally, like this, is preferable.


Hi Diwan,

After installing the annotate package and..., I ran your script but got an error:

Error in select(rat2302.db, c("1368587_at", "1385248_a_at"), c("SYMBOL", :
unused argument (c("SYMBOL", "ENTREZID", "GENENAME"))

I'm new to R; could you please explain what the problem is?

14 months ago
National Institutes of Health, Bethesda…

If you are an R user, consider:

http://www.bioconductor.org/packages/release/data/annotation/html/hgu133a2.db.html

Details on usage are in the AnnotationDbi vignettes:

http://www.bioconductor.org/packages/release/bioc/html/AnnotationDbi.html

Alternatively, consider the biomaRt package and see the biomaRt user guide:

http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html

24 months ago
macmath • 130
France

Another easy way to annotate Affymetrix probes with gene IDs is this link.

Upload your probe list and it will give you all the necessary information.

It also helps with cross-platform ortholog mapping among probes.

3.5 years ago
jananir1803 • 20
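library("Biobase")       ## ExpressionSet(), featureNames()
library("hgu133a.db")    ## HG-U133A annotation; use hgu133a2.db for HG-U133A 2.0
## dat is assumed to be a numeric matrix with probe IDs as rownames
eset <- ExpressionSet(assayData = dat)

ID <- featureNames(eset)

## mapIds() returns one value per probe (the first match by default, multiVals = "first")
out <- mapIds(hgu133a.db, keys = as.character(ID), column = "SYMBOL", keytype = "PROBEID")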
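mapIds() returns a character vector named by probe ID, with NA for probes that have no symbol (multiVals = "list" would keep every symbol for multi-mapping probes instead). To attach the symbols to the expression values, align by name rather than by position and drop the unmapped probes. A base-R sketch with a hypothetical dat and out follows; GENE1 and GENE2 are placeholder symbols, not real annotations.

```r
# Hypothetical probe-level matrix (rownames = probe IDs) and a named vector
# shaped like mapIds() output (names = probe IDs, NA = no symbol found).
dat <- matrix(c(5.1, 6.2,
                7.3, 8.4,
                4.0, 3.9), nrow = 3, byrow = TRUE,
              dimnames = list(c("1007_s_at", "1053_at", "117_at"),
                              c("sample1", "sample2")))
out <- c("1007_s_at" = "GENE1", "1053_at" = "GENE2", "117_at" = NA)

# Index by name so the symbols line up with the matrix rows.
sym  <- out[rownames(dat)]
keep <- !is.na(sym)

# Keep the (unique) probe IDs as row names and store the symbol as a column,
# since several probes can share one symbol.
res <- data.frame(SYMBOL = unname(sym[keep]), dat[keep, , drop = FALSE],
                  row.names = rownames(dat)[keep], stringsAsFactors = FALSE)
print(res)
```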
16 months ago
The University of Edinburgh

You can use BioMart:

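library("biomaRt")
ensembl <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
## affy_hg_u133_plus_2 is the HG-U133 Plus 2 attribute; find others with listAttributes(ensembl)
affy_ensembl <- c("affy_hg_u133_plus_2", "ensembl_gene_id")
## Without filters, getBM() returns the whole probe-to-gene table
mapping <- getBM(attributes = affy_ensembl, mart = ensembl, uniqueRows = TRUE)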

The problem in converting probe IDs to Entrez or Ensembl gene IDs is that one probe ID can represent more than one Ensembl gene ID, and vice versa.

Possible solutions are:

    1. Get rid of probe IDs that represent more than one Ensembl gene ID.
    2. Take the mean or max of multiple probe IDs that represent one Ensembl or Entrez ID.
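The two steps above can be sketched in base R. The example assumes a hypothetical probe-to-gene table map, shaped like the getBM() result with unique rows (one row per probe-gene pair), and a probe-level expression matrix expr. Because it only needs a matrix with probe IDs as row names, it works just as well on expression values that are already normalized. collapse_by() is a helper written for this sketch, not a package function.

```r
# Hypothetical probe-to-gene table (unique rows, as with getBM(uniqueRows = TRUE)).
map <- data.frame(
  probe = c("p1", "p2", "p2", "p3", "p4"),
  gene  = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_A", "ENSG_D"),
  stringsAsFactors = FALSE
)
# Hypothetical probe-level expression matrix (rownames = probe IDs).
expr <- matrix(c(1, 2,
                 3, 4,
                 5, 8,
                 7, 7), nrow = 4, byrow = TRUE,
               dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))

# Step 1: drop probes that map to more than one gene (p2 here).
genes_per_probe <- table(map$probe)
single <- names(genes_per_probe)[genes_per_probe == 1]
map1 <- map[map$probe %in% single, ]

# Step 2: summarise all probes of a gene with fun (mean, max, median, ...).
# Returns a genes x samples matrix; assumes expr has at least two samples.
collapse_by <- function(expr, map, fun) {
  groups <- split(map$probe, map$gene)
  t(sapply(groups, function(p) apply(expr[p, , drop = FALSE], 2, fun)))
}

gene_mean <- collapse_by(expr, map1, mean)
gene_max  <- collapse_by(expr, map1, max)
print(gene_mean)
```

Here ENSG_A is measured by p1 and p3, so its mean profile is (3, 5) and its max profile is (5, 8); ENSG_B and ENSG_C disappear because their only probe, p2, was ambiguous.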

Another solution is to use Brainarray's custom CDFs (I prefer this one):

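## Download and install the Brainarray custom CDF package (Ensembl gene based, version 21.0.0)
cdf_tarball <- file.path(tempdir(), "hgu133plus2hsensgcdf_21.0.0.tar.gz")
download.file("http://mbni.org/customcdf/21.0.0/ensg.download/hgu133plus2hsensgcdf_21.0.0.tar.gz", cdf_tarball)
install.packages(cdf_tarball, repos = NULL, type = "source")
library(hgu133plus2hsensgcdf)

library(affy)
## celfilepath and celfilenames are placeholders for your CEL directory and file names
RawData <- ReadAffy(verbose = TRUE, celfile.path = celfilepath, cdfname = "hgu133plus2hsensgcdf", filenames = celfilenames)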

How would you do this if you had already gotten the normalized gene expression?
