How to convert Illumina probe expression data into gene expression data?
9.7 years ago
Avro ▴ 160

Hi everyone,

My lab is using gene expression data generated by Illumina Human HT-12 v3 Expression BeadChips. As advertised by the company, this product has 48000+ probes for 25000 genes. I have never used expression data before and would like to cluster genes based on their expression. The data have already been normalized and corrected for batch effects.

The current file format is:

ProbeID      Sample1      Sample2   

I would like to get the following format:

GeneID       Sample1      Sample2

It seems that some genes have more probes than others. Moreover, there can be multiple transcripts for a given gene. I was wondering if someone could please give me a general idea about getting the desired format.

Thank you for your time.

HT-12 Illumina • 19k views
9.7 years ago
poisonAlien ★ 3.2k

Hi,

It's easier to do this in R. All you need to do is convert each ProbeID into the gene symbol to which it is mapped.

> probeID=c("ILMN_1690170", "ILMN_2410826", "ILMN_1675640", "ILMN_1801246",
          "ILMN_1658247", "ILMN_1740938", "ILMN_1657871", "ILMN_1769520",
          "ILMN_1778401")
> library("illuminaHumanv4.db") # Bioconductor annotation package; install it if you don't have it (for HT-12 v3 chips the matching package is illuminaHumanv3.db)

> data.frame(Gene=unlist(mget(x = probeID,envir = illuminaHumanv4SYMBOL)))
               Gene
ILMN_1690170 CRABP2
ILMN_2410826   OAS1
ILMN_1675640   OAS1
ILMN_1801246 IFITM1
ILMN_1658247   OAS1
ILMN_1740938   APOE
ILMN_1657871  RSAD2
ILMN_1769520 UBE2L6
ILMN_1778401  HLA-B
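
Once the probe-to-symbol lookup exists, it can be joined onto the expression table to get closer to the GeneID / Sample1 / Sample2 layout asked for. The following is only a minimal sketch in base R: the expression values are invented, and the names `expr`, `probe2gene` and `annotated` are hypothetical. The lookup is typed in by hand from the output above, whereas in practice it would come from the annotation package.

```r
# Toy expression table in the ProbeID / Sample1 / Sample2 layout from the question.
# The numbers are made up for illustration only.
expr <- data.frame(
  ProbeID = c("ILMN_1690170", "ILMN_2410826", "ILMN_1675640", "ILMN_1740938"),
  Sample1 = c(7.1, 9.4, 8.8, 10.2),
  Sample2 = c(7.3, 11.0, 10.1, 10.0),
  stringsAsFactors = FALSE
)

# Probe-to-symbol lookup copied from the output above (hypothetical name).
probe2gene <- c(ILMN_1690170 = "CRABP2", ILMN_2410826 = "OAS1",
                ILMN_1675640 = "OAS1",   ILMN_1740938 = "APOE")

# Add a GeneID column; a probe missing from the lookup would get NA.
expr$GeneID <- unname(probe2gene[expr$ProbeID])

# Put GeneID first but keep ProbeID, so that several probes for the same
# gene (OAS1 here) remain distinguishable.
annotated <- expr[, c("GeneID", "ProbeID", "Sample1", "Sample2")]
print(annotated)
```

Note that this keeps one row per probe, so a gene with several probes appears several times.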

Thank you very much for your time! I am trying it right now. I was wondering: when someone wants to cluster genes, don't they need one expression value for each gene? If so, how can you incorporate the expression of several probes within a gene into one value?

Thank you!


I am not certain about this, but I would not try to summarize the different probes of a gene into a single value since, as you mentioned, they could come from different transcripts of the same gene. It's better to continue with the normalized probe-level expression values for clustering.
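
As a rough illustration of clustering at the probe level, here is a small base R sketch. Everything in it is an assumption made for the example: the expression values are invented, the names `expr`, `probe2gene` and `clusters` are hypothetical, and the choice of correlation distance with average linkage is just one reasonable option, not a recommendation specific to this data set.

```r
# Toy probe-level matrix: rows are probes, columns are samples.
# Values are invented for illustration only.
expr <- matrix(
  c( 7.1,  7.3,  7.0,  7.4,
     9.4, 11.0,  9.1, 11.2,
     8.8, 10.1,  8.5, 10.4,
    10.2, 10.0, 10.3,  9.9),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("ILMN_1690170", "ILMN_2410826", "ILMN_1675640", "ILMN_1740938"),
    paste0("Sample", 1:4)
  )
)

# Probe-to-symbol lookup (hypothetical name), typed in by hand here.
probe2gene <- c(ILMN_1690170 = "CRABP2", ILMN_2410826 = "OAS1",
                ILMN_1675640 = "OAS1",   ILMN_1740938 = "APOE")

# Label each row as "SYMBOL|ProbeID" so the gene is visible in the result
# while every probe keeps its own row.
rownames(expr) <- paste(probe2gene[rownames(expr)], rownames(expr), sep = "|")

# Correlation-based distance between probes, then hierarchical clustering.
d  <- as.dist(1 - cor(t(expr)))
hc <- hclust(d, method = "average")

# Cut the tree into two groups and show which probe landed where.
clusters <- cutree(hc, k = 2)
print(clusters)
```

Calling `plot(hc)` draws the dendrogram with the combined labels, which makes it easy to see whether probes for the same gene end up together.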
