Question

Oligo Exon array transcript cluster ID to gene symbols

0

8.3 years ago

lingjianyang • 0

Dear all,

I am using the oligo package to preprocess CEL files from the Human Exon 1.0 ST array. I have summarised the expression data to the transcript cluster level (rma(celfiles, target="core")) and ended up with a total of 22011 transcript clusters. I would like to perform a gene-level analysis rather than a transcript-level one, so I need to map the transcript clusters to genes.

Using the following command: featureData(exonCore) <- getNetAffx(exonCore, "transcript"), I obtained the corresponding annotation. However, when I looked at the annotation information in pData(featureData(exonCore))[,c("probesetid","geneassignment")], a few thousand transcript clusters appear to have no gene assignment at all. That may be the smaller issue; more importantly, many transcript clusters are mapped to several gene symbols, because their geneassignment entries list many items.

When I remove the transcript clusters that map to multiple gene symbols, I end up with around 12,000 to 14,000 transcript clusters that map uniquely to genes. This number seems too low to me; for example, TCGA exon expression data contains about 18,000 genes.

Am I using the annotation file correctly, or have I already done something wrong here? Would it generally be a better strategy to summarise to the probe set level and then represent each gene by its constituent probe sets somehow?

Thanks

gene exon microarray gene-annotation oligo • 4.2k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.3 years ago by lingjianyang • 0

Ram · Answer 1 · 2016-01-22

1

8.3 years ago

bulldogshepherd ▴ 40

I used to analyse Human Exon arrays years ago (not with oligo), and it was common practice to summarize probe set level values into gene level values. As far as I remember, the number of unique "core" genes should be around 18,000, as the TCGA data shows. Many genes have multiple gene symbols, so using the gene symbol to define a unique gene is not recommended. Try using the gene ID instead, which is also given in the geneassignment column.
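
To illustrate the gene ID suggestion, here is a minimal base R sketch. It assumes the usual NetAffx layout of the geneassignment string, where entries are separated by " /// " and the fields within an entry by " // " in the order accession, symbol, title, cytoband, Entrez gene ID. Please check this against your own annotation before relying on it. The toy strings and the helper `extract_field` are made up for the example and are not part of oligo.

```r
# Toy stand-in for pData(featureData(exonCore))[, c("probesetid", "geneassignment")].
# The strings are invented examples that follow the ASSUMED NetAffx layout:
#   accession // symbol // title // cytoband // Entrez ID, entries joined by " /// "
ann <- data.frame(
  probesetid = c("tc1", "tc2", "tc3"),
  geneassignment = c(
    "NM_001005484 // OR4F5 // olfactory receptor 4F5 // 1p36.33 // 79501 /// ENST00000335137 // OR4F5 // olfactory receptor 4F5 // 1p36.33 // 79501",
    "NM_000001 // GENEA // gene A // 1p36 // 111 /// NM_000002 // GENEB // gene B // 1p36 // 222",
    NA
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: take one field (by position) from every entry and
# return the unique, non-missing values.
extract_field <- function(x, field) {
  if (is.na(x) || x == "---") return(character(0))
  entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]
  parts <- strsplit(entries, " // ", fixed = TRUE)
  vals <- vapply(
    parts,
    function(p) if (length(p) >= field) trimws(p[field]) else NA_character_,
    character(1)
  )
  unique(vals[!is.na(vals) & vals != "---"])
}

entrez <- lapply(ann$geneassignment, extract_field, field = 5)
names(entrez) <- ann$probesetid

# Number of distinct genes per transcript cluster, counted by Entrez ID
n_genes <- lengths(entrez)

# Transcript clusters that map to exactly one gene
unique_tc <- names(n_genes)[n_genes == 1]
```

In this toy data the first transcript cluster has two entries (two accessions) but only one Entrez ID, so it still counts as uniquely mapped. Counting entries rather than distinct gene IDs would discard it, which may be one reason the number of uniquely mapped clusters looks low.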

ADD COMMENT • link 8.3 years ago by bulldogshepherd ▴ 40

0

Thanks for your insight. Do you recommend summarising from probe level to probe set level (target="probeset") before further summarising probe set level values to gene level values (for example by taking the mean/median across all probe sets in a gene); or summarising from probe level to transcript level (target="core"), before mapping transcripts to the genes?
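
For concreteness, the "median across all probe sets in a gene" option could look roughly like the base R sketch below. The matrix `expr` and the mapping `gene_of` are toy placeholders; with real data, `expr` would come from something like exprs() on the probe set level summary, and the mapping from the annotation.

```r
# Toy log2 expression matrix: rows are probe sets, columns are samples.
expr <- matrix(
  c(5.0, 6.0,
    7.0, 8.0,
    9.0, 4.0,
    3.0, 3.5),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4"), c("sampleA", "sampleB"))
)

# Hypothetical probe set -> Entrez gene ID mapping
gene_of <- c(ps1 = "111", ps2 = "111", ps3 = "111", ps4 = "222")

# Group row indices by gene, then take the per-sample median within each gene
idx <- split(seq_len(nrow(expr)), gene_of[rownames(expr)])
gene_expr <- t(vapply(
  idx,
  function(i) apply(expr[i, , drop = FALSE], 2, median),
  numeric(ncol(expr))
))

# gene_expr now has one row per gene and one column per sample
```

The median is used here only because it is less sensitive to a single outlying probe set than the mean; swapping in `mean` gives the other option mentioned above.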

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by lingjianyang • 0

0

I would recommend summarizing from the probe set level to the gene level, as described in a comment on a previous post:

Computing Expression From Affymetrix Exon Array Data

ADD REPLY • link 8.3 years ago by bulldogshepherd ▴ 40

0

Thanks, appreciate this.

ADD REPLY • link 8.3 years ago by lingjianyang • 0