Dear all,
I'm working with the GSE23561 dataset on the GPL10775 platform.
GEO provides RAW data files and a normalized series matrix file. By mapping the ID_REFs in the normalized series matrix to the IDs in the platform table, I can get the gene symbol (Symbol v12) for any probe.
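For reference, that lookup can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it assumes the platform table and the normalized series matrix have already been reduced to plain tab-separated tables with the headers ID / Symbol v12 and ID_REF respectively. Real GEO downloads carry extra metadata lines before the table that would have to be skipped first. The rows below are hypothetical toy values, except ID 6 → HPRT1, which comes from this thread.

```python
import csv
import io

# Hypothetical toy excerpts standing in for the real files
# (only "6 -> HPRT1" is taken from the thread).
PLATFORM_TSV = "ID\tSymbol v12\n5\tGENE_A\n6\tHPRT1\n7\tGENE_B\n"
SERIES_MATRIX_TSV = "ID_REF\tSAMPLE_1\n6\t8.1\n5\t7.4\n"


def load_platform_symbols(text):
    """Map platform ID -> gene symbol from a tab-separated platform table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["ID"]: row["Symbol v12"] for row in reader}


def annotate_series_matrix(text, id_to_symbol):
    """Attach a symbol to each series-matrix row by matching ID_REF to platform ID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    annotated = []
    for row in reader:
        # Join on the identifier, never on row position.
        row["Symbol"] = id_to_symbol.get(row["ID_REF"])
        annotated.append(row)
    return annotated


symbols = load_platform_symbols(PLATFORM_TSV)
rows = annotate_series_matrix(SERIES_MATRIX_TSV, symbols)
for r in rows:
    print(r["ID_REF"], r["Symbol"])
```

Because the join goes through ID_REF rather than row order, it works even though the series matrix rows are listed as 6, 5.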
However, I need to construct a non-normalized series matrix, and I don't know how to get the ID_REF values (and therefore the gene symbols) for the probes in the raw data. In the raw data files the probes are not ordered by ID_REF, as they are in the normalized series matrix.
A piece of raw data:
Here, the ID_REF of the 6th row is not 6. So when I directly use row numbers as ID_REFs, the gene symbol comes out as MAR6, but it is actually HPRT1. I don't know what the IDs in the raw data stand for, or whether they can be used to get the ID_REFs.
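To illustrate why row numbers fail, here is a minimal Python sketch contrasting a positional lookup with a key-based one. It assumes, hypothetically, that the raw files contain some identifier column whose values also appear in the platform table. Which real column that would be has not been established in this thread, so `probe_key` and the `KEY_*` values are placeholders. All rows are toy data, except that ID 6 → HPRT1 and the MAR6/HPRT1 mix-up come from this post.

```python
# Hypothetical toy platform table; "probe_key" is a placeholder for whatever
# identifier the raw files might share with the platform table.
PLATFORM = [
    {"ID": "5", "probe_key": "KEY_E", "symbol": "GENE_A"},
    {"ID": "6", "probe_key": "KEY_F", "symbol": "HPRT1"},
    {"ID": "7", "probe_key": "KEY_G", "symbol": "MAR6"},
]

# Toy raw-data rows, listed in a different order than the platform table.
RAW_ROWS = [
    {"probe_key": "KEY_G"},
    {"probe_key": "KEY_E"},
    {"probe_key": "KEY_F"},
]


def positional_symbols(raw_rows, platform):
    """Pitfall: assumes raw row i is platform row i, which fails when orders differ."""
    return [platform[i]["symbol"] for i in range(len(raw_rows))]


def keyed_id_refs(raw_rows, platform, key="probe_key"):
    """Recover an ID_REF for each raw row by matching a shared identifier column."""
    key_to_id = {p[key]: p["ID"] for p in platform}
    # None marks a raw row whose key is absent from the platform table.
    return [key_to_id.get(r[key]) for r in raw_rows]


def symbols_for(id_refs, platform):
    """Look up gene symbols by ID_REF (platform ID), not by position."""
    id_to_symbol = {p["ID"]: p["symbol"] for p in platform}
    return [id_to_symbol.get(i) for i in id_refs]


id_refs = keyed_id_refs(RAW_ROWS, PLATFORM)
print(positional_symbols(RAW_ROWS, PLATFORM))  # ['GENE_A', 'HPRT1', 'MAR6'] (wrong)
print(id_refs)                                 # ['7', '5', '6']
print(symbols_for(id_refs, PLATFORM))          # ['MAR6', 'GENE_A', 'HPRT1']
```

The approach only works if such a shared column actually exists. If the raw files' own IDs turn out to be that key, they would play the role of `probe_key` here.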
Any suggestion is appreciated. Thanks!
Yes, I'm already using it to get gene symbols for the probes in the normalized series matrix (by mapping ID_REFs to IDs in the platform table). But I need symbols for the probes in the non-normalized dataset, which does not have ID_REF values.
The file above should be for the platform (Human 50K Exonic Evidence-Based Oligonucleotide array; technology type: spotted oligonucleotide) and should contain everything on the array. Does it not?

Yes, it does. The problem is that, although the raw data files and the platform table have the same number of rows (50,400 each), the rows are not in the same order. For example, the 6th row contains the MAR6 gene in the raw data tables but HPRT1 in the platform table. This means that the gene symbol of the probe with ID_REF = 6 is HPRT1, since it corresponds to ID = 6 in the platform. So I need to find the ID_REF value for each row of the RAW data to be able to use the platform information. Right?
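Equal row counts (50,400 each) are necessary but not sufficient for a safe mapping. Before trusting any candidate identifier column for recovering ID_REFs, it may help to confirm that it links raw rows and platform rows one-to-one. The Python sketch below shows one way to run that check. The function names and the toy keys are hypothetical; in practice you would pass the candidate column's values from each file.

```python
from collections import Counter


def check_one_to_one(raw_keys, platform_keys):
    """Report problems that would make a key-based join between the files unsafe."""
    raw_counts = Counter(raw_keys)
    plat_counts = Counter(platform_keys)
    return {
        # A key appearing twice cannot identify a single probe.
        "raw_duplicates": sorted(k for k, n in raw_counts.items() if n > 1),
        "platform_duplicates": sorted(k for k, n in plat_counts.items() if n > 1),
        # Raw rows that would get no ID_REF at all.
        "missing_from_platform": sorted(set(raw_keys) - set(platform_keys)),
        # Platform rows that no raw row maps to.
        "unused_platform_keys": sorted(set(platform_keys) - set(raw_keys)),
    }


def is_safe_mapping(report):
    """True only if every problem list in the report is empty."""
    return not any(report.values())


# Toy keys: same set, different order, so the mapping is one-to-one.
report = check_one_to_one(["KEY_G", "KEY_E", "KEY_F"], ["KEY_E", "KEY_F", "KEY_G"])
print(is_safe_mapping(report))  # True
```

Comparing sets and duplicate counts rather than positions means a reordered but complete file passes, while a file with the same row count but repeated or unmatched keys is flagged.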