match() returns all NA when annotating my genes
6.1 years ago
zizigolu ★ 4.3k

Hi, I am doing GO analysis in R.

I downloaded this annotation:

https://www.affymetrix.com/analysis/downloads/na33/wtgene-32_2/HuGene-1_0-st-v1.na33.2.hg19.probeset.csv.zip

    annot = read.csv(file = "HuGene-1_0-st.csv", header = TRUE)
    dim(annot)
    probes = names(datExpr)
    > head(probes)
    [1] "MKL2"    "MAST2"   "KAT5"    "WWC2"    "UBE2Z"   "PHYHIPL"

    probes2annot = match(probes, annot$transcript_cluster_id)

This gives me all NA:

    sum(is.na(probes2annot))

It should return 0 but returns 7243.

What am I doing wrong?

> head(annot)
  probeset_id seqname strand  start   stop probe_count
1     7896739    chr1      +  63033  63649          31
2     7896741    chr1      +  69109  70008          24
3     7896743    chr1      + 334144 334272           6
4     7896745    chr1      + 367693 368597          36
5     7896747    chr1      + 564951 565019          28
6     7896751    chr1      + 568069 568136          28
  transcript_cluster_id  exon_id   psr_id
1               7896738 96595544 97686467
2               7896740 96595546 97686470
3               7896742 96595548 97686473
4               7896744 96595550 97686476
5               7896746 96595552 97686479
6               7896750 96595556 97686485
                                                                                                                                                        gene_assignment
1                                                                                                                                            ENST00000492842 // OR4G11P
2                                                                 BC136848 // OR4F17 /// NM_001005240 // OR4F17 /// NM_001004195 // OR4F4 /// ENST00000318050 // OR4F17
3                                                                                                                                                                   ---
4 NM_001005277 // OR4F16 /// NM_001005221 // OR4F29 /// NM_001005504 // OR4F21 /// ENST00000456475 // OR4F29 /// ENST00000456475 // OR4F16 /// ENST00000456475 // OR4F3
5                                                                                                                                                                   ---
6                                                                                                                                                                   ---
                                                                                                                                                                                    mrna_assignment
1                                                                                                                                                   ENST00000492842 // chr1 // 100 // 31 // 31 // 0
2    BC136848 // chr1 // 100 // 24 // 24 // 0 /// NM_001005240 // chr1 // 100 // 24 // 24 // 0 /// NM_001004195 // chr1 // 100 // 24 // 24 // 0 /// ENST00000318050 // chr1 // 100 // 24 // 24 // 0
3               ENST00000455207 // chr1 // 100 // 6 // 6 // 0 /// TCONS_l2_00002387-XLOC_l2_000726 // chr1 // 100 // 6 // 6 // 0 /// TCONS_l2_00002388-XLOC_l2_000726 // chr1 // 100 // 6 // 6 // 0
4 NM_001005277 // chr1 // 100 // 36 // 36 // 0 /// NM_001005221 // chr1 // 100 // 36 // 36 // 0 /// NM_001005504 // chr1 // 89 // 32 // 36 // 0 /// ENST00000456475 // chr1 // 100 // 36 // 36 // 0
5                                                                                                                                                           AK074482 // chr1 // 79 // 22 // 28 // 0
6                                                                                                                                                         NC_001807 // chr1 // 100 // 24 // 24 // 0
  crosshyb_type number_independent_probes number_cross_hyb_probes
1             3                         0                       0
2             3                         0                       0
3             3                         0                       0
4             3                         0                       0
5             3                         0                       0
6             3                         0                       0
  number_nonoverlapping_probes level bounded noBoundedEvidence
1                            4   ---       0                 0
2                            7   ---       0                 0
3                            0   ---       0                 0
4                            6   ---       0                 0
5                            0   ---       0                 0
6                            0   ---       0                 0
  has_cds fl mrna est vegaGene vegaPseudoGene ensGene sgpGene
1       0  0    0   0        0              0       1       0
2       0  1    0   0        0              0       1       0
3       0  0    0   0        0              0       1       0
4       0  3    0   0        0              0       1       0
5       0  0    0   0        0              0       1       0
6       0  0    0   0        0              0       1       0
  exoniphy twinscan geneid genscan genscanSubopt mouse_fl
1        0        0      0       0             0        0
2        0        0      0       0             0        0
3        0        0      0       0             0        0
4        0        0      0       0             0        0
5        0        0      0       0             0        0
6        0        0      0       0             0        0
  mouse_mrna rat_fl rat_mrna microRNAregistry rnaGene mitomap
1          0      0        0                0       0       0
2          0      0        0                0       0       0
3          0      0        0                0       0       0
4          0      0        0                0       0       0
5          0      0        0                0       0       0
6          0      0        0                0       0       0
  probeset_type
1          main
2          main
3          main
4          main
5          main
6          main
>
R gene annotation • 1.4k views
6.1 years ago
michael.ante ★ 3.8k

Hi,

In your annot table, the column transcript_cluster_id contains numeric IDs, while your probes are gene symbols, so there cannot be any match. In that case the match function returns the value given by the parameter 'nomatch', which is NA by default.

I guess you can try matching on the gene_assignment column. As far as I remember, there are also a lot of Affymetrix-specific annotation packages provided in R (see here).
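
A small base-R sketch of the behaviour described above, using toy vectors. The values are copied from the head() output in the question; the variable names `symbols` and `ids` are made up for the illustration:

```r
# Toy data: gene symbols as in head(probes), numeric IDs as in
# annot$transcript_cluster_id (both copied from the question)
symbols <- c("MKL2", "MAST2", "KAT5")
ids     <- c(7896738, 7896740, 7896742)

# No symbol equals any numeric ID, so every position gets 'nomatch',
# which is NA unless you say otherwise
idx <- match(symbols, ids)
print(idx)
print(sum(is.na(idx)))

# 'nomatch' can be set to another integer if NA is inconvenient
idx0 <- match(symbols, ids, nomatch = 0)
print(idx0)
```

So a count of NAs equal to the number of probes usually means the two vectors hold different kinds of identifiers, not that match() is broken.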


Thank you. My data is on GPL16791, Illumina HiSeq 2500 (Homo sapiens).

I also tried gene_assignment as you suggested, but that gives NA as well.
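
One possible reason a direct match on gene_assignment also gives NA: each cell in that column holds several `accession // symbol` pairs joined by ` /// `, so a bare symbol never equals the whole string. A rough sketch of pulling the symbols out first, in base R only. `extract_symbols` is a hypothetical helper written for this example, and the strings are copied from the head(annot) output in the question:

```r
# Three gene_assignment cells as they appear in head(annot)
gene_assignment <- c(
  "ENST00000492842 // OR4G11P",
  "BC136848 // OR4F17 /// NM_001005240 // OR4F17 /// NM_001004195 // OR4F4 /// ENST00000318050 // OR4F17",
  "---"
)

# Hypothetical helper: return the unique gene symbols found in one cell
extract_symbols <- function(x) {
  if (is.na(x) || x == "---") return(character(0))
  entries <- strsplit(x, " /// ", fixed = TRUE)[[1]]  # one entry per transcript
  fields  <- strsplit(entries, " // ", fixed = TRUE)  # accession, symbol
  unique(vapply(fields, function(f) f[2], character(1)))
}

symbols_per_row <- lapply(gene_assignment, extract_symbols)

# For each query symbol, the first row whose symbol list contains it (NA if none)
query <- c("OR4F4", "MKL2")
hits <- vapply(query, function(q) {
  w <- which(vapply(symbols_per_row, function(s) q %in% s, logical(1)))
  if (length(w) > 0) w[1] else NA_integer_
}, integer(1))
print(hits)
```

Note that one symbol can appear in several rows and one row can carry several symbols, so taking the first hit is a simplification.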

6.1 years ago
Satyajeet Khare ★ 1.6k

For Affy ST arrays, you can use the read.celfiles function from the oligo package like this:

    library(oligo)
    rawData <- read.celfiles(celFiles)

Then you can normalize:

    Data <- rma(rawData)

And finally annotate the normalized data (annotateEset is in the affycoretools package):

    library(affycoretools)
    library(hugene10sttranscriptcluster.db)
    Data <- annotateEset(Data, hugene10sttranscriptcluster.db)

You may have to change the annotation database; I am not very sure about that.


Thank you, my data is Illumina HiSeq 2500
