What happened to my mapping IDs in the matrix counts?

6.2 years ago

Pin.Bioinf

Hello, I built a SummarizedExperiment with summarizeOverlaps() using the UCSC hg38 annotation (the TxDb.Hsapiens.UCSC.hg38.knownGene package) and got the object se:

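    library(TxDb.Hsapiens.UCSC.hg38.knownGene)
    library(GenomicAlignments)  # provides summarizeOverlaps()

    # Exons grouped by gene
    ebg <- exonsBy(TxDb.Hsapiens.UCSC.hg38.knownGene, by = "gene")

    # bamfiles is assumed to be a BamFileList (Rsamtools) with one BAM per sample
    se <- summarizeOverlaps(features = ebg, reads = bamfiles,
                            mode = "Union",
                            singleEnd = TRUE,
                            ignore.strand = FALSE,
                            fragments = FALSE)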

But when I print it, I get this:

    > assay(se)

              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
    1          425  293 1273 1531  878  142  153  597  266  3929  1499  1655   751
    10          73   50  127  118  115   82   65  194   73   311   153   671   561
    100          5    1    5   15   10   16   17   41   42    27    14     4    12
    1000       134   95  105  139   95  176  110  243  140   219    96   130    81
    10000       26   23    1    4    2   12    9   32   25    16    12    17    12
    100008587    0    0    0    0    0    0    0    0    0     0     0     0     0
    100008589    0    0    0    0    0    0    0    0    0     0     0     0     0
    100009613    0    0    0    0    0    0    0    0    0     0     0     0     0
    100009676    8   10   11   14    8   26   30   89   55    22    20    23     4
    10001       49   41   31   52   24   79   73  154  136    74   104   171   175
    10002        1    0    0    0    0    0    0    1    0     0     1     0     0
    10003        0    0    1    0    0    0    0    0    0     0     0     0     0
    100033413    0    0    0    0    0    0    0    0    0     0     0     0     0
    100033414    3    4    3    3    0    2    2    2    1     0     0     0     3
    100033415    0    0    0    0    0    0    0    0    0     0     0     0     0
    100033416    0    0    0    0    0    0    0    0    0     0     0     0     0
    100033417    0    0    0    0    0    0    0    0    0     0     0     0     0
    100033420    1    0    1    0    1    0    0    0    0     0     0     0     0
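A quick way to check what these row names are is to inspect the annotation package itself. This is a minimal sketch that assumes only that TxDb.Hsapiens.UCSC.hg38.knownGene is installed: printing the TxDb object shows its metadata header, which for this package should report the type of gene ID it stores, and the names of the exonsBy() result are the IDs that summarizeOverlaps() uses as row names of the count matrix.

```r
library(GenomicFeatures)                     # exonsBy()
library(TxDb.Hsapiens.UCSC.hg38.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene

# Printing the TxDb shows its metadata header (data source, genome,
# and the type of gene ID stored in the database)
print(txdb)

# Same grouping as in the question; these list names are what
# become the row names of the count matrix
ebg <- exonsBy(txdb, by = "gene")
head(names(ebg))
```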

Did I do something wrong? What are these IDs (100033413, 10002, 10001, ...)? What happened to my UCSC IDs, or which database do these IDs belong to? How can I annotate these genes once I finish the DE analysis?
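Assuming the row names are Entrez Gene IDs (the gene IDs used by TxDb.Hsapiens.UCSC.hg38.knownGene), one common way to annotate them after the DE analysis is mapIds() from AnnotationDbi together with the org.Hs.eg.db package. A minimal sketch, using five of the row names shown above as keys; in a real analysis the keys would be rownames(se) or the row names of the DE results table.

```r
library(AnnotationDbi)   # mapIds()
library(org.Hs.eg.db)    # human Entrez-based annotation

# A few of the row names from the count matrix above
entrez_ids <- c("1", "10", "100", "1000", "10000")

# Map Entrez Gene IDs to gene symbols and full gene names;
# multiVals = "first" keeps a single value if an ID maps to several
symbols <- mapIds(org.Hs.eg.db, keys = entrez_ids,
                  column = "SYMBOL", keytype = "ENTREZID",
                  multiVals = "first")
gene_names <- mapIds(org.Hs.eg.db, keys = entrez_ids,
                     column = "GENENAME", keytype = "ENTREZID",
                     multiVals = "first")

annotation <- data.frame(entrez = entrez_ids,
                         symbol = unname(symbols),
                         name = unname(gene_names),
                         stringsAsFactors = FALSE)
print(annotation)
```

Other identifiers in org.Hs.eg.db, such as ENSEMBL, can be listed with columns(org.Hs.eg.db) and requested in the same way.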

Thank you very much.

RNA-Seq R mapping quantification