Question

Converting this matrix to gene names or symbols

0

5.1 years ago

zizigolu ★ 4.3k

Hi,

I have this matrix of raw read counts from HTSeq

> head(mat[,1:4])
                TCGA-L5-A4OG-11A-12R-A260-31 TCGA-IC-A6RE-11A-12R-A336-31 TCGA-L5-A4OJ-11A-12R-A260-31
ENSG00000000003                         1818                         4596                         2732
ENSG00000000005                            0                            3                            6
ENSG00000000419                         1436                          751                         1500
ENSG00000000457                         1175                          840                          992
ENSG00000000460                          242                          205                          256
ENSG00000000938                          536                          253                          331
                TCGA-L5-A4OO-11A-12R-A260-31
ENSG00000000003                         1075
ENSG00000000005                            3
ENSG00000000419                         1139
ENSG00000000457                          726
ENSG00000000460                          123
ENSG00000000938                          372
> 

> dim(mat)
[1] 56925    11
>

I want to summarize it by gene name and shrink the matrix to about 35000 rows, but I don't know how; @Love says I cannot use tximport for this.

Any help please?

Ensembl ID HTSeq RNA-Seq • 2.0k views

ADD COMMENT • link 5.1 years ago by zizigolu ★ 4.3k

1

I guess @Love is Mike Love? That means you posted that somewhere before. Please provide links and quotes to what he said. Probably he gave a reason why.

ADD REPLY • link 5.1 years ago by ATpoint 82k

0

https://github.com/mikelove/tximport/issues/26

He says

No that’s not what tximport does. We only take input from the methods listed in the help page

ADD REPLY • link 5.1 years ago by zizigolu ★ 4.3k

1

A: R org.Hs.eg.db matching ensembl gene ids with gene symbol

ADD REPLY • link 5.1 years ago by GenoMax 141k

0

Thanks, now I have

> library(biomaRt)
> ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
> values=rownames(mat)
> data <- getBM(attributes=c("ensembl_gene_id", "hgnc_symbol"), filters = "ensembl_gene_id", values = values, mart= ensembl)

> head(data[,1:2])
  ensembl_gene_id hgnc_symbol
1 ENSG00000000003      TSPAN6
2 ENSG00000000005        TNMD
3 ENSG00000000419        DPM1
4 ENSG00000000457       SCYL3
5 ENSG00000000460    C1orf112
6 ENSG00000000938         FGR
> 
> dim(data)
[1] 56720     2
> 

> dim(mat)
[1] 56925    11
>

Now I want to extract from mat the read counts for only the 56720 rows with a matched gene symbol.
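
One possible way to do this extraction, as a minimal sketch rather than a definitive answer: keep the rows of `mat` whose Ensembl IDs appear in the biomaRt result, then look up the symbol for each retained row. The toy `mat` and `data` below are made-up stand-ins for the real objects, and `ENSG_UNMATCHED` is a hypothetical ID with no annotation. Note that `getBM()` may return more than one row per Ensembl ID or an empty symbol, so the 56720 rows in `data` need not correspond to 56720 distinct genes.

```r
# Toy stand-ins for the real objects (values are illustrative only)
mat <- matrix(
  c(1818, 0, 1436, 7,
    4596, 3,  751, 2),
  ncol = 2,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419", "ENSG_UNMATCHED"),
    c("sampleA", "sampleB")
  )
)
data <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  hgnc_symbol     = c("TSPAN6", "TNMD", "DPM1"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Ensembl ID was returned by the annotation query
keep <- rownames(mat) %in% data$ensembl_gene_id
mat_matched <- mat[keep, , drop = FALSE]

# Look up the symbol for each retained row, in the same order as the rows;
# match() uses the first hit if an ID occurs more than once in data
symbols <- data$hgnc_symbol[match(rownames(mat_matched), data$ensembl_gene_id)]
```

Using `drop = FALSE` keeps the result a matrix even if only one row matches.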

ADD REPLY • link 5.1 years ago by zizigolu ★ 4.3k

3

Then I suggest you use your years of experience in the field to find ways to accomplish that rather than asking for spoon-feeding.

ADD REPLY • link 5.1 years ago by ATpoint 82k

0

How sad that there is no emoji here to show my face right now!

ADD REPLY • link 5.1 years ago by zizigolu ★ 4.3k

0

I want to summarize that by gene name

No you don't, but if you did then you'd want to split by gene name and sum across rows. That you can figure out.
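
If one did go that route, the split-and-sum idea could be sketched in base R with `rowsum()`, which adds up the rows of a matrix that share a group label. The toy matrix, the `ENSG_toy` IDs and the `GENE1`/`GENE2` symbols are invented for illustration; dropping rows with an empty or missing symbol before summing is an assumption, not something stated in this thread.

```r
# Toy count matrix: five Ensembl-like rows, two samples (illustrative values)
counts <- matrix(
  c(10, 5, 2, 8, 1,
    20, 0, 4, 6, 3),
  ncol = 2,
  dimnames = list(paste0("ENSG_toy", 1:5), c("sampleA", "sampleB"))
)

# One symbol per row, in row order; "" and NA mark rows without a symbol
symbols <- c("GENE1", "GENE2", "GENE1", "", NA)

# Assumption: rows without a usable symbol are discarded before summing
has_symbol <- !is.na(symbols) & symbols != ""

# Sum the counts of all rows that share a symbol
by_symbol <- rowsum(counts[has_symbol, , drop = FALSE],
                    group = symbols[has_symbol])
```

The result has one row per distinct symbol, so the matrix shrinks whenever several Ensembl IDs map to the same name.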

ADD REPLY • link 5.1 years ago by Devon Ryan 104k