Converting this matrix to gene names or symbols
5.1 years ago
zizigolu ★ 4.3k

Hi,

I have this matrix of raw read counts from HTSeq:

> head(mat[,1:4])
                TCGA-L5-A4OG-11A-12R-A260-31 TCGA-IC-A6RE-11A-12R-A336-31 TCGA-L5-A4OJ-11A-12R-A260-31
ENSG00000000003                         1818                         4596                         2732
ENSG00000000005                            0                            3                            6
ENSG00000000419                         1436                          751                         1500
ENSG00000000457                         1175                          840                          992
ENSG00000000460                          242                          205                          256
ENSG00000000938                          536                          253                          331
                TCGA-L5-A4OO-11A-12R-A260-31
ENSG00000000003                         1075
ENSG00000000005                            3
ENSG00000000419                         1139
ENSG00000000457                          726
ENSG00000000460                          123
ENSG00000000938                          372
> 

> dim(mat)
[1] 56925    11
>

I want to summarize that by gene name and shrink the matrix to 35000 rows, but I don't know how. @Love says I cannot use tximport.

Any help please?

Ensembl ID HTSeq RNA-Seq • 2.0k views

I guess @Love is Mike Love? That means you posted this somewhere before. Please link to and quote what he said. He probably gave a reason.


https://github.com/mikelove/tximport/issues/26

He says:

No that’s not what tximport does. We only take input from the methods listed in the help page


Thanks, now I have

> library(biomaRt)
> ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
> values=rownames(mat)
> data <- getBM(attributes=c("ensembl_gene_id", "hgnc_symbol"), filters = "ensembl_gene_id", values = values, mart= ensembl)

> head(data[,1:2])
  ensembl_gene_id hgnc_symbol
1 ENSG00000000003      TSPAN6
2 ENSG00000000005        TNMD
3 ENSG00000000419        DPM1
4 ENSG00000000457       SCYL3
5 ENSG00000000460    C1orf112
6 ENSG00000000938         FGR
> 
> dim(data)
[1] 56720     2
> 

> dim(mat)
[1] 56925    11
>

Now I want to extract from mat the read counts of only the 56720 genes that matched a gene symbol.
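One way to do this subsetting, as a minimal sketch rather than a definitive answer: keep the rows of the count matrix whose Ensembl ID appears in the annotation table, then look up the symbol for each kept row with match(). The toy mat and data objects below are made-up stand-ins for the objects in the post, and ENSG_HYPOTHETICAL is an invented ID with no annotation. Note that getBM can return an empty hgnc_symbol for some IDs and more than one row for the same ID, so the number of rows in data is not necessarily the number of distinct annotated genes.

```r
# Toy stand-ins for the objects in the post (values are for illustration only)
mat <- matrix(
  c(1818, 0, 1436, 5,
    4596, 3,  751, 7),
  ncol = 2,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419", "ENSG_HYPOTHETICAL"),
    c("sampleA", "sampleB")
  )
)
data <- data.frame(
  ensembl_gene_id  = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  hgnc_symbol      = c("TSPAN6", "TNMD", "DPM1"),
  stringsAsFactors = FALSE
)

# Keep only the rows of mat whose ID was returned by the annotation query
keep <- rownames(mat) %in% data$ensembl_gene_id
mat_matched <- mat[keep, , drop = FALSE]

# Symbols in the same order as the rows of mat_matched
# (match() uses the first hit if an ID occurs more than once in data)
symbols <- data$hgnc_symbol[match(rownames(mat_matched), data$ensembl_gene_id)]

dim(mat_matched)
```

Using %in% and match() on the IDs avoids relying on the two objects being in the same row order.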


Then I suggest you use your years of experience in the field to find ways to accomplish that rather than asking for spoon-feeding.


How sad that there is no emoji here to show my face right now!


I want to summarize that by gene name

No you don't, but if you did then you'd want to split by gene name and sum across rows. That you can figure out.
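For readers who do decide to collapse counts by symbol despite the caution above, "split by gene name and sum across rows" could be sketched with base R's rowsum(), which adds up the rows of a matrix within each group. The counts matrix, the ENSG_* IDs and the GENEA/GENEB/GENEC symbols below are all hypothetical, chosen so that two IDs share one symbol.

```r
# Hypothetical counts: two Ensembl IDs (ENSG_A1, ENSG_A2) map to the same symbol
counts <- matrix(
  c(10, 5, 2, 7,
    20, 1, 3, 4),
  ncol = 2,
  dimnames = list(
    c("ENSG_A1", "ENSG_A2", "ENSG_B", "ENSG_C"),
    c("sampleA", "sampleB")
  )
)

# One symbol per row of counts, in the same order as the rows
symbols <- c("GENEA", "GENEA", "GENEB", "GENEC")

# Sum the rows that share a symbol; the result has one row per unique symbol
by_symbol <- rowsum(counts, group = symbols)

by_symbol
```

Rows without a symbol (empty string or NA) would need to be dropped or handled separately before this step, since rowsum() would otherwise pool them into a single group.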

