How To Map UCSC mRNA Isoforms To Genes?
10.2 years ago
jack ▴ 520

I have an isoform expression file for a human cancer from TCGA. I want to map all of these isoforms to genes; in other words, I want to group together the isoforms that belong to the same gene. The isoform IDs are UCSC IDs. Does anybody know how I can do this?

isoform_id    normalized_count
uc011lsn.1    0.0000
uc010unu.1    20.1848
uc010uoa.1    7.1561
uc002bgz.2    36.1698
uc002bic.2    0.0000
uc010zzl.1    188.5822
uc001jiu.2    1085.9445
uc010qhg.1    0.0000
uc011krn.1
ngs bioinformatician • 11k views

In the UCSC Genome Browser itself you can output the UCSC IDs along with the gene names. Take a look at the Table Browser and its output options :)


Normally one could use BioMart (or the biomaRt Bioconductor package). However, none of these IDs seem to be included there :(

10.2 years ago
Zhaorong ★ 1.4k

Go to the UCSC Table Browser.

Select:

group: Genes and Gene Predictions

track: UCSC Genes

table: hgFixed.transMapGeneUcscGenes

identifiers: paste list: (paste the list of isoform ids)

Click "get output"

You will have something like:

#id    cds    db    geneName
uc001jiu.2    123..752    hg19    TIMM23
uc002bgz.2        hg19    UBE2QP2
uc002bic.2        hg19    UBE2QP2
uc010qhg.1    123..641    hg19    TIM23
uc010unu.1        hg19    UBE2QP2
uc010uoa.1        hg19    UBE2QP2
uc010zzl.1    1..633    hg19    HMGB1L1
uc011krn.1        hg19    MOXD2
uc011lsn.1        hg19    LOC100130426

You may want to look at the UCSC Table Browser Help or go to the UCSC genome support forum.
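Once you have that table, grouping the isoforms by gene is a simple join on the isoform ID. Here is a minimal sketch in plain Python, assuming both tables are tab-separated with the column names shown in this thread (`isoform_id`, `normalized_count`, `#id`, `geneName`). The helper names `read_mapping` and `group_by_gene` and the `UNMAPPED` label are made up for illustration, and the inline strings stand in for whatever files you saved.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from this thread. In practice you would open your own
# expression file and the Table Browser output instead of these strings.
EXPRESSION_TSV = """isoform_id\tnormalized_count
uc010unu.1\t20.1848
uc010uoa.1\t7.1561
uc002bgz.2\t36.1698
uc001jiu.2\t1085.9445
"""

MAPPING_TSV = """#id\tcds\tdb\tgeneName
uc001jiu.2\t123..752\thg19\tTIMM23
uc002bgz.2\t\thg19\tUBE2QP2
uc010unu.1\t\thg19\tUBE2QP2
uc010uoa.1\t\thg19\tUBE2QP2
"""


def read_mapping(handle):
    """Build an isoform ID -> gene name lookup from the Table Browser output."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["#id"]: row["geneName"] for row in reader}


def group_by_gene(expr_handle, isoform_to_gene):
    """Collect (isoform_id, normalized_count) pairs under each gene name."""
    groups = defaultdict(list)
    for row in csv.DictReader(expr_handle, delimiter="\t"):
        # Isoforms missing from the mapping go under a placeholder label
        gene = isoform_to_gene.get(row["isoform_id"], "UNMAPPED")
        groups[gene].append((row["isoform_id"], float(row["normalized_count"])))
    return dict(groups)


isoform_to_gene = read_mapping(io.StringIO(MAPPING_TSV))
by_gene = group_by_gene(io.StringIO(EXPRESSION_TSV), isoform_to_gene)

for gene, isoforms in sorted(by_gene.items()):
    print(gene, [iso_id for iso_id, _ in isoforms])
```

This only groups the isoforms. Whether and how to combine the per-isoform counts into a gene-level value depends on how the counts were normalized, so that step is left out on purpose.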

7.3 years ago
Chun-Jie Liu ▴ 280

Use biomaRt

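library(biomaRt)

# Connect to the Ensembl human gene dataset
GENES <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")

# Retrieve the gene / transcript / RefSeq / UCSC ID mapping for all transcripts
getBM(attributes = c('ensembl_gene_id', 'hgnc_symbol', 'ensembl_transcript_id',
                     'refseq_mrna', 'ucsc', 'chromosome_name',
                     'transcript_start', 'transcript_end'),
      mart = GENES)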

This will list the mapping between all Ensembl gene/transcript IDs, HGNC symbols, RefSeq mRNA IDs and UCSC IDs. You can pass the filters and values arguments to getBM to restrict the mapping to a specific set of IDs.

The UCSC IDs used in the FIREHOSE isoform expression data are from an old version. You cannot use these UCSC IDs as a filter in getBM.
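Because of this version mismatch, it is worth checking how many of your isoform IDs a given mapping actually covers before relying on it. A minimal sketch in plain Python: `split_by_coverage` is a made-up helper name, and the `known` lookup below is deliberately incomplete and only there for illustration, not real biomaRt output.

```python
def split_by_coverage(isoform_ids, isoform_to_gene):
    """Return (mapped, unmapped) ID lists, preserving the input order."""
    mapped, unmapped = [], []
    for iso_id in isoform_ids:
        if iso_id in isoform_to_gene:
            mapped.append(iso_id)
        else:
            unmapped.append(iso_id)
    return mapped, unmapped


# A few IDs from the question
ids = ["uc011lsn.1", "uc010unu.1", "uc001jiu.2"]

# Hypothetical, deliberately incomplete lookup (illustration only)
known = {"uc010unu.1": "UBE2QP2", "uc001jiu.2": "TIMM23"}

mapped, unmapped = split_by_coverage(ids, known)
print(f"{len(mapped)} mapped, {len(unmapped)} unmapped: {unmapped}")
```

If most IDs end up in the unmapped list, the mapping source probably uses a different UCSC ID version than your expression file.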
