Question

R -- Get Exonic Ranges Of A Gene'S Transcripts Given Its Name?

6

Entering edit mode

11.9 years ago

Stew ▴ 60

I am writing an R script in which I have a list of gene names (along with the chromosome and the coordinate of a single point on the gene), and I need to find the list of transcripts for that gene, and the coordinates of the exonic and intronic regions of each of those transcripts (ie exonStarts etc in the UCSC knowngene table).

I'm sure there is a way to do this with Bioconductor, I've been messing around with BSGenome, VariantAnnotation, and the TxDb.Hsapiens.UCSC.hg19.knownGene packages but am making little progress and growing frustrated by the documentation.

Any ideas? Many thanks.

r bioconductor • 8.8k views

ADD COMMENT • link updated 22 months ago by Ram 43k • written 11.9 years ago by Stew ▴ 60

1

Entering edit mode

Have a look at the biomaRt package, that should do.

ADD REPLY • link 11.9 years ago by Michael 54k

Ram · Answer 1 · 2012-05-18

5

Entering edit mode

11.9 years ago

Nathan Harmston ★ 1.1k

How about the GenomicFeatures library?

library(GenomicFeatures)
x= makeTranscriptDbFromUCSC("hg19", tablename="knownGene")
saveFeatures(x, file=paste("hg19", "_ucsc.sqlite", sep=""))
fn = loadFeatures("hg19_ucsc.sqlite")

then you can easily use the exons and transcripts methods to access the information you want if you have the gene identifiers......

genes = c(10772)
names(genes) = "gene_id"
transcripts(fn, vals = genes)

gives you the transcripts for that gene identifier and then you can take those, extract the transcript names and do a similar thing with the exons method.

HTH

ADD COMMENT • link 11.9 years ago by Nathan Harmston ★ 1.1k

0

Entering edit mode

by the way when you say you have a list of gene names I am assuming you meant UCSC gene identifiers and not Entrez gene identifiers / gene names. Otherwise you'll have to convert them first.

ADD REPLY • link 11.9 years ago by Nathan Harmston ★ 1.1k

0

Entering edit mode

For GenomicRanges_1.18.4, saveFeatures and loadFeatures, changed to saveDb and loadDb.

ADD REPLY • link updated 22 months ago by Ram 43k • written 9.1 years ago by zx8754 11k

0

Entering edit mode

You don't need to use makeTranscriptDbFromUCSC. Load the Homo.sapiens package, and it will come with a TxDB object for hg19.

ADD REPLY • link 7.6 years ago by Giovanni M Dall'Olio 28k

score 3 · Answer 2 · 2012-05-18

For this kind of problems Biomart and the Bioconductor biomaRt package are your friends!

For example if you have "entrezgene" ids for your list of genes and you need the corresponding "ensemble transcript id" related to your genes you can do something like this:

library(biomaRt)
mart <- useMart("ensembl")
mart <- useDataset("hsapiens_gene_ensembl", mart)
att <- listAttributes(mart)
f <- listFilters(mart)
attributes <- c("entrezgene","ensembl_transcript_id")
filters <- "entrezgene"
values <- c(9751,64219,5591,642636,6097,83795) #randomly selected
output <- getBM(attributes=attributes, filters=filters, values=values, mart=mart)

Check all available filters and attributes reported in the att and f variables. For a nice introduction to the package see the associated vignette.

score 2 · Answer 3 · 2012-05-24

2

Entering edit mode

11.9 years ago

Paolo ▴ 320

Another solution is to use the geneToTranscript function from the annmap package:

# c & p from the package Vignette
library(annmap)
gene = symbolToGene( c( 'PTEN', 'SHH' ) )
geneToTranscript( gene )

ADD COMMENT • link 11.9 years ago by Paolo ▴ 320

score 1 · Answer 4 · 2016-09-14

Lets get the exons for the PLEC gene in HG19: http://www.ncbi.nlm.nih.gov/gene?cmd=Retrieve&dopt=full_report&list_uids=5339

library("GenomicFeatures")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
genome <- TxDb.Hsapiens.UCSC.hg19.knownGene

# the plec gene
plec_gene = genes(genome)[which(genes(genome)$gene_id == 5339),]

# get the exons with the gene coordinates
plec_exons = subsetByOverlaps(exons(genome), plec_gene)

# same for transcripts
plec_t = subsetByOverlaps(transcripts(genome), plec_gene)

That is about it