I am trying to get the start and stop positions of all exons in TP53, and there are several ways to do this in R. TP53 is generally described as having 11 exons, yet each method I use to pull exon information returns a different number: biomaRt gives 53 exons, and GenomicFeatures with the UCSC annotation gives 21 (both examples are below). What I get from Ensembl through biomaRt doesn't even seem to agree with what is shown on the Ensembl page for TP53. Incidentally, in both the Ensembl and UCSC results, some of the reported exons overlap one another.
So, first of all, why is there so much disagreement over how many/which exons exist in TP53? Secondly, how would I get the coordinates (mainly start and stop sites) of the "consensus" exons (for TP53, I guess that would be e1 through e11, the exons most biologists recognize)? Thanks.
Exon info from Ensembl:
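library(biomaRt)
# Connect to the Ensembl human gene dataset
ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# Exon IDs and coordinates for every gene matching the HGNC symbol TP53
gb <- getBM(attributes = c("ensembl_exon_id", "exon_chrom_start", "exon_chrom_end"),
            filters = "hgnc_symbol", values = "TP53", mart = ensembl, bmHeader = TRUE)
nrow(gb)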
From UCSC:
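library("GenomicFeatures")
library("TxDb.Hsapiens.UCSC.hg19.knownGene")
genome <- TxDb.Hsapiens.UCSC.hg19.knownGene
# Gene range for TP53 (Entrez gene ID 7157; gene_id is stored as character)
all_genes <- genes(genome)
tp53 <- all_genes[all_genes$gene_id == "7157"]
# All exons overlapping the TP53 gene range (strand-aware by default)
tp53_exons <- subsetByOverlaps(exons(genome), tp53)
length(tp53_exons)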
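On the overlapping exons mentioned above: one possible explanation is that both queries pool exons from every annotated transcript of the gene rather than from a single transcript. The sketch below only illustrates that idea in base R. The transcript names (`txA`, `txB`) and all coordinates are made up for illustration and are not real TP53 positions.

```r
# Hypothetical toy data: two transcripts of one gene.
# Coordinates are invented; they are NOT real TP53 exon positions.
exons <- data.frame(
  transcript = c("txA", "txA", "txA", "txB", "txB", "txB"),
  start      = c(100, 300, 500, 100, 280, 500),
  end        = c(200, 400, 600, 200, 400, 600)
)

# Exon count within each transcript (what "the gene has N exons" usually means)
per_tx <- table(exons$transcript)

# Distinct exons pooled over all transcripts (what a gene-level query returns)
pooled <- unique(exons[, c("start", "end")])
n_pooled <- nrow(pooled)

# Sort pooled exons by start and check whether any exon begins
# before the previous one ends, i.e. whether pooled exons overlap
pooled <- pooled[order(pooled$start, pooled$end), ]
has_overlap <- any(pooled$start[-1] <= pooled$end[-nrow(pooled)])

print(per_tx)
print(n_pooled)
print(has_overlap)
```

In this toy case each transcript has three exons, but pooling gives more distinct exons than either transcript has, and two of them overlap because the transcripts use different start positions for the same exon. If the same thing is happening with TP53, grouping by transcript (for example with `exonsBy(genome, by = "tx")` from GenomicFeatures) might be a way to look at one transcript at a time, though I have not confirmed that this recovers the usual e1 through e11.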