Difference Between Biomart Query And The Ensembl Database
12.4 years ago

Hi,

I'm using the biomaRt R package (Bioconductor) to retrieve the 3' UTR sequences for a list of genes. I have the Entrez Gene ID for each of them, and I see differences between the biomaRt result and the Ensembl BioMart database.

Here's an example:

For the transcript ENSBTAT00000014489 (I'm working with Bos taurus sequences)

In R:

library("biomaRt")
ensembl <- useMart("ensembl")
ensembl <- useDataset("btaurus_gene_ensembl",mart=ensembl)   
getSequence(seqType="3utr",mart=ensembl,type="entrezgene",id=522265)
                                     3utr entrezgene
1 No UTR is annotated for this transcript     522265

In Ensembl BioMart: under "Export data", check only 3' UTR.

Result:

http://www.ensembl.org/Bos_taurus/Export/Output/Transcript?db=core;flank3_display=0;flank5_display=0;g=ENSBTAG00000010909;output=fasta;r=16:73764976-73768065;strand=feature;t=ENSBTAT00000014489;param=utr3;genomic=unmasked;_format=HTML

So, where's the problem? Why doesn't the biomaRt package retrieve this sequence?

Thanks a lot,

N.

biomart ensembl utr r • 5.3k views
12.4 years ago
Neilfws 49k

The link to Ensembl in your question does not display a 3'-UTR. At first glance, it seems to display the full coding sequence for the transcript - note that it begins with ATG and ends with TGA.

When I use web BioMart, I get the exact same result as when using R biomaRt (see screenshot below):

biomart.png

BioMart via the web should always give the same result as via R, since they connect to the same data source. If there are discrepancies, it's generally because the data you have is not what you thought it was.
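
As a rough illustration of checking what the returned data actually contains, here is a minimal R sketch that flags rows of a getSequence()-style result holding a placeholder message rather than sequence. The helper name has_annotated_utr and the placeholders vector are hypothetical. The mock data frame simply mirrors the output pasted in the question, and the second placeholder string is an assumption about what other releases may return.

```r
# Placeholder text seen in the question's output. The second entry is an
# assumption about the message other BioMart releases may return.
placeholders <- c("No UTR is annotated for this transcript",
                  "Sequence unavailable")

# Hypothetical helper: TRUE for rows whose sequence column holds real sequence,
# FALSE for rows that hold NA or one of the placeholder messages.
has_annotated_utr <- function(seqs, column = "3utr") {
  values <- as.character(seqs[[column]])
  !is.na(values) & !(values %in% placeholders)
}

# Mock of the data frame shown in the question (no network access needed).
res <- data.frame(
  `3utr` = "No UTR is annotated for this transcript",
  entrezgene = 522265,
  check.names = FALSE,
  stringsAsFactors = FALSE
)

has_annotated_utr(res)
```

On a real result, the same helper could be applied to the data frame returned by getSequence() before treating any row as a usable 3' UTR.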


OK, thanks! So biomaRt works great :)


FYI, this is the best browser page to check whether a transcript contains a UTR or not:

http://www.ensembl.org/Bos_taurus/Transcript/Exons?db=core;g=ENSBTAG00000010909;r=16:73764976-73768065;t=ENSBTAT00000014489

On this page the CDS is in black, UTRs in purple, introns in blue and flanking sequence in green. So indeed, this transcript has no UTRs annotated.

12.4 years ago
Andeyatz ▴ 70

Hi,

I think there may be a difference between explicitly annotated UTR regions and a region which is 3' of a transcript. If you look at this page http://www.ensembl.org/Bos_taurus/Transcript/Sequence_cDNA?_format=HTML;db=core;flank3_display=0;flank5_display=0;g=ENSBTAG00000010909;genomic=unmasked;output=fasta;param=utr3;r=16:73764976-73768065;strand=feature;t=ENSBTAT00000014489 then you can see there is no annotated UTR.

The following query on the cow 64 database will show the same result:

select t.seq_region_start, t.seq_region_end, e.seq_region_start as exon_start, e.seq_region_end as exon_end, et.rank
from transcript_stable_id 
join transcript t using (transcript_id)
join exon_transcript et using (transcript_id)
join exon e using (exon_id)
where stable_id = 'ENSBTAT00000014489';
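
To make the distinction concrete, here is a small R sketch of how an annotated 3' UTR relates to exon coordinates like those the query returns: it is the exonic sequence that follows the last coding base, so a transcript whose CDS runs to its final exonic base has a 3' UTR of length zero. The exon coordinates below are invented toy values, not the real ones for ENSBTAT00000014489, and utr3_length is a hypothetical helper.

```r
# Toy exon table in transcript order (rank 1 is the 5'-most exon).
# Coordinates are invented for illustration only.
exons <- data.frame(
  rank       = c(1, 2, 3),
  exon_start = c(1000, 2000, 3000),
  exon_end   = c(1119, 2199, 3149)
)

# Hypothetical helper: length of the 3' UTR, given the position of the last
# coding base (stop codon included) in spliced cDNA coordinates.
utr3_length <- function(exons, cds_end_cdna) {
  cdna_length <- sum(exons$exon_end - exons$exon_start + 1)
  stopifnot(cds_end_cdna >= 1, cds_end_cdna <= cdna_length)
  cdna_length - cds_end_cdna
}

utr3_length(exons, cds_end_cdna = 470)  # CDS ends at the last exonic base: 0
utr3_length(exons, cds_end_cdna = 400)  # 70 exonic bases follow the CDS
```

Sequence lying 3' of the transcript on the genome is flanking sequence, not UTR, which is why it never enters this calculation.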

Hope this helps
