Is there any difference between using the biomaRt package in R to retrieve gene information from the Ensembl website and downloading this information using the BioMart tab on the website?
23 months ago
M K • 460
United States

Hi everyone,

I am trying to retrieve gene information from the Ensembl website to compare the gene information for mouse (mm10) with repetitive DNA in specific genome regions (UTRs, introns, and upstream). I got these files in two ways: first using the R code below, and second by going directly to the Ensembl website and using the BioMart tab.

I have two issues. The first is that the total number of observations (rows) differs between the two files.

The second issue: when I try to find the genes that share the same position with these specific regions of repetitive DNA, I get empty results, and I don't know what causes that. BTW, I downloaded the repetitive DNA files from the UCSC website using Ensembl Genes in the track tab.

R code to retrieve the gene info.

source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
library(biomaRt)

Retrieving mouse (mm10/GRCm38) from Ensembl website

mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

mm10_Gene <- getBM(
  attributes = c("ensembl_gene_id", "chromosome_name", "strand",
                 "transcript_start", "transcript_end", "mgi_symbol"),
  mart = mouse
)
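
One thing worth checking for the row-count difference: the query above requests transcript_start and transcript_end, so getBM() returns rows at transcript level, and a gene with several transcripts can appear several times. If the website export was built with gene-level attributes only, the totals will differ even on the same release. The sketch below uses a made-up table (the IDs and coordinates are invented, not real Ensembl records) to show how to compare the number of rows with the number of distinct genes.

```r
# Toy stand-in for a getBM() result; IDs and coordinates are invented.
bm <- data.frame(
  ensembl_gene_id  = c("geneA", "geneA", "geneB", "geneC", "geneC", "geneC"),
  transcript_start = c(100, 150, 900, 2000, 2000, 2100),
  transcript_end   = c(500, 500, 1200, 2600, 2500, 2600),
  stringsAsFactors = FALSE
)

n_rows  <- nrow(bm)                            # rows in the table
n_genes <- length(unique(bm$ensembl_gene_id))  # distinct genes

# Genes that occupy more than one row
rows_per_gene   <- table(bm$ensembl_gene_id)
multi_row_genes <- names(rows_per_gene)[rows_per_gene > 1]

cat("rows:", n_rows, " genes:", n_genes, "\n")
print(multi_row_genes)
```

Running the same two counts on both files (the biomaRt result and the website export) shows whether they differ in genes or only in rows per gene.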

16 months ago
Ying W ♦ 3.9k
South San Francisco, CA

As long as you are on the same release, the results should be the same (I'm not sure how to tell which release the Bioconductor package is using, but it might be a couple of releases behind the website).

Could you give an example of a gene in repetitive DNA that you can find on the website but not through biomaRt?
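
On the question of which release biomaRt is talking to: as far as I know, recent versions of biomaRt provide listMarts(), whose version column names the release behind the default host, listEnsemblArchives() for archived releases, and a version argument on useEnsembl() to pin one. A minimal sketch, assuming biomaRt is installed, network access is available, and the requested release is still served as an archive. The helper name mart_for_release is made up for illustration.

```r
# Hypothetical helper: connect to a specific Ensembl release so that the
# R query and a website export can be compared like for like.
# Requires the biomaRt package and network access when it is called.
mart_for_release <- function(release,
                             dataset = "mmusculus_gene_ensembl") {
  biomaRt::useEnsembl(biomart = "ensembl",
                      dataset = dataset,
                      version = release)
}

# Usage (not run here; each call contacts the Ensembl servers):
# biomaRt::listMarts()            # 'version' column shows the current release
# biomaRt::listEnsemblArchives()  # archived releases and their hosts
# mouse80 <- mart_for_release(80)
```

The website export would then need to come from the matching archive site for the comparison to be fair.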


Hi Ying,

I used the mouse (mm10) release, which is the latest release. Then I used the Table Browser in UCSC to download the repetitive DNA: in the track tab I chose Ensembl Genes, and then I got, for example, the "Introns plus" regions from the get output tab. Since UCSC doesn't provide gene info for Ensembl genes, especially MGI symbols, I retrieve the gene info from the Ensembl website directly or by using the R code above.
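
When UCSC Table Browser output is intersected with Ensembl coordinates as described here, one thing worth ruling out for the empty result is chromosome naming: UCSC writes chr1 and chrM, while Ensembl writes 1 and MT, so a join on the raw names matches nothing. BED output from UCSC also uses 0-based starts, whereas Ensembl coordinates are 1-based. The base-R sketch below uses invented toy intervals and hypothetical column names, and this may not be the cause in this particular case.

```r
# Toy data; all coordinates and names are invented for illustration.
repeats <- data.frame(            # UCSC-style BED: "chr" prefix, 0-based start
  chrom = c("chr1", "chr1", "chrM"),
  start = c(99, 5000, 10),
  end   = c(200, 5100, 50),
  stringsAsFactors = FALSE
)
genes <- data.frame(              # Ensembl-style: bare names, 1-based start
  gene  = c("geneA", "geneB"),
  chrom = c("1", "MT"),
  start = c(150, 1),
  end   = c(400, 30),
  stringsAsFactors = FALSE
)

# Convert UCSC naming and coordinates to the Ensembl convention.
to_ensembl <- function(bed) {
  bed$chrom <- sub("^chr", "", bed$chrom)
  bed$chrom[bed$chrom == "M"] <- "MT"
  bed$start <- bed$start + 1      # 0-based half-open -> 1-based closed
  bed
}

# Pair every repeat with every gene on the same chromosome, then keep
# the pairs whose closed intervals intersect.
overlaps <- function(rep_df, gene_df) {
  m <- merge(rep_df, gene_df, by = "chrom", suffixes = c(".rep", ".gene"))
  m[m$start.rep <= m$end.gene & m$end.rep >= m$start.gene, ]
}

raw_hits   <- overlaps(repeats, genes)              # names never match
fixed_hits <- overlaps(to_ensembl(repeats), genes)  # names harmonised first

print(nrow(raw_hits))
print(fixed_hits$gene)
```

The merge-based pairing is only meant for small tables. For genome-scale files, an interval library such as GenomicRanges (findOverlaps) is the usual choice.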


Not the mouse reference, but the annotation release. If you look on the Ensembl website, it is currently on release 80. UCSC is probably using a different release as well, since annotations are updated more often than the reference is.


So is there any way to download the repetitive DNA from the Ensembl website directly, like the one on UCSC? For example, I want to download the intronic, CDS, 10 kb upstream, and 10 kb downstream regions for mouse (mm10) and human (hg19). I think that by doing that, the annotation data and repetitive DNA will be consistent for this analysis, since they come from the same source, which is Ensembl.
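
Whichever source is used, the 10 kb upstream and downstream windows can also be derived from the gene table itself, which keeps them tied to the same annotation release as the genes. A sketch in base R, assuming 1-based coordinates and strand coded as 1 / -1 as in biomaRt output. The function and column names are hypothetical and the toy coordinates are invented.

```r
# Compute strand-aware flanking windows around each gene.
# Upstream lies before the start on the + strand and after the end on the - strand.
flank_windows <- function(genes, width = 10000) {
  plus <- genes$strand == 1
  up_start   <- ifelse(plus, genes$start - width, genes$end + 1)
  up_end     <- ifelse(plus, genes$start - 1,     genes$end + width)
  down_start <- ifelse(plus, genes$end + 1,       genes$start - width)
  down_end   <- ifelse(plus, genes$end + width,   genes$start - 1)
  data.frame(
    gene       = genes$gene,
    up_start   = pmax(up_start, 1),     # clip at the chromosome start
    up_end     = up_end,
    down_start = pmax(down_start, 1),
    down_end   = down_end,
    stringsAsFactors = FALSE
  )
}

genes <- data.frame(
  gene   = c("geneA", "geneB"),
  strand = c(1, -1),
  start  = c(50000, 80000),
  end    = c(60000, 90000),
  stringsAsFactors = FALSE
)
flanks <- flank_windows(genes)
print(flanks)
```

The sketch does not clip windows at chromosome ends. That would need a table of chromosome lengths for the assembly in use.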
