Error accessing Ensembl BioMart via the biomaRt R package
6.5 years ago

Objective

I have a list of ~58K Ensembl gene IDs for H. sapiens, for which I need to extract the gene names, descriptions, and other annotations from BioMart.


The web interface is failing, primarily because the list I am uploading is so large, so I thought I would try the biomaRt R package instead. I am trying to access Ensembl BioMart using the following commands:

# Install biomaRt from Bioconductor
source("http://bioconductor.org/biocLite.R")
biocLite("biomaRt")
# Load the package and list the available Ensembl marts
library(biomaRt)
listEnsembl()

Error encountered

> listEnsembl()
Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.  Check http://www.biomart.org and verify if this website is available.
Error: XML content does not seem to be XML:

From the error, it appears to be a problem with the URL http://www.biomart.org, which I could in fact access without any issue.

What I can see is a downtime notice here. Is it something related to this?

Can anybody suggest anything else?

biomart R • 4.2k views

What version of R and biomaRt are you using? I suspect you might have an old version. While the www.biomart.org website still exists, it ceased to be the central repository for BioMart instances quite a while ago. All the defaults in the biomaRt package should now point to www.ensembl.org.

You can check the version using the command sessionInfo(). Here's mine, along with the output I get when running listEnsembl():

> sessionInfo()   
R version 3.4.1 (2017-06-30)   
Platform: x86_64-pc-linux-gnu (64-bit)   
Running under: Linux Mint 18.1
Matrix products: default
BLAS: /home/msmith/Applications/R/R-3.4.1/lib/libRblas.so
LAPACK: /home/msmith/Applications/R/R-3.4.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C        
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.33.5

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12         AnnotationDbi_1.39.3 magrittr_1.5        
 [4] BiocGenerics_0.23.1  progress_1.1.2       IRanges_2.11.16     
 [7] bit_1.1-12           R6_2.2.2             rlang_0.1.2         
[10] stringr_1.2.0        blob_1.1.0           tools_3.4.1         
[13] parallel_3.4.1       Biobase_2.37.2       DBI_0.7             
[16] bit64_0.9-7          digest_0.6.12        assertthat_0.2.0    
[19] tibble_1.3.4         S4Vectors_0.15.8     bitops_1.0-6        
[22] RCurl_1.95-4.8       memoise_1.1.0        RSQLite_2.0         
[25] stringi_1.1.5        compiler_3.4.1       prettyunits_1.0.2   
[28] stats4_3.4.1         XML_3.98-1.9        

> listEnsembl()
             biomart               version
1            ensembl      Ensembl Genes 90
2 ENSEMBL_MART_MOUSE      Mouse strains 90
3                snp  Ensembl Variation 90
4         regulation Ensembl Regulation 90
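
If the full sessionInfo() dump is more than you need, the two relevant versions can be pulled out directly. This is a minimal sketch: `is_outdated` is a hypothetical helper name, and "2.33.5" is simply the version shown in the session above, used as an illustrative threshold rather than a documented minimum.

```r
# Hypothetical helper: TRUE when `installed` is older than `minimum`.
# package_version() compares version strings component by component.
is_outdated <- function(installed, minimum) {
  package_version(installed) < package_version(minimum)
}

# Version of R itself
print(R.version.string)

# Version of biomaRt, if it is installed. The threshold "2.33.5" is taken
# from the session above and is an illustrative cut-off only.
if (requireNamespace("biomaRt", quietly = TRUE)) {
  v <- as.character(utils::packageVersion("biomaRt"))
  note <- if (is_outdated(v, "2.33.5")) " (older than 2.33.5)" else ""
  message("biomaRt ", v, note)
}
```

Comparing through package_version() avoids the pitfalls of comparing version strings as plain text.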
ADD REPLY
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.30.0       BiocInstaller_1.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9          IRanges_2.8.1        XML_3.98-1.5        
 [4] digest_0.6.12        bitops_1.0-6         DBI_0.5-1           
 [7] stats4_3.3.1         RSQLite_1.1-2        S4Vectors_0.12.1    
[10] tools_3.3.1          Biobase_2.34.0       RCurl_1.95-4.8      
[13] parallel_3.3.1       BiocGenerics_0.20.0  AnnotationDbi_1.36.1
[16] memoise_1.0.0

Working fine for me:

> library(biomaRt)
> listEnsembl()
             biomart               version
1            ensembl      Ensembl Genes 90
2 ENSEMBL_MART_MOUSE      Mouse strains 90
3                snp  Ensembl Variation 90
4         regulation Ensembl Regulation 90
>

Check to see if an overzealous intrusion prevention device (or a firewall admin) has disabled your access since it seems to be working for others.

6.5 years ago
Mike Smith ★ 2.0k

It looks like you're using a fairly old version of both R and biomaRt at the moment. I've made quite a few changes to the package over the past year, particularly regarding connectivity and error messages, so I'd suggest upgrading. You can keep the same version of R and install the latest biomaRt using the following command:

BiocInstaller::biocLite('grimbough/biomaRt')

I would then try re-running listEnsembl() with the verbose flag. This will print the actual URL it is trying to access, which you can then try in a web browser. It should be an XML file starting with <MartRegistry>.

listEnsembl(verbose = TRUE)
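
The browser check can also be done from R, which is handy on a remote server. The sketch below is only an illustration: `looks_like_registry` and `check_registry_url` are hypothetical helper names, and the URL has to be whatever the verbose output printed on your machine.

```r
# Hypothetical helper: does the response body look like a BioMart registry?
# Allows leading whitespace and an optional <?xml ...?> declaration.
looks_like_registry <- function(body) {
  grepl("^\\s*(<\\?xml[^>]*>\\s*)?<MartRegistry", body, perl = TRUE)
}

# Fetch `url` and report whether a <MartRegistry> document came back.
# Pass in the URL printed by listEnsembl(verbose = TRUE).
check_registry_url <- function(url) {
  body <- paste(readLines(url, warn = FALSE), collapse = "\n")
  ok <- looks_like_registry(body)
  if (ok) {
    message("OK: received a <MartRegistry> document")
  } else {
    message("Unexpected response, first 200 characters:\n",
            substr(body, 1, 200))
  }
  invisible(ok)
}
```

If the response turns out to be an HTML page rather than XML (for example a proxy or firewall notice), that would be consistent with the "XML content does not seem to be XML" error.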

You can also try accessing one of the mirror sites, then report back here with any output, e.g.

listEnsembl(verbose = TRUE, mirror = "asia")
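
The mirror suggestion can be turned into a small fallback loop. This is only a sketch: `try_mirrors` and its `lister` argument are hypothetical names introduced here, and the mirror names other than "asia" are assumptions about what your biomaRt version accepts (see ?listEnsembl).

```r
# Try each mirror in turn and return the first listing that succeeds.
# `lister` defaults to biomaRt::listEnsembl; it is a parameter so the
# fallback logic can be exercised without network access.
try_mirrors <- function(mirrors = c("www", "useast", "uswest", "asia"),
                        lister = function(m) biomaRt::listEnsembl(mirror = m)) {
  for (m in mirrors) {
    result <- tryCatch(
      lister(m),
      error = function(e) {
        # Report the failure and move on to the next mirror
        message("mirror '", m, "' failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) {
      return(list(mirror = m, marts = result))
    }
  }
  stop("no Ensembl mirror could be reached")
}
```

Calling `res <- try_mirrors()` would then give the mirror that answered in `res$mirror` and the table of marts in `res$marts`. If every mirror fails, a local network restriction becomes the more likely explanation.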