Why does my biomaRt query return inconsistent dataset lists?
13 months ago
adam.faranda • 10

I have been using the biomaRt library to retrieve Ensembl gene IDs for mouse genes. This morning, I got an unusual error message when running a previously validated script:

mart <- useMart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
Error in useDataset(mart = mart, dataset = dataset, verbose = verbose) : 
  The given dataset:  mmusculus_gene_ensembl , is not valid.  Correct dataset names can be obtained with the listDatasets function.

When I used the "listDatasets" function to check whether "mmusculus_gene_ensembl" is correct, I noticed that the query was returning a different number of results each time I ran it. Sometimes, "mmusculus_gene_ensembl" appears in this result set and other times it does not:

> nrow(listDatasets(mart, verbose=T))
Attempting web service request:
http://www.ensembl.org:80/biomart/martservice?type=datasets&requestid=biomaRt&mart=ENSEMBL_MART_ENSEMBL
[1] 51
> nrow(listDatasets(mart, verbose=T))
Attempting web service request:
http://www.ensembl.org:80/biomart/martservice?type=datasets&requestid=biomaRt&mart=ENSEMBL_MART_ENSEMBL
[1] 116
> nrow(listDatasets(mart, verbose=T))
Attempting web service request:
http://www.ensembl.org:80/biomart/martservice?type=datasets&requestid=biomaRt&mart=ENSEMBL_MART_ENSEMBL
[1] 27

This behavior has been consistent all day. My R session info is below:

R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X Mavericks 10.9.5

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.30.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0           IRanges_2.8.2        XML_3.98-1.17        digest_0.6.18        bitops_1.0-6         DBI_1.0.0            stats4_3.3.3         RSQLite_2.1.1       
 [9] blob_1.1.1           S4Vectors_0.12.2     tools_3.3.3          bit64_0.9-7          Biobase_2.34.0       RCurl_1.95-4.11      bit_1.1-14           parallel_3.3.3      
[17] BiocGenerics_0.20.0  AnnotationDbi_1.36.2 memoise_1.1.0
13 months ago
Mike Smith ♦ 1.2k
EMBL Heidelberg / de.NBI

biomaRt had a bug that surfaced when Ensembl release 91 introduced datasets whose names contain apostrophes, e.g. "Ma's Night Monkey"; this leads to the error you are seeing. See https://support.bioconductor.org/p/104025/#104043 or https://www.biostars.org/p/289654/#289861 for more details.
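
To illustrate how a stray apostrophe can make rows disappear, here is a minimal sketch. It assumes the dataset list arrives as tab-delimited text and is parsed with `read.table`; it is not biomaRt's actual code, and the four rows are illustrative stand-ins rather than a real server response.

```r
# Illustrative tab-delimited rows (not a real server response)
txt <- paste(
  "anancymaae_gene_ensembl\tMa's night monkey genes\tAnan_2.0",
  "mmusculus_gene_ensembl\tMouse genes\tGRCm38.p5",
  "pcoquereli_gene_ensembl\tCoquerel's sifaka genes\tPcoq_1.0",
  "hsapiens_gene_ensembl\tHuman genes\tGRCh38.p10",
  sep = "\n"
)

cols <- c("dataset", "description", "version")

# Default quoting: "'" counts as a quote character, so everything between
# the two apostrophes (including the mouse row) is read as a single field
naive <- read.table(text = txt, sep = "\t", col.names = cols,
                    stringsAsFactors = FALSE)

# Quoting disabled: each input line becomes one row
safe <- read.table(text = txt, sep = "\t", quote = "", col.names = cols,
                   stringsAsFactors = FALSE)

nrow(naive) < nrow(safe)
"mmusculus_gene_ensembl" %in% naive$dataset
"mmusculus_gene_ensembl" %in% safe$dataset
```

Which rows get swallowed depends on where the apostrophes fall in the response.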

You are currently using old versions of both R and biomaRt. I would suggest updating both; in particular, you will need biomaRt version 2.34.1 or newer to handle this correctly.
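
A quick way to check whether an installation is affected is to compare version strings. This is only a sketch: `needs_update` and `min_version` are hypothetical helper names, not part of biomaRt.

```r
# Minimum biomaRt version named in this answer
min_version <- "2.34.1"

# TRUE when the given version string is older than the required minimum
needs_update <- function(installed, minimum = min_version) {
  package_version(installed) < package_version(minimum)
}

needs_update("2.30.0")  # the version shown in the question's session info
needs_update("2.34.1")

# In a live session, compare the installed package instead:
# needs_update(as.character(packageVersion("biomaRt")))
```

`package_version` compares versions component by component, so "2.4.0" is correctly treated as older than "2.34.1", which a plain string comparison would get wrong.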
