Question

BioMart: the BioMart webservice returned an invalid result

4.9 years ago

amandinelecerfdefer ▴ 20

Hello,

Starting from a file containing a list of rsIDs, I want to retrieve the gene and transcripts corresponding to each rsID. Here is my script:

# Install BiocManager from CRAN, then biomaRt from Bioconductor
install.packages('BiocManager', repos='http://cran.us.r-project.org')
BiocManager::install(c("biomaRt"))

library(biomaRt)
# Input: one rsID per line under a header named "rsID"
Data <- read.delim("/Users/amandinelecerfdefer/Desktop/Modification_vcf/cut/rsID_origine.txt2.txt")

# Connect to the Ensembl human variation (SNP) mart
snpmart <-
  useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
T1 <- Sys.time()
T1
# Map every rsID to its Ensembl gene and transcript stable IDs
res <- getBM(
  attributes = c(
    "refsnp_id",
    "ensembl_gene_stable_id",
    "ensembl_transcript_stable_id"
  ),
  filters = "snp_filter",
  values = Data$rsID,
  mart = snpmart,
  uniqueRows = TRUE
)

T2 <- Sys.time()
T2
write.csv(res, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/name_cut/recovery_gene_trans_original2.txt")
# Record how long the query took
Tdiff <- difftime(T2, T1)
Tdiff
write.csv(Tdiff, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/time/time2.txt")

Last week this script worked well, but for the past few days it has failed every time with the same error.

Here is the error:

> res <- getBM(
+   attributes = c(
+     "refsnp_id",
+     "ensembl_gene_stable_id",
+     "ensembl_transcript_stable_id"
+   ),
+   filters = "snp_filter",
+   values = Data$rsID,
+   mart = snpmart,
+   uniqueRows = TRUE
+ )
Batch submitting query [=======>-----------------------------------------------------]  13% eta:  2h
Error in getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id", "ensembl_transcript_stable_id"),  : 
  The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1. 
Please report this on the support site at http://support.bioconductor.org

How can I fix this error and get the script working again?

Thank you

snp Biomart • 3.1k views

updated 4.8 years ago by Biostar 20 • written 4.9 years ago by amandinelecerfdefer ▴ 20

score 1 · Answer 1 · 2019-06-12

4.9 years ago

Emily 23k

I just ran your query with a couple of random rsIDs as values and had no problems. Can you give us a sample of your data? How long is your list of values?
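One way to answer the question about list size before sending anything to BioMart is to count the distinct, non-empty rsIDs in the input. This is a minimal sketch; the `rsID` column name comes from the file excerpt in this thread, and the helper name `count_query_values` is hypothetical.

```r
# Hypothetical helper: count the distinct, non-empty rsIDs a query would send
count_query_values <- function(ids) {
  ids <- trimws(as.character(ids))      # strip stray whitespace
  ids <- ids[!is.na(ids) & ids != ""]   # drop NAs and blank lines
  length(unique(ids))                   # count each rsID once
}

# Small table shaped like the rsID file in this thread (note the duplicate)
Data <- data.frame(rsID = c("rs142849724", "rs141989890", "rs142849724", ""),
                   stringsAsFactors = FALSE)
count_query_values(Data$rsID)  # 2
```

Running the same helper on the full input gives the number that matters when deciding whether BioMart is the right tool at all.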

4.9 years ago by Emily 23k

Hi, my full file is 17 million lines long. After getting this error, I decided to split it into sub-files of 100,000 lines each. Here is an excerpt from one of them:

rsID
rs142849724
rs141989890
rs193023236
rs187050627
rs115405973
rs542587725
rs140068063
rs185528550
rs539019715
rs571562101
rs190704807
rs571549807
rs143117458
rs115290438
rs114653362
rs190493256
rs192232546
rs139049437
rs186328231
rs189269980
rs530558338
rs568408968
rs377289156
rs116019130
rs190479833

4.9 years ago by amandinelecerfdefer ▴ 20

You can't use BioMart with a file 17 million lines long. You could use our APIs or you could parse the data out of the VCF files with consequences.
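A minimal base-R sketch of the VCF route: stream a (possibly gzipped) VCF in chunks, keep records whose ID column is in the rsID set, and pull out the raw value of one INFO field. The INFO key (`CSQ` below) and the layout of its value are assumptions; check the `##INFO` header lines of the file you actually download, since consequence annotation formats differ between files and releases. The sketch also assumes one ID per record, although the VCF format allows several semicolon-separated IDs. The function name `extract_info_for_ids` is hypothetical.

```r
# Hypothetical helper: stream a VCF and return, for each record whose ID is in
# `ids`, the raw value of INFO key `info_key` (NA when the key is absent).
# Assumes one ID per record and an INFO key name taken from the file header.
extract_info_for_ids <- function(vcf_path, ids, info_key = "CSQ",
                                 chunk_size = 100000L) {
  con <- gzfile(vcf_path, open = "r")  # gzfile also reads uncompressed files
  on.exit(close(con))
  pattern <- paste0("(^|;)", info_key, "=([^;]*)")
  out <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    lines <- lines[!startsWith(lines, "#")]        # skip header lines
    if (length(lines) == 0L) next
    fields <- strsplit(lines, "\t", fixed = TRUE)
    id   <- vapply(fields, `[`, character(1), 3L)  # column 3: ID
    info <- vapply(fields, `[`, character(1), 8L)  # column 8: INFO
    keep <- id %in% ids
    if (!any(keep)) next
    m <- regmatches(info[keep], regexec(pattern, info[keep]))
    value <- vapply(m, function(x) if (length(x)) x[3] else NA_character_,
                    character(1))
    out[[length(out) + 1L]] <- data.frame(rsID = id[keep], value = value,
                                          stringsAsFactors = FALSE)
  }
  if (length(out) == 0L) {
    return(data.frame(rsID = character(0), value = character(0),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, out)
}

# Tiny made-up VCF to show the behaviour
vcf <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t100\trs1\tA\tG\t.\t.\tCSQ=G|missense_variant|GENE1|TX1",
  "1\t200\trs2\tC\tT\t.\t.\tDP=5",
  "1\t300\trs3\tG\tA\t.\t.\tCSQ=A|intron_variant|GENE2|TX2"
), vcf)
hits <- extract_info_for_ids(vcf, ids = c("rs1", "rs2"))
hits
```

Splitting `value` further (for example on `,` between annotations and `|` between subfields) depends on the format description in the header, so it is left out here. Reading in chunks keeps memory bounded; a larger `chunk_size` means fewer passes but more memory per pass.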

4.9 years ago by Emily 23k

I suspected I couldn't do that with a 17-million-line file, but I tried it with 100,000 lines a few days ago and it worked. It no longer does.

4.9 years ago by amandinelecerfdefer ▴ 20

You can't use it for 100,000 either. We recommend a maximum of 500.

4.9 years ago by Emily 23k

It's strange, but it did work once with 100,000 lines. Thank you for your answer; I will split my 17-million-line file into 500-line files to find the matches. Thank you.

4.9 years ago by amandinelecerfdefer ▴ 20

Please don't do that either. You will jam up our servers. I recommend parsing the VCFs.

4.9 years ago by Emily 23k

No problem, I will find another solution.

4.9 years ago by amandinelecerfdefer ▴ 20

Going by Mike Smith's earlier answer in the thread "vector dimension limit in biomaRt", it seems getBM() already batches large queries internally:

I've modified the getBM() function in biomaRt to submit queries in batches if the number of values exceeds 500. If you have multiple filters each of which have more than 500 values it should generate multiple mutually exclusive queries so that all combinations are run without breaking the 500 value limit. All of this is done internally, so existing biomaRt scripts shouldn't need to be changed. It will also display a progress bar so you can tell it is still proceeding. This is available from biomaRt version 2.33.1
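The batching described in the quote can be illustrated with a short base-R sketch: split each filter's values into chunks of at most 500, then take every combination of chunks, so each generated query stays under the limit while the queries together cover all value combinations. This is a conceptual illustration only, not biomaRt's internal code; the function name `plan_batches` and the second filter name `other_filter` are hypothetical.

```r
# Hypothetical helper illustrating value batching (not biomaRt's own code):
# split each filter's values into chunks of at most `max_values`, then
# enumerate every combination of chunks so each query respects the limit.
plan_batches <- function(filter_values, max_values = 500L) {
  # One list of chunks per filter
  chunks <- lapply(filter_values, function(v) {
    split(v, ceiling(seq_along(v) / max_values))
  })
  # One row per query: every combination of chunk indices across filters
  grid <- expand.grid(lapply(chunks, seq_along))
  # For each row, pick the matching chunk of every filter
  lapply(seq_len(nrow(grid)), function(i) {
    idx <- unlist(grid[i, , drop = FALSE])
    Map(function(ch, j) ch[[j]], chunks, idx)
  })
}

# Example: 1,200 rsIDs (3 chunks) combined with a hypothetical second filter
batches <- plan_batches(list(
  snp_filter   = sprintf("rs%d", 1:1200),
  other_filter = as.character(1:22)
))
length(batches)         # 3
lengths(batches[[3]])   # snp_filter 200, other_filter 22
```

Note that batching only changes how the work is split, not how much of it the server has to do: a 17-million-value list still becomes tens of thousands of queries.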

4.9 years ago by AK ★ 2.2k

I updated my query based on the information in that post, but now I get a new error:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Connection time-out
Calls: useMart ... request_fetch -> request_fetch.write_memory -> <Anonymous>
Execution halted

biomaRt version: 2.40.0

4.9 years ago by amandinelecerfdefer ▴ 20

Seriously, please don't.

4.9 years ago by Emily 23k

I only submitted a query for 20,000 rsIDs because, as Mike says, he extended the query capacity. I requested a single file of 20,000 lines without any loops, just to test Mike's update.

4.9 years ago by amandinelecerfdefer ▴ 20

I've already told you the alternatives:
Parse the VCF
Use the APIs
Use the VEP

Don't blame me when your IP address gets blocked for clogging up our servers.

4.9 years ago by Emily 23k

Thank you for your suggestions, I will explore these tools to find a more suitable one and thus avoid overloading the server.

4.9 years ago by amandinelecerfdefer ▴ 20