Question: BioMart : the BioMart webservice returned an invalid result
3 months ago
amandinelecerfdefer • 0

Hello,

I have a file containing a list of rsIDs, and I want to retrieve the names of the genes and transcripts corresponding to each rsID. This is my script:

install.packages('BiocManager', repos='http://cran.us.r-project.org')
BiocManager::install(c("biomaRt"))

library(biomaRt)
Data <- read.delim("/Users/amandinelecerfdefer/Desktop/Modification_vcf/cut/rsID_origine.txt2.txt")

snpmart <-
  useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
T1<-Sys.time()
T1
res <- getBM(
  attributes = c(
    "refsnp_id",
    "ensembl_gene_stable_id",
    "ensembl_transcript_stable_id"
  ),
  filters = "snp_filter",
  values = Data$rsID,
  mart = snpmart,
  uniqueRows = TRUE
)

T2<-Sys.time()
T2
write.csv(res, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/name_cut/recovery_gene_trans_original2.txt")
Tdiff <- difftime(T2, T1)
Tdiff
write.csv(Tdiff, file = "/Users/amandinelecerfdefer/Desktop/Modification_vcf/time/time2.txt")

Last week this script worked very well, but for the past few days it has failed every time with the same error.

This is the error I get:

> res <- getBM(
+   attributes = c(
+     "refsnp_id",
+     "ensembl_gene_stable_id",
+     "ensembl_transcript_stable_id"
+   ),
+   filters = "snp_filter",
+   values = Data$rsID,
+   mart = snpmart,
+   uniqueRows = TRUE
+ )
Batch submitting query [=======>-----------------------------------------------------]  13% eta:  2h
Error in getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id", "ensembl_transcript_stable_id"),  : 
  The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1. 
Please report this on the support site at http://support.bioconductor.org

How can I fix this error and get the script working again?

Thank you.


I just ran your query with a couple of random rsIDs as values and had no problems. Can you give us a sample of your data? How long is your list of values?

3 months ago by Emily_Ensembl • 18k

Hi. My file is 17 million lines long. After hitting this error, I decided to split it into sub-files of 100,000 lines each. Here is an excerpt from one of the files:

rsID
rs142849724
rs141989890
rs193023236
rs187050627
rs115405973
rs542587725
rs140068063
rs185528550
rs539019715
rs571562101
rs190704807
rs571549807
rs143117458
rs115290438
rs114653362
rs190493256
rs192232546
rs139049437
rs186328231
rs189269980
rs530558338
rs568408968
rs377289156
rs116019130
rs190479833

3 months ago by amandinelecerfdefer • 0

You can't use BioMart with a file 17 million lines long. You could use our APIs or you could parse the data out of the VCF files with consequences.

3 months ago by Emily_Ensembl • 18k

I suspected I couldn't do that with a 17-million-line file, but I tried it with 100,000 lines a few days ago and it worked. Now it no longer does.

3 months ago by amandinelecerfdefer • 0

You can't use it for 100,000 either. We recommend a maximum of 500.

3 months ago by Emily_Ensembl • 18k

It's strange, because I did it once with 100,000 lines. Thank you for your answer. I will divide my 17-million-line file into 500-line files to find the matches.

3 months ago by amandinelecerfdefer • 0

Please don't do that either. You will jam up our servers. I recommend parsing the VCFs.

3 months ago by Emily_Ensembl • 18k
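
As a rough illustration of the "parse the VCFs" suggestion, here is a minimal base R sketch. It assumes that the annotated VCF writes Ensembl stable IDs (`ENSG...` for genes, `ENST...` for transcripts) somewhere in its INFO column; the exact INFO layout is not described in this thread, so check the header of your own file. The helper name `extract_ensembl_ids` and the example lines are hypothetical.

```r
# Hypothetical helper: for each wanted rsID found in the VCF lines, collect the
# Ensembl gene (ENSG...) and transcript (ENST...) IDs mentioned in its INFO column.
# ASSUMPTION: Ensembl stable IDs appear in INFO; the exact layout is not taken
# from this thread, so verify it against the header of your own VCF.
extract_ensembl_ids <- function(vcf_lines, wanted_ids) {
  body <- vcf_lines[!startsWith(vcf_lines, "#")]        # drop header lines
  fields <- strsplit(body, "\t", fixed = TRUE)          # VCF columns are tab-separated
  keep <- vapply(fields,
                 function(f) length(f) >= 8 && f[3] %in% wanted_ids,
                 logical(1))                            # column 3 is the variant ID
  fields <- fields[keep]
  find_ids <- function(text, pattern) {
    hits <- regmatches(text, gregexpr(pattern, text))[[1]]
    paste(unique(hits), collapse = ";")                 # "" when nothing matches
  }
  data.frame(
    refsnp_id   = vapply(fields, `[`, character(1), 3),
    genes       = vapply(fields, function(f) find_ids(f[8], "ENSG[0-9]+"), character(1)),
    transcripts = vapply(fields, function(f) find_ids(f[8], "ENST[0-9]+"), character(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up VCF lines, for illustration only
example_lines <- c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t100\trs1\tA\tG\t.\t.\tX=ENSG00000000001|ENST00000000001",
  "1\t200\trs2\tC\tT\t.\t.\tX=none"
)
extract_ensembl_ids(example_lines, c("rs1", "rs2"))
```

For a real file you would read the lines in chunks from a connection (for example with `readLines(con, n = ...)` on a `gzfile()` connection) rather than loading everything at once, and apply the helper to each chunk.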

No problem, I will find another solution.

3 months ago by amandinelecerfdefer • 0

According to a previous response from Mike Smith in the thread "vector dimension limit in biomaRt", it seems there is already an internal function that does the batching? He wrote:

I've modified the getBM() function in biomaRt to submit queries in batches if the number of values exceeds 500. If you have multiple filters each of which have more than 500 values it should generate multiple mutually exclusive queries so that all combinations are run without breaking the 500 value limit. All of this is done internally, so existing biomaRt scripts shouldn't need to be changed. It will also display a progress bar so you can tell it is still proceeding. This is available from biomaRt version 2.33.1

3 months ago by SMK ♦ 1.3k
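
To make the idea of batching at 500 values concrete, here is a minimal base R sketch that splits a vector of rsIDs into chunks of at most 500. This is not biomaRt's internal code, and the helper name `chunk_values` is hypothetical; according to the quoted post, `getBM()` already does the equivalent internally from version 2.33.1. Batching only keeps each individual request small. It does not make millions of IDs a reasonable load for the BioMart servers.

```r
# Hypothetical helper: split a vector into consecutive chunks of at most `size` elements.
# Illustration only; not the implementation used inside biomaRt.
chunk_values <- function(values, size = 500) {
  stopifnot(size >= 1)
  if (length(values) == 0) return(list())
  # Elements 1..size get group 1, the next `size` elements get group 2, and so on
  groups <- ceiling(seq_along(values) / size)
  unname(split(values, groups))
}

rs_ids <- paste0("rs", 1:1200)   # made-up IDs, for illustration only
batches <- chunk_values(rs_ids)
lengths(batches)                 # 500 500 200
```

The progress bar in the error message above ("Batch submitting query") is this batching at work: each batch is sent as its own query, so a failure partway through means one of the batches came back with an invalid result.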
