rentrez fetch error - unable to obtain query
0
0
Entering edit mode
6.3 years ago
klrdna • 0

Hello everyone,

I'm having a problem with the R package rentrez that I am hoping somebody might be able to help me with.

I am trying to download all mitochondrial COI sequences from Genbank in .gb format, using entrez_search followed by entrez_fetch. The number of records returned by entrez_search is quite large (> 2 million), so I have been using a for loop in rentrez_fetch to download the sequences in smaller chunks. I used this approach some months ago for another gene (mitochondrial 12S rRNA; ~200,000 gb entries) and it worked, but now I keep getting output files that don't contain any sequence entries and instead have the following message (one error message per for loop itineration):

 https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd"> <eFetchResult>   <ERROR>Unable to obtain query #1</ERROR> </eFetchResult>

I've tried this a number of times over the past few days and it keeps happening. Sometimes the download appears to go ok for a number of itinerations of the for loop, but then it starts throwing the same error. Other times I only see the error message and no sequences at all. I have tried doing the snail coi example in the rentrez tutorial and it seemed to work ok. I also repeated the script I used for the 12S gene - the first 20,000 entries downloaded ok, but then the same error message started appearing. I have tried using sys.sleep to create a pause between loops, to lessen the demand on the NCBI server and reducing the chunks to 500 sequences, but I still get the same error. I've also been running the commands after 9pm US eastern time, as per the NCBI guidelines. My R script is:

mitochondrial_COI_seqs_170118<- entrez_search(db="nuccore", term="(((((((((COX1) OR cytochrome c oxidase I) OR COI) OR MT-CO1) OR MTCO1) OR CO I) OR mitochondrially encoded cytochrome c oxidase I) OR main subunit of cytochrome c oxidase) OR cytochrome c oxidase subunit I) AND Mitochondrion[Filter]", use_history=TRUE)
mitochondrial_COI_seqs_170118

for( seq_start in seq(1,2692598,10000)){recs <- entrez_fetch(db="nuccore", web_history=mitochondrial_COI_seqs_170118$web_history,
                       rettype="gb", retmax=10000, retstart=seq_start)
  # if first in sequence create new file otherwise append to file
  if(seq_start==1){
    cat(recs, file="mitochondrial_COI_seqs_170118.gb")
  }else{
    cat(recs, file="mitochondrial_COI_seqs_170118.gb", append=TRUE)
  }
  # print status to console
  cat(seq_start+9999, "sequences downloaded\n")
}

Does anyone have any suggestions? I wondered if I should be breaking the entrez_search step into chunks as well - is that possible? I'm also confused about the difference between using entrez_post and creating a web history within entrez_search with use_history=TRUE.

Thanks in advance :)

rentrez e-utils ncbi R entrez • 1.8k views
ADD COMMENT

Login before adding your answer.

Traffic: 2802 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6