NCBI e-utilities: timeout and restart
4.5 years ago
noodle ▴ 580

Let's say I have a large NCBI query that gets disconnected partway through (example below). Is there a way to restart the query from a known position in the result list? Or to split the esearch results into several smaller lists to pipe into esummary or efetch? Thanks!

esearch -db 'protein' -query 'CRISPR' | esummary  -db 'protein' -format fasta > output_fasta.txt
NCBI esearch esummary efetch E-utilities • 1.1k views

Break the query into smaller chunks. Also, sign up for an NCBI API key and use it, if you have not done so already.
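One way to sketch the chunking idea with EDirect is below. This is a rough outline, not a tested recipe: `chunk_ranges` and `fetch_in_chunks` are hypothetical helper names, the chunk file names are made up, and it assumes your EDirect version supports the `-start` and `-stop` options of `efetch` (check `efetch -help`). EDirect picks up an API key from the `NCBI_API_KEY` environment variable. Each chunk goes to its own file, so rerunning after a disconnect skips the chunks that already finished.

```shell
#!/usr/bin/env bash
# Sketch only: chunk_ranges and fetch_in_chunks are hypothetical helpers.
# export NCBI_API_KEY=...   # set this to your own key before running

# Print "start stop" pairs (1-based, inclusive) covering records first..total.
chunk_ranges() {
  local total=$1 size=$2 first=${3:-1}
  local start=$first stop
  while [ "$start" -le "$total" ]; do
    stop=$(( start + size - 1 ))
    [ "$stop" -gt "$total" ] && stop=$total
    echo "$start $stop"
    start=$(( stop + 1 ))
  done
}

# Run the search once, then fetch the results chunk by chunk.
# Chunks whose output file already exists are skipped, so a rerun resumes.
fetch_in_chunks() {
  local query=$1 size=$2 search count start stop out
  search=$(esearch -db protein -query "$query")
  count=$(echo "$search" | xtract -pattern ENTREZ_DIRECT -element Count)
  while read -r start stop <&3; do
    out="chunk_${start}_${stop}.fasta"
    [ -s "$out" ] && continue
    # Write to a temp file first so a dropped connection never leaves
    # a partial file that looks complete.
    echo "$search" | efetch -format fasta -start "$start" -stop "$stop" > "$out.tmp" \
      && mv "$out.tmp" "$out"
  done 3< <(chunk_ranges "$count" "$size")
}
```

Usage would be something like `fetch_in_chunks 'your query here' 5000`, then `cat chunk_*.fasta > output_fasta.txt` once every chunk is present. For a very long run the stored search result may expire on the NCBI side, in which case rerunning the function starts a fresh search and continues with the missing chunks.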


Is this your real query or just something that you picked as an example? Something is not right: it returns 127 million hits for this query, which I think is pretty much the entire Protein database. If you search for the term CRISPR in the NCBI Protein portal you do not get any results.

$ esearch -db 'protein' -query 'CRISPR'
<ENTREZ_DIRECT>
  <Db>protein</Db>
  <WebEnv>NCID_1_30058417_130.14.22.33_9001_1573479848_1335559330_0MetA0_S_MegaStore</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>127294171</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>

EDirect is good for a relatively small number of records: a few thousand and, depending on the data, even a few tens of thousands, but not more than that. For anything larger, it will be quicker to download the entire protein dataset from the NCBI FTP site and filter out the specific accessions of interest to you.
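As a rough illustration of the filtering step, here is a minimal sketch that keeps only the FASTA records whose accession appears in a list. `filter_fasta` is a hypothetical helper name, and it assumes one accession per line in the list file and that the accession is the first word of each FASTA header. The list file must not be empty.

```shell
# Sketch only: filter_fasta is a hypothetical helper.
# Usage: filter_fasta ids.txt sequences.fasta > subset.fasta
filter_fasta() {
  local ids=$1 fasta=$2
  awk '
    NR == FNR { keep[$1]; next }                         # first file: load accessions
    /^>/ { id = substr($1, 2); printing = (id in keep) } # header: strip ">" and look up
    printing                                             # print lines of wanted records
  ' "$ids" "$fasta"
}
```

If the downloaded file is compressed, the second argument can be `-` to read from standard input, for example `zcat sequences.fasta.gz | filter_fasta ids.txt -` (file names here are placeholders).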


Thanks for the reply - this is just a ridiculous example. Downloading and filtering may be the best approach for me.
