Maximum number of efetch
6.5 years ago
horsedog ▴ 60

Hi,

I'm retrieving protein sequences by UID using efetch. I have more than 30,000 UIDs, but after running my Python script I got back exactly 10,000 sequences. Is there a limit on the number of records that can be retrieved at once, such as 10,000? Is it possible to get more than 30,000 in one go? Thanks.

Edirect • 3.8k views

In addition to @Pierre's note below, consider the following NCBI usage policy. The times it mentions are US Eastern time.

In order not to overload the E-utility servers, NCBI recommends that users post no more than three URL requests per second and limit large jobs to either weekends or between 9:00 PM and 5:00 AM Eastern time during weekdays. Failure to comply with this policy may result in an IP address being blocked from accessing NCBI. If NCBI blocks an IP address, service will not be restored unless the developers of the software accessing the E-utilities register values of the tool and email parameters with NCBI.
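
For a script that issues its own HTTP requests, the three-requests-per-second guideline above can be respected with a small throttle. The following is only a minimal sketch: the `Throttle` class and its parameters are hypothetical names made up for illustration, not part of any NCBI or Biopython API. Biopython's `Bio.Entrez` is documented as enforcing this limit itself, so this mainly matters for hand-rolled requests.

```python
import time


class Throttle:
    """Space out calls so that at most `max_per_second` happen per second.

    Hypothetical helper for illustration; the clock and sleep functions
    are injectable so the behaviour can be checked without real waiting.
    """

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call throttle.wait() immediately before each request.
# throttle = Throttle(max_per_second=3.0)
# for url in urls:
#     throttle.wait()
#     ...send the request...
```

This only spaces requests out; it does not address the separate recommendation to run large jobs on weekends or overnight.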

You can probably retrieve sequences much more efficiently using blastdbcmd from the BLAST+ suite and a local copy of the nr BLAST database.
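
As a rough sketch of that approach, the snippet below builds a `blastdbcmd` command line and runs it from Python. It assumes BLAST+ is installed and on the PATH, that a local protein database named `nr` is available, and that `ids.txt` holds one identifier per line in a form the local database can resolve (typically accessions). The file names and the two helper functions are hypothetical.

```python
import subprocess


def build_blastdbcmd(db, id_file, out_fasta):
    """Return the argument list for a batch FASTA lookup with blastdbcmd."""
    return [
        "blastdbcmd",
        "-db", db,                # name or path of the local BLAST database
        "-dbtype", "prot",        # protein database
        "-entry_batch", id_file,  # file with one identifier per line
        "-outfmt", "%f",          # FASTA output
        "-out", out_fasta,        # where to write the sequences
    ]


def run_blastdbcmd(db, id_file, out_fasta):
    """Run blastdbcmd and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_blastdbcmd(db, id_file, out_fasta), check=True)


# Usage (assumes a local nr database and an ids.txt file exist):
# run_blastdbcmd("nr", "ids.txt", "sequences.fasta")
```

Because the lookup is local, none of the E-utilities rate or batch-size limits apply.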

6.5 years ago

https://www.ncbi.nlm.nih.gov/books/NBK25499/

retmax: Total number of UIDs from the retrieved set to be shown in the XML output (default=20).

Increasing retmax allows more of the retrieved UIDs to be included in the XML output, up to a maximum of 100,000 records.

To retrieve more than 100,000 UIDs, submit multiple esearch requests while incrementing the value of retstart
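
The text quoted above describes esearch. For efetch with an explicit list of UIDs, a common workaround is to split the list into batches and fetch each batch separately. As far as I can tell, the same NCBI page gives a retmax cap of 10,000 records for efetch, which would be consistent with the exactly 10,000 sequences in the question. Below is a minimal sketch, assuming Biopython is installed; the batch size of 500 and the helper names are arbitrary choices for illustration.

```python
def parse_ids(text):
    """Split a blob of UIDs separated by whitespace and/or commas."""
    return text.replace(",", " ").split()


def chunked(ids, size):
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_fasta_in_batches(ids, fetch, batch_size=500):
    """Call `fetch` once per batch and concatenate the FASTA text returned."""
    parts = []
    for batch in chunked(ids, batch_size):
        parts.append(fetch(batch))
    return "".join(parts)


def entrez_fetch(batch):
    """Fetch one batch of protein records as FASTA text via Bio.Entrez."""
    from Bio import Entrez  # imported here so the helpers above work without Biopython

    Entrez.email = "A.N.Other@example.com"  # replace with your own address
    handle = Entrez.efetch(
        db="protein", id=",".join(batch), rettype="fasta", retmode="text"
    )
    try:
        return handle.read()
    finally:
        handle.close()


# Usage (assumes file.txt contains the UIDs):
# with open("file.txt") as fh:
#     ids = parse_ids(fh.read())
# with open("sequences.fasta", "w") as out:
#     out.write(fetch_fasta_in_batches(ids, entrez_fetch))
```

Passing the fetch function in as an argument keeps the batching logic separate from the network call, so it can be tried out with a stand-in before hitting NCBI.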

Hi Pierre, is it possible to get a FASTA file instead of an XML file? I also don't know where to add retmax or retstart. This is my code:

    from Bio import Entrez

    Entrez.email = "A.N.Other@example.com"
    blast = open("file.txt").read()
    handle = Entrez.efetch(db="protein", id=blast, rettype="fasta")
    print(handle.read())

Hi Pierre, do you know whether looping through the returned records keeps an FTP connection to NCBI open? My firewall doesn't allow such connections to stay open for long, and my loop fails somewhere between 3 and 10 iterations. I suspect the connection is the cause; I don't believe these iterations could hit the limit of 3 requests per second.
