Question

How to pass multiple seq_start and seq_stop statements to NCBI efetch

7.6 years ago

massa.kassa.sc3na ▴ 590

Hi,
I couldn't find any documentation on how to pass several seq_start and seq_stop optional arguments, one pair per ID, when sending a list of IDs to NCBI efetch.
A single ID with a range works:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=433294648&rettype=fasta&seq_start=100&seq_stop=200
Server answer:
>gb|CP003078.1|:100-200 Mycobacterium sp. JS623, complete genome
GGGTCGCAGCCGTATCGCCACGTTCGGGCGACTGTTCGAGGGTACTGACGACATTTCGCTGGGTCAAACC
TCGCCCGAGCGATCCCGGGTCACCGCCCGCA
And now multiple IDs:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=433294648,755160968&rettype=fasta
Server answer: two whole FASTA records in one file, in the blink of an eye.

Question: Does anybody know whether it is possible, and if so how, to combine the two, so that I get one short FASTA record per UID posted, each delimited by its own seq_start and seq_stop? The server answer to something like:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=433294648,755160968&rettype=fasta&seq_start=100,200&seq_stop=200,500
would be:
>gb|xxxxxxxx.x|:100-200 orgn x
GGGTCGCAGCCGTATCGCCACGTTCGGGCGACTGTTCGAGGGTACTGACGACATTTCGCTGGGTCAAACC
TCGCCCGAGCGATCCCGGGTCACCGCCCGCA
>gb|yyyyyyyy.y|:200-500 orgn y
GGGTCGCAGCCGTATCGCCACGTTCGGGCGACTGTTCGAGGGTACTGACGACATTTCGCTGGGTCAAACC
TCGCCCGAGCGATCCCGGGTCACCGCCCGCA
What I have tried so far: comma-separated lists of seq_start and seq_stop values, putting them in [], adding +AND+, adding semicolons, anything I could think of.
I know how to solve this with a for loop, but it would help me a lot if I could do it in 'batch' mode.

Any suggestions would be appreciated. Thanks a lot.

P.S.: I have already asked this here: C: Fetching Genbank Entries For List Of Accession Numbers., but it felt a little off topic there and the question was not elaborated.

sequence NCBI efetch ENTREZ • 2.0k views

7.6 years ago by massa.kassa.sc3na ▴ 590

You can use the Unix e-utils and write a bash script that parses a file and takes the seq_start and seq_stop values from each line. A sample command would be:

efetch -id 433294648 -format fasta -db nucleotide -seq_start 100 -seq_stop 200

PS: NCBI is phasing out GI numbers so it is recommended to use accession numbers instead.
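
As a rough sketch of the file-parsing idea above, the following Python snippet turns lines of the form "id start stop" into one efetch command each, using the same flags as the sample command. The input layout (whitespace-separated columns) and the function name efetch_commands are assumptions made for illustration, not something the Unix e-utils prescribe.

```python
import shlex


def efetch_commands(lines, db="nucleotide"):
    """Yield one efetch argument list per 'id start stop' line.

    The three-column layout is an assumed input format for this sketch.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        seq_id, start, stop = line.split()
        yield [
            "efetch", "-id", seq_id, "-format", "fasta", "-db", db,
            "-seq_start", str(int(start)), "-seq_stop", str(int(stop)),
        ]


# Hypothetical input; in practice these lines would come from a file.
ranges = ["433294648 100 200", "755160968 200 500"]
for cmd in efetch_commands(ranges):
    # Print the command; subprocess.run(cmd) would execute it instead.
    print(shlex.join(cmd))
```

This still issues one call per record; it only automates building the commands.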

7.6 years ago by Sej Modha 5.3k

Hi, thank you for the reply. I know that I can do that in a for loop (and I am currently doing so), but since I want to fetch relatively short fragments, I would like to fetch them all (a reasonable number) with one command to limit the calls to the NCBI server. Or am I missing something, and this is what the Unix e-utils would inherently do by themselves?

Re the P.S.: Yes, I know about that. Accessions currently work, but that is undocumented according to http://www.ncbi.nlm.nih.gov/books/NBK25499/

7.6 years ago by massa.kassa.sc3na ▴ 590

score 0 · Answer 1 · 2016-09-16

Hi,
in case this helps anyone in the future: after consulting NCBI Entrez support, it appears that this functionality isn't supported and won't be.
It's a shame. So don't waste your time, and for-loop forever.

The good guys at NCBI wrote:
No that will not be possible. The starts and stops must be singly for the id requested.
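
Since each ID needs its own request, a minimal Python sketch of the for-loop could look like the one below. It builds one efetch URL per (id, seq_start, seq_stop) triple with the same parameters as the URLs in the question and pauses between calls. The helper names (efetch_url, http_get, fetch_ranges) are made up for this example, and the 0.4 s delay is an assumption chosen to stay under NCBI's usual limit of three requests per second without an API key.

```python
import time
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(seq_id, start, stop, db="nuccore"):
    """Build an efetch URL for a single ID and a single range."""
    query = urllib.parse.urlencode({
        "db": db, "id": seq_id, "rettype": "fasta",
        "seq_start": start, "seq_stop": stop,
    })
    return f"{EFETCH}?{query}"


def http_get(url):
    """Download a URL and return the body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def fetch_ranges(records, get=http_get, delay=0.4):
    """Fetch one FASTA fragment per record and concatenate the results.

    `delay` is an assumed pause (seconds) between consecutive requests.
    """
    chunks = []
    for i, (seq_id, start, stop) in enumerate(records):
        if i:
            time.sleep(delay)  # be polite to the NCBI server
        chunks.append(get(efetch_url(seq_id, start, stop)))
    return "".join(chunks)


# Example call (performs real network requests, so it is left commented out):
# print(fetch_ranges([("433294648", 100, 200), ("755160968", 200, 500)]))
```

The `get` argument is only there so the download step can be swapped out, for example for a cached or mocked fetcher.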