How to retrieve any and all NCBI/GenBank accession numbers from a Taxonomy ID?
22 months ago
yarmda • 0

I want to supply a taxID for any level of phylogeny and retrieve all of the accession numbers for organisms that fit. For example, a taxID of 1063 is species-level Rhodobacter sphaeroides and has around 7 strains. Is it possible to use efetch to retrieve the accession numbers for all of their genomes?

Retrieving the taxID from an accession number is straightforward (acc_number stands in for a real accession):
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=acc_number&rettype=fasta&retmode=xml"

Granted, there's some grepping after the data comes back, but that's fine. I'm looking for something similar that will give back every accession number associated with the clade's tax ID.
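
The grepping step can be kept small. As a minimal sketch, assuming the efetch response is TinySeq XML that carries the taxonomy ID in a <TSeq_taxid> element, a hypothetical helper called extract_taxid (my own name, not part of E-utilities) could pull it out with sed:

```shell
# Extract the taxonomy ID from TinySeq XML read on stdin.
# Assumption: the ID sits in a <TSeq_taxid> element, which is what
# efetch is expected to return for rettype=fasta&retmode=xml.
extract_taxid() {
    sed -n 's:.*<TSeq_taxid>\([0-9][0-9]*\)</TSeq_taxid>.*:\1:p'
}
```

Piping the curl command above into extract_taxid should then print only the numeric taxID.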

Ideally, I would be able to include a taxID query in the eutils/efetch call I have above. Is it possible to query by one of the fields returned by that call?

Since the above curl brings back data that includes taxID, could I query the nuccore database by the taxID instead of the accession number?

Does that make sense?
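
One way this might work is through esearch rather than efetch: Entrez accepts a search term of the form txid1063[Organism:exp], where the :exp suffix asks for the taxon plus everything beneath it. The sketch below only builds the URL and parses the reply. The function names esearch_url and extract_ids are made up for illustration, and idtype=acc is assumed to make esearch return accession.version strings instead of GI numbers.

```shell
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Build an esearch URL that searches nuccore for a taxon and its descendants.
# $1 = taxonomy ID, $2 = maximum number of IDs to return (default 20)
# %5B and %5D are the URL-encoded square brackets around the field name.
esearch_url() {
    printf '%s/esearch.fcgi?db=nuccore&term=txid%s%%5BOrganism:exp%%5D&idtype=acc&retmax=%s\n' \
        "$EUTILS" "$1" "${2:-20}"
}

# Print the text of every <Id> element in XML read on stdin, one per line.
extract_ids() {
    grep -o '<Id>[^<]*</Id>' | sed 's:<[^>]*>::g'
}
```

Something like curl -s "$(esearch_url 1063 200)" | extract_ids would then list up to 200 identifiers for taxID 1063. A single call only returns up to retmax entries, so large clades would need paging.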


I have not found an automated solution to this yet. I have resolved to download accession numbers from the NCBI site manually. Since I'm only after a handful of unchanging targets, this will suit my needs for now.

4 months ago
genomax 68k
United States

See my answer in this post: Automatically Accessing all the sequences of a given order?
Since you want accession numbers, add step 4a: under "Summary" on the left side of the page, choose "Format" --> "Accession list".


Thanks for this! While this is a solution, I'm trying to keep everything automated in a single script - so I don't think this is quite the solution I want.
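
For a single-script version of the same accession list, a rough sketch is to run esearch with usehistory=y and then page through efetch with rettype=acc, which is expected to return plain-text accessions. The helper names (xml_field, efetch_acc_url, accessions_for_taxid) and the batch size of 500 are assumptions for illustration, and the networked function has not been checked against the live service.

```shell
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Print the text of the first <NAME>...</NAME> element in XML read on stdin.
# $1 = element name, e.g. Count, WebEnv, QueryKey
xml_field() {
    grep -o "<$1>[^<]*</$1>" | head -n 1 | sed 's:<[^>]*>::g'
}

# Build an efetch URL that returns one batch of accessions as plain text.
# $1 = WebEnv, $2 = query_key, $3 = retstart, $4 = retmax
efetch_acc_url() {
    printf '%s/efetch.fcgi?db=nuccore&WebEnv=%s&query_key=%s&rettype=acc&retmode=text&retstart=%s&retmax=%s\n' \
        "$EUTILS" "$1" "$2" "$3" "$4"
}

# Print all accessions under a taxonomy ID (needs curl and network access).
# $1 = taxonomy ID, $2 = batch size (default 500)
accessions_for_taxid() {
    batch=${2:-500}
    # retmax=0: only the count and the history handle are needed here.
    result=$(curl -s "$EUTILS/esearch.fcgi?db=nuccore&term=txid$1%5BOrganism:exp%5D&usehistory=y&retmax=0")
    count=$(printf '%s\n' "$result" | xml_field Count)
    webenv=$(printf '%s\n' "$result" | xml_field WebEnv)
    key=$(printf '%s\n' "$result" | xml_field QueryKey)
    start=0
    while [ "$start" -lt "${count:-0}" ]; do
        curl -s "$(efetch_acc_url "$webenv" "$key" "$start" "$batch")"
        start=$((start + batch))
        sleep 1   # pause between requests to stay under NCBI's rate limit
    done
}
```

Calling accessions_for_taxid 1063 > accessions.txt would then write one accession per line. Keeping the search result on the history server avoids passing a long ID list back in the efetch URL.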

2.5 years ago
Prasad ♦ 1.6k
India

Have you tried elink?

Here is the example output for the taxid you mentioned.


What do the IDs in the output represent?


They are GI IDs for all the entries for that particular taxid in the NCBI nucleotide database. You can change the database name accordingly; see here.
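
To make the elink route concrete, here is a hedged sketch of the call and of pulling the linked IDs out of the XML. It assumes the reply wraps results in a <LinkSetDb> block and echoes the query ID in a separate <IdList>, and that a single taxonomy ID is queried. The names elink_url and extract_linked_ids are hypothetical.

```shell
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Build an elink URL from a taxonomy ID to a target database.
# $1 = taxonomy ID, $2 = target database (default nuccore)
elink_url() {
    printf '%s/elink.fcgi?dbfrom=taxonomy&db=%s&id=%s\n' \
        "$EUTILS" "${2:-nuccore}" "$1"
}

# Print the linked IDs from elink XML read on stdin, one per line.
# Splitting records on "<" makes each record start with a tag name, so the
# script can track whether it is inside <LinkSetDb> and skip the query ID
# that appears in <IdList>.
extract_linked_ids() {
    awk 'BEGIN { RS = "<" }
         /^LinkSetDb>/   { inlinks = 1 }
         /^\/LinkSetDb>/ { inlinks = 0 }
         inlinks && /^Id>/ { sub(/^Id>/, ""); print }'
}
```

With network access, curl -s "$(elink_url 1063)" | extract_linked_ids would list the IDs, and passing protein as the second argument switches the target database. The IDs could then be handed to efetch with rettype=acc to get accessions.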
