Get reference full genome sequences for selected organisms
6.3 years ago

Hello,

I would like to download all the full-length reference sequences for a given organism. I am using esearch, as described on the NCBI website, with the following command:

esearch -db "nucleotide" -query "txidX[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

where X is the code for a given taxon. This works but I get both 'complete genome' and 'complete sequence' entries.

Is it possible to get only the 'complete genome' entries? Thank you

genome blast • 1.8k views
6.3 years ago
Joseph Hughes ★ 3.0k

This is most likely a result of your particular species having multiple segments or chromosomes. For example:

esearch -db "nucleotide" -query "txid40120[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

would retrieve 32 complete genomes but

esearch -db "nucleotide" -query "txid40051[Organism] AND refseq[filter]"|efetch -format fasta > genome.fa

would retrieve 10 complete sequences, one for each of the 10 segments of the bluetongue virus.

So the approach to take depends on what you really want to retrieve.

Thank you, but the taxon I am looking for contains both complete genomes and complete sequences. Is there still a way to separate them, either directly with an esearch option or afterwards by manipulating the resulting FASTA file?
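
One way to separate them afterwards is to filter the downloaded FASTA file on its header lines. The following is only a minimal sketch: it assumes the RefSeq deflines contain the literal phrase "complete genome" (as the entries described in the question do), and `filter_complete_genomes` is a hypothetical helper name, not part of the NCBI tools.

```shell
# Keep only FASTA records whose header line contains "complete genome".
# Assumption: the phrase appears in the defline; matching is case-insensitive.
# filter_complete_genomes is a hypothetical helper, not an EDirect command.
filter_complete_genomes() {
    awk '
        # At each header, decide whether the whole record should be kept.
        /^>/ { keep = (tolower($0) ~ /complete genome/) }
        # Print the header and every sequence line of a kept record.
        keep
    ' "$@"
}
```

It could be used as `filter_complete_genomes genome.fa > complete_genomes.fa`. Restricting the search itself, for example by adding `AND "complete genome"[Title]` to the esearch query, may achieve the same thing, but that is an untested suggestion here and depends on how the records for the taxon are titled.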
