Question

How to download Refseq of bacteria as a COMPLETE genome format

5.6 years ago

Shelle ▴ 30

Can anyone tell me how I can download bacterial RefSeq genomes from NCBI in "complete genome" format? I say complete genome because I don't want the word "chromosome" to appear anywhere in the FASTA files. I saw a post on this, and it seems that a text file of organism names is all that is needed. I can get a CSV file of organisms from https://www.ncbi.nlm.nih.gov/genome/browse/#!/overview/ and extract only the first column. But when I use the script from the thread "How to download COMPLETE bacterial genomes from NCBI based on list of names?", it does not work and nothing is downloaded. Can anyone tell me whether this code works with any format of species.txt, or should I reformat the file somehow?

cat species.txt
"'Brassica napus' phytoplasma"
"'Candidatus Kapabacteria' thiocyanatum"
"'Chrysanthemum coronarium' phytoplasma"
"'Echinacea purpurea' witches'-broom phytoplasma"
"'Osedax' symbiont bacterium Rs2_46_30_T18"
"'Sphingomonas ginsengisoli' Hoang et al. 2012"
"Abaca bunchy top virus"
"Abalone herpesvirus Victoria/AUS/2009"
"Abalone shriveling syndrome-associated virus"
"Abditibacterium utsteinense"
"Abelson murine leukemia virus"
"Abeoforma whisleri"
"Abiotrophia"
"Abiotrophia defectiva"
"Abisko virus"
"Absidia glauca"
"Absidia repens"
"Absiella dolichum"
"Abutilon Brazil virus"
"Abutilon golden mosaic virus"
"Abutilon mosaic Bolivia virus"
"Abutilon mosaic Brazil virus"
"Abutilon mosaic virus"



wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt

IFS=$'\n'; for next in $(cat species.txt); do awk -v SPECIES=^"$next" 'BEGIN{FS="\t"}{if($8 ~ SPECIES && $12=="Complete Genome"){print $20}}' assembly_summary.txt \
    | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}'; done \
    | sh

genome sequencing FASTA Refseq • 1.5k views

ADD COMMENT • link updated 5.6 years ago by GenoMax 141k • written 5.6 years ago by Shelle ▴ 30

Hello Shelle,

None of your previous posts have gotten to closure. Please provide feedback and accept answers where appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.

ADD REPLY • link 5.6 years ago by Ram 43k

score 3 · Accepted Answer · 2018-09-07

You have to modify @5heikki's answer in the linked thread so it fits your use case. Try the following:

# Column 12 is the assembly level, column 20 the FTP directory of the assembly
awk 'BEGIN{FS="\t"}{if($12=="Complete Genome"){print $20}}' assembly_summary.txt \
    | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}' \
    | sh

No species.txt file is needed in your case.
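
If a name filter is wanted later anyway, note that the names in the species.txt shown in the question are wrapped in double quotes. Those quotes never appear in column 8 of assembly_summary.txt, so the pattern cannot match as written. Below is a minimal sketch, assuming the quotes come from the CSV export and that a literal prefix match on the organism name is what is intended. The function name is made up for illustration.

```shell
# Hypothetical helper: print the FTP paths of complete genomes whose organism
# name (column 8) starts with one of the names listed in a species file.
#   $1 = species list, one name per line, optionally wrapped in double quotes
#   $2 = assembly_summary.txt
complete_paths_for_species() {
    awk 'BEGIN{FS="\t"}
         # First file: strip the surrounding quotes and remember each name
         NR==FNR { gsub(/^"|"$/, "", $0); if ($0 != "") names[$0]; next }
         # Second file: keep complete genomes whose name starts with a listed name
         $12=="Complete Genome" {
             for (n in names) if (index($8, n) == 1) { print $20; break }
         }' "$1" "$2"
}
```

Using index() instead of a regular expression means characters such as "." or "/" in a name are matched literally.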

There is no "complete genome" file format. The command only downloads the genomes that are marked as "Complete Genome" in the assembly level column (column 12) of the assembly_summary.txt file.
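
To check what will be fetched before downloading anything, the same filter can be used to write a list of URLs. The sketch below rests on assumptions not stated in this thread: the function name is made up, and rows whose FTP path is "na" are skipped. Also note that the wget in the question fetches the GenBank summary; the RefSeq one should be at ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt.

```shell
# Hypothetical helper: print one download URL per assembly marked "Complete Genome".
#   $1 = assembly_summary.txt (column 12 = assembly level, column 20 = FTP directory)
complete_genome_urls() {
    awk 'BEGIN{FS="\t"}
         $12=="Complete Genome" && $20!="na" {
             n = split($20, part, "/")           # last path component is the assembly name
             print $20 "/" part[n] "_genomic.fna.gz"
         }' "$1"
}

# Dry run first, then download once the list looks right:
# complete_genome_urls assembly_summary.txt > urls.txt
# wget -i urls.txt
```

Writing the URLs to a file makes it possible to count or inspect them, and wget -i reads the list directly.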