Question: How to download bacterial RefSeq genomes in COMPLETE genome format
Asked by Shelle, 14 months ago

Can anyone tell me how I can download complete bacterial genomes from NCBI RefSeq? I specify "complete genome" because I don't want the word "chromosome" to appear anywhere in the FASTA files. I have seen some posts on this, and it seems that a text file listing the organism names will do the job. I can get a CSV file of the organisms from https://www.ncbi.nlm.nih.gov/genome/browse/#!/overview/ and extract only the first column. However, when I use the script from the thread "How to download COMPLETE bacterial genomes from NCBI based on list of names?", it doesn't seem to work and nothing is downloaded. Is this code compatible with any format of species.txt, or do I need to reformat the file somehow?

cat species.txt
"'Brassica napus' phytoplasma"
"'Candidatus Kapabacteria' thiocyanatum"
"'Chrysanthemum coronarium' phytoplasma"
"'Echinacea purpurea' witches'-broom phytoplasma"
"'Osedax' symbiont bacterium Rs2_46_30_T18"
"'Sphingomonas ginsengisoli' Hoang et al. 2012"
"Abaca bunchy top virus"
"Abalone herpesvirus Victoria/AUS/2009"
"Abalone shriveling syndrome-associated virus"
"Abditibacterium utsteinense"
"Abelson murine leukemia virus"
"Abeoforma whisleri"
"Abiotrophia"
"Abiotrophia defectiva"
"Abisko virus"
"Absidia glauca"
"Absidia repens"
"Absiella dolichum"
"Abutilon Brazil virus"
"Abutilon golden mosaic virus"
"Abutilon mosaic Bolivia virus"
"Abutilon mosaic Brazil virus"
"Abutilon mosaic virus"



wget ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt

IFS=$'\n'; for next in $(cat species.txt); do awk -v SPECIES=^"$next" 'BEGIN{FS="\t"}{if($8 ~ SPECIES && $12=="Complete Genome"){print $20}}' assembly_summary.txt \
    | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}'; done \
    | sh
(Question updated 14 months ago by genomax)

Hello Shelle,

None of your previous posts have gotten to closure. Please provide feedback and accept answers where appropriate.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.

(Comment by RamRS, 14 months ago)

Answer by genomax, 14 months ago

You have to modify @5heikki's answer in the linked thread so it fits your use case. Try the following:

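# Print the FTP directory (column 20) of every "Complete Genome" assembly (column 12),
# turn each into a wget command for its *_genomic.fna.gz file, and run the commands with sh.
awk 'BEGIN{FS="\t"}{if($12=="Complete Genome"){print $20}}' assembly_summary.txt | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}' | sh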
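The question asks for RefSeq, but the wget line in the question fetches the GenBank summary (genomes/genbank/bacteria). The sketch below is a hedged variant of the one-liner above, not genomax's code: it assumes the RefSeq summary lives at https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt, skips rows whose ftp_path is "na", and calls wget in a loop instead of piping generated commands to sh. The function names complete_genome_urls and download_complete_genomes are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an NCBI assembly_summary.txt into one
# *_genomic.fna.gz URL per "Complete Genome" assembly (column 12).
# Each ftp_path (column 20) is a directory, and the FASTA inside it is
# named <last path component>_genomic.fna.gz.
complete_genome_urls() {
    awk 'BEGIN { FS = "\t" }
         $12 == "Complete Genome" && $20 != "na" {
             n = split($20, parts, "/")
             print $20 "/" parts[n] "_genomic.fna.gz"
         }' "$1"
}

# Hypothetical helper: download every URL with wget, one at a time, into the
# current directory; -nc (--no-clobber) skips files that already exist.
download_complete_genomes() {
    local summary=$1 url
    complete_genome_urls "$summary" | while IFS= read -r url; do
        wget -nc "$url"
    done
}

# Example (assumed RefSeq location; GenBank lives under genomes/genbank/):
#   wget https://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt
#   download_complete_genomes assembly_summary.txt
```

Running complete_genome_urls on its own first (for example piped to wc -l) is a cheap way to see how many files will be fetched before starting the download.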

No species.txt file is needed in your case.
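If you do want to restrict the download to the names in species.txt, a likely reason the original loop downloads nothing is that each line still carries the literal double quotes from the CSV. The awk pattern then becomes ^"Abiotrophia defectiva", which can never match column 8 (organism_name) of assembly_summary.txt, since that column has no surrounding quotes. Names such as 'Sphingomonas ginsengisoli' Hoang et al. 2012 also contain dots, which awk treats as regex metacharacters. The sketch below is a hedged rework, not the linked thread's code: it strips the quotes (and any Windows \r), then does a literal prefix match with awk's index() instead of a regex. The function name complete_paths_for_species is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ftp_path (column 20) of every assembly in an
# NCBI assembly_summary.txt whose assembly_level (column 12) is
# "Complete Genome" and whose organism_name (column 8) starts with one of
# the names in a species list. Usage: complete_paths_for_species LIST SUMMARY
complete_paths_for_species() {
    local list=$1 summary=$2
    # Drop Windows line endings and the surrounding double quotes that the
    # CSV export leaves around each name, then skip empty lines.
    tr -d '\r' < "$list" | sed 's/^"//; s/"$//' | grep -v '^$' |
    awk 'BEGIN { FS = "\t" }
         # First input (stdin, given as "-"): remember the cleaned names.
         FNR == NR { names[++n] = $0; next }
         # Second input: keep complete genomes whose name starts with a listed name.
         $12 == "Complete Genome" && $20 != "na" {
             for (i = 1; i <= n; i++)
                 if (index($8, names[i]) == 1) { print $20; break }
         }' - "$summary"
}
```

Its output is a list of FTP directories, so it can stand in for the first awk of the one-liner above: complete_paths_for_species species.txt assembly_summary.txt | awk 'BEGIN{OFS=FS="/"}{print "wget "$0,$NF"_genomic.fna.gz"}' | sh. Entries that are not bacteria (the viruses and fungi in the list) simply find no match in the bacteria summary.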

There is no "complete genome" format. The command above downloads only those assemblies that are marked as complete, i.e. whose assembly_level (column 12 of assembly_summary.txt) is "Complete Genome".
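To see which values the assembly_level column actually takes, and how many assemblies fall under each, a quick tally helps. NCBI uses the levels "Complete Genome", "Chromosome", "Scaffold" and "Contig", so filtering on "Complete Genome" drops the other three. A small sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count assemblies per assembly_level (column 12) in an
# NCBI assembly_summary.txt, skipping the "#" header/comment lines, and list
# the levels from most to least common.
count_assembly_levels() {
    awk 'BEGIN { FS = "\t" }
         /^#/ { next }
         { count[$12]++ }
         END { for (level in count) print count[level] "\t" level }' "$1" |
    sort -k1,1nr
}

# Example: count_assembly_levels assembly_summary.txt
```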
