I'd like to download the assembly files for bacteria, archaea, viruses, fungi, and protozoa from the NCBI FTP site. There are far too many files to download one by one. With wget I can download at the directory level. For instance, wget -r -l 20 --no-parent --reject "index.html*" "ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/archaea/" gives me everything in the archaea directory for each species. The problem is that it skips the assembly directory, which is the part I actually need. For example, I get everything in ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/archaea/Acidianus_hospitalis/ except latest_assembly_versions/GCF_000213215.1_ASM21321v1, which is the assembly directory. Does anybody know how I can download this data in batch?
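One approach that is often suggested for this kind of batch download, offered here as a rough sketch and not necessarily the solution this thread settled on, is to skip recursive crawling and build the URL list from the assembly_summary.txt file that NCBI publishes in each RefSeq group directory. The sketch assumes the file is tab-separated, that header lines start with '#', that column 20 holds each assembly's ftp_path, and that the genomic FASTA is named after the assembly directory with a _genomic.fna.gz suffix. The function name assembly_urls is made up for illustration.

```shell
#!/bin/sh
# Sketch: read an NCBI assembly_summary.txt on stdin and print one download
# URL per assembly. Assumptions: tab-separated input, header lines start
# with '#', column 20 is the ftp_path, and the genomic FASTA is named
# <assembly directory>_genomic.fna.gz.
assembly_urls() {
    awk -F '\t' '
        /^#/ { next }                       # skip header and comment lines
        $20 == "" || $20 == "na" { next }   # skip rows without a usable path
        {
            n = split($20, parts, "/")      # last path component is the assembly name
            print $20 "/" parts[n] "_genomic.fna.gz"
        }
    '
}

# Example usage (needs network access, so it is left commented out):
#   wget -O assembly_summary.txt "ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/archaea/assembly_summary.txt"
#   assembly_urls < assembly_summary.txt > urls.txt
#   wget -i urls.txt
```

The same steps would apply to the other groups by swapping archaea for bacteria, viral, fungi, or protozoa in the summary URL. Changing the suffix in the print line would fetch a different file type from each assembly directory.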
Hi, I just wanted to say thanks for the solution. This worked well.