RefSeq genomic BLAST database
6.0 years ago
chland

Hi everyone, I'd like to use a preformatted bacterial database from NCBI to run blastn searches, so I used the update_blastdb.pl script to download the refseq_genomic database. All of the downloaded files are named refseq_genomic.##, where ## is 04, 07, 195, etc. How can I find out which of these files contains my bacteria of interest, such as Streptococcus or Shewanella? Thanks in advance.

Thanks. So for both blastn searches and extracting sequences with blastdbcmd, I would have to point to the files as refseq_genomic.*.tar.gz?

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

6.0 years ago

I don't think that is even possible.

In any case, what you want (or need) to do is use all the parts for your BLAST searches. Together they form a single database, so pass it as -db refseq_genomic on your BLAST command line (i.e. omit the .## in the name).
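
A minimal sketch of that setup, assuming the archives are named refseq_genomic.NN.tar.gz (as in the question) and all sit in one directory. BLAST+ cannot read the .tar.gz archives directly, so every volume is unpacked first and the database is then addressed by its base name. The helper names db_base_name and extract_volumes are hypothetical, not part of BLAST+.

```bash
#!/usr/bin/env bash
# Sketch only: unpack all volumes of a preformatted BLAST database so that
# blastn/blastdbcmd can open them together through the base name.
# Assumption: archives are named <db>.NN.tar.gz (e.g. refseq_genomic.04.tar.gz).
set -eu

# db_base_name PATH (hypothetical helper)
# Drops the directory, the ".tar.gz" extension and the ".NN" volume suffix:
#   /data/refseq_genomic.195.tar.gz -> refseq_genomic
db_base_name() {
    local f
    f=${1##*/}
    f=${f%.tar.gz}
    printf '%s\n' "${f%.*}"
}

# extract_volumes [DIR] [DB] (hypothetical helper)
# Unpacks every DB.*.tar.gz archive found in DIR into DIR itself.
extract_volumes() {
    local dir db archive
    dir=${1:-.}
    db=${2:-refseq_genomic}
    for archive in "$dir"/"$db".*.tar.gz; do
        [ -e "$archive" ] || continue   # the glob matched nothing
        tar -xzf "$archive" -C "$dir"
    done
}
```

After extraction, searches and retrieval would look like blastn -query query.fa -db refseq_genomic -out hits.txt and blastdbcmd -db refseq_genomic -entry ACCESSION (query.fa and ACCESSION are placeholders), run from that directory or with the BLASTDB environment variable pointing at it. update_blastdb.pl also has a --decompress option that should unpack the archives for you after downloading.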

6.0 years ago
GenoMax

Use the answer provided by @5heikki in the thread "How to download COMPLETE bacterial genomes from NCBI based on list of names?" to download the genomes. Then index them with makeblastdb and blast away.

You can find the organism names in this file (I am only looking for reference genomes):

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
awk -F '\t' '$5 == "reference genome" {print $8}' assembly_summary_refseq.txt > names
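
The same summary file also holds each assembly's FTP directory, so download URLs can be built from it directly. The sketch below is an alternative to, not a copy of, @5heikki's approach from the linked thread. It assumes the usual column layout of assembly_summary_refseq.txt ($5 refseq_category, $8 organism_name, $20 ftp_path) and that each assembly directory contains a file named <directory name>_genomic.fna.gz; genome_urls and build_genus_db are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch only: pick assemblies of one genus from assembly_summary_refseq.txt
# and turn them into a local BLAST database.
# Assumed column layout: $5 refseq_category, $8 organism_name, $20 ftp_path.
set -eu

# genome_urls SUMMARY GENUS [CATEGORY] (hypothetical helper)
# Prints the *_genomic.fna.gz URL of every assembly whose organism name
# starts with GENUS and whose refseq_category equals CATEGORY.
genome_urls() {
    awk -F '\t' -v genus="$2" -v category="${3:-reference genome}" '
        /^#/ { next }                       # skip the commented header lines
        $5 == category && $20 != "na" && index($8, genus " ") == 1 {
            n = split($20, parts, "/")      # last path element = assembly name
            print $20 "/" parts[n] "_genomic.fna.gz"
        }' "$1"
}

# build_genus_db SUMMARY GENUS OUT [CATEGORY] (hypothetical helper)
# Needs wget and BLAST+ (makeblastdb) on the PATH.
build_genus_db() {
    genome_urls "$1" "$2" "${4:-reference genome}" > "$3.urls"
    wget -q -i "$3.urls" -P "${3}_fna"                   # download each genome
    gzip -dc "${3}_fna"/*_genomic.fna.gz > "$3.fna"      # merge into one FASTA
    makeblastdb -in "$3.fna" -dbtype nucl -out "$3" -parse_seqids
}
```

For example, genome_urls assembly_summary_refseq.txt Streptococcus prints the reference-genome URLs for that genus, and build_genus_db assembly_summary_refseq.txt Streptococcus strep should leave a database that blastn can open with -db strep. A genus with no assembly flagged "reference genome" yields nothing; in that case pass another refseq_category value (e.g. "representative genome") as the last argument.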