Fundamental BLAST problem
7.5 years ago

I ran a local blastp search against the NCBI nr database and got 100,000 hits. I organized the ones I wanted to keep in Excel, and I have a text file of all of their header/description lines. How do I use what I have to get the actual sequences from NCBI? This may be a job for Batch Entrez, or possibly the exact opposite. Either way, I figured this is a very common problem, but I couldn't find a concrete solution.

blast blastp retrieve seqs
GenoMax

Take the identifiers you are interested in and query the nr database with blastdbcmd, a tool included in the BLAST+ package.

Put your identifiers in a text file (one accession number per line) and pass that file to blastdbcmd with the -entry_batch option. Your command would look something like:

blastdbcmd -db /path_to/nr -entry_batch Acc_ID_file -outfmt '%f' -out sequence_file

Here -outfmt '%f' writes the retrieved sequences in FASTA format.
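To build that ID file from the saved description lines, a minimal shell sketch follows. It assumes each line starts with the accession (optionally preceded by ">", as in a FASTA header), followed by whitespace and the description; the function name headers_to_ids and the input file name headers.txt are hypothetical. Headers in the older gi|...|ref|...| style would need extra parsing.

```shell
# Hypothetical helper: turn saved header/description lines into an
# accession list (one per line) suitable for blastdbcmd -entry_batch.
# Assumption: the accession is the first whitespace-separated field,
# optionally prefixed with ">" as in a FASTA header line.
headers_to_ids() {
    tr -d '\r' < "$1" |          # drop Windows line endings (e.g. from an Excel export)
        sed 's/^>//' |           # strip a leading ">" if present
        awk 'NF { print $1 }' |  # keep the first field of each non-blank line
        LC_ALL=C sort -u         # sort and remove duplicate accessions
}
```

Usage: headers_to_ids headers.txt > Acc_ID_file, then run the blastdbcmd command with -entry_batch Acc_ID_file.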


Perfect! Thank you very much. I had actually looked at this before, but I wasn't entirely sure.


This might be a REALLY stupid question, but do I need to use the unformatted FASTA version of the nr database?

EDIT: I tested it and found that I can just use the same formatted database I was using for blastp. Thanks again; your example command worked perfectly.
