Filter command-line BLAST results by organism
9.5 years ago
jdromano2 ▴ 10

In legacy blast (blastall) it was easy to do a remote BLAST query from the command line that only returned hits for a single organism:

blastall -u "Arabidopsis thaliana[organism]" -p blastp -d nr -i my_protein_input.fas -o my_protein_input_vs_nr_Athaliana.blastout.txt

However, this seems to have been removed with the release of BLAST+. Aside from filtering hits after they are returned, is it possible to perform a query on the 'nr' database that only returns hits for a single specified taxon?

alignment blast genome taxonomy • 5.1k views

I think you just add the -remote flag and use -entrez_query instead
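
A minimal sketch of that suggestion, reusing the file names from the question. `-remote` and `-entrez_query` are BLAST+ options; the function names `entrez_for_organism` and `remote_blastp_by_organism` are made up for illustration, and the call has not been checked against the NCBI servers here.

```shell
# Hypothetical helper: build the Entrez restriction for one organism.
entrez_for_organism() {
    printf '%s[organism]' "$1"
}

# Hypothetical wrapper: remote blastp against nr, restricted to one organism.
# Needs BLAST+ on PATH and network access; -entrez_query is only
# accepted together with -remote.
remote_blastp_by_organism() {
    organism=$1
    query=$2
    out=$3
    blastp -remote -db nr \
        -query "$query" \
        -entrez_query "$(entrez_for_organism "$organism")" \
        -out "$out"
}
```

It would be called as `remote_blastp_by_organism "Arabidopsis thaliana" my_protein_input.fas my_protein_input_vs_nr_Athaliana.blastout.txt`.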


How can I use more than one species in the remote search?

Thanks
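
One approach that may work, assuming the Entrez query accepts the usual uppercase boolean operators, is to join the organism terms with OR. The helper name `entrez_for_organisms` and the second species are illustrative only.

```shell
# Hypothetical helper: join organism names into one Entrez query with OR.
# Each argument is one organism name.
entrez_for_organisms() {
    query=""
    for name in "$@"; do
        term="${name}[organism]"
        if [ -z "$query" ]; then
            query=$term
        else
            query="$query OR $term"
        fi
    done
    printf '%s\n' "$query"
}

# Print the term first to check that it reads as intended.
# prints: Arabidopsis thaliana[organism] OR Oryza sativa[organism]
entrez_for_organisms "Arabidopsis thaliana" "Oryza sativa"
```

The printed string can then be passed to `-entrez_query` together with `-remote`, in the same way as for a single organism.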

9.3 years ago
agoel ▴ 20

I'm having a similar problem. It seems the best option is to create your own database. If you have a FASTA file for your organism, build a BLAST database from it; otherwise you have to do something like this:

  1. Download the prebuilt nr database (from NCBI).
  2. Search the Entrez Protein database with query: "txid7742[ORGN]"
  3. Select "Send to File" and choose format "GI list"
  4. Use the list of GIs from the previous step with blastdb_aliastool to build an aliased BLAST database of just your organism (takes several seconds), e.g.:

    blastdb_aliastool -gilist vertebrates.gi_list.txt -db nr -out nr_vertebrates -title nr_vertebrates
    
  5. Search against your new (aliased) database:

    blastx -query query.fa -db nr_vertebrates
    

(from this thread: Vertebrate Subset Nr Database? Build My Own?)
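
Steps 4 and 5 above can be sketched as one small script. The commands and flags are the ones shown in the list; the function names `is_gi_list` and `build_and_search_subset` and the sanity check on the GI file are additions for illustration, and the whole thing assumes BLAST+ is installed and nr has already been downloaded.

```shell
# Hypothetical check: succeed only if the file has at least one entry and
# every non-blank line is a plain integer, as expected for a GI list.
is_gi_list() {
    awk 'NF == 0 { next }
         !/^[0-9]+$/ { bad = 1 }
         { n++ }
         END { exit (bad || n == 0) ? 1 : 0 }' "$1"
}

# Steps 4 and 5 wrapped in one function. All file names are placeholders.
build_and_search_subset() {
    gi_list=$1    # e.g. vertebrates.gi_list.txt
    alias_db=$2   # e.g. nr_vertebrates
    query=$3      # e.g. query.fa
    is_gi_list "$gi_list" || { echo "not a GI list: $gi_list" >&2; return 1; }
    # Step 4: build the aliased database restricted to the listed GIs.
    blastdb_aliastool -gilist "$gi_list" -db nr \
        -out "$alias_db" -title "$alias_db" || return 1
    # Step 5: search against the aliased database.
    blastx -query "$query" -db "$alias_db"
}
```

It would be called as `build_and_search_subset vertebrates.gi_list.txt nr_vertebrates query.fa`. The check is there because a file saved from the web page in the wrong format (accessions instead of GIs, for example) is easier to catch before the alias is built.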
