Blastn and Multiple Databases: how best to manage
9.2 years ago
jeremy.cox.2 ▴ 130

How do I best manage multiple BLAST databases?

So I am pretty new to Bioinformatics, but I am a computer guy. I have a few questions about how to best use blastn to achieve my goal.

I have multiple databases I prepared, creatively named virus.fa, bacteria.fa, fungi.fa, human.fa, mouse.fa, and rat.fa.

I want to be able to BLAST against any combination of the databases, hopefully without doing anything crazy like computing all database permutations.

  1. I don't think I can BLAST against multiple databases at once, is that correct?
  2. As I understand it, I can make 6 separate blast databases and blast against them one at a time. Then I concatenate the results.
    1. Is this computationally wasteful?
  3. I could make 1 big database and ignore hits for organisms I don't want to search.
    1. This is obviously wasteful.
    2. Can makeblastdb take multiple input files? I don't think it can, so I would have to cat them up before making the db.
  4. Is there a solution I am missing, hopefully an elegant solution?

Thank you,
Jeremy Cox
CSE PhD student

9.2 years ago
5heikki 11k

You should just combine the databases (are those BLAST databases or just FASTA files?) and then make aliases for the subset databases.

 blastdb_aliastool -h

USAGE
  blastdb_aliastool [-h] [-help] [-gi_file_in input_file]
    [-gi_file_out output_file] [-db dbname] [-dbtype molecule_type]
    [-title database_title] [-gilist input_file] [-out database_name]
    [-dblist database_names] [-dblist_file file_name]
    [-num_volumes positive_integer] [-logfile File_Name] [-version]

DESCRIPTION
   Application to create BLAST database aliases, version 2.2.29+

   This application has three modes of operation:

   1) GI file conversion:
      Converts a text file containing GIs (one per line) to a more efficient
      binary format. This can be provided as an argument to the -gilist option
      of the BLAST search command line binaries or to the -gilist option of
      this program to create an alias file for a BLAST database (see below).

   2) Alias file creation (restricting with GI List):
      Creates an alias for a BLAST database and a GI list which restricts this
      database. This is useful if one often searches a subset of a database
      (e.g., based on organism or a curated list). The alias file makes the
      search appear as if one were searching a regular BLAST database rather
      than the subset of one.

   3) Alias file creation (aggregating BLAST databases):
      Creates an alias for multiple BLAST databases. All databases must be of
      the same molecule type (no validation is done). The relevant options are
      -dblist and -num_volumes.
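
For the combined-database route (mode 2 in the help text above), a minimal sketch might look like the following. It only prints the commands so they can be checked before anything is run. The helper names (`combined_db_cmd`, `subset_alias_cmd`) and the names `all_genomes`, `virus.gi` and `virus_subset` are made up for illustration. It also assumes the FASTA headers carry NCBI GIs, so that a GI list can pick out the viral sequences; `-parse_seqids` is included on that assumption.

```shell
#!/bin/sh
# Dry-run sketch: each helper prints a command line instead of executing it.

# Print the command that builds one combined nucleotide database.
# Usage: combined_db_cmd OUT_NAME FASTA...
# "-in -" makes makeblastdb read FASTA from standard input; to my knowledge
# -out and -title must then be given explicitly.
combined_db_cmd() {
    out=$1
    shift
    printf 'cat %s | makeblastdb -in - -dbtype nucl -parse_seqids -out %s -title %s\n' \
        "$*" "$out" "$out"
}

# Print the command that creates an alias restricting DB to the GIs in GI_FILE.
# Usage: subset_alias_cmd DB GI_FILE ALIAS_NAME
subset_alias_cmd() {
    db=$1
    gi_file=$2
    alias_name=$3
    printf 'blastdb_aliastool -db %s -gilist %s -dbtype nucl -out %s -title %s\n' \
        "$db" "$gi_file" "$alias_name" "$alias_name"
}

combined_db_cmd all_genomes virus.fa bacteria.fa fungi.fa human.fa mouse.fa rat.fa
subset_alias_cmd all_genomes virus.gi virus_subset
```

If the printed commands look right, the script's output can be piped into `sh` to actually run them. One GI list per organism group is needed, which is extra bookkeeping compared with keeping the databases separate.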

Based on the information you provided, it looks like I can keep the 6 databases separate and create an alias file that refers to several of them:

blastdb_aliastool -dblist "virus fungi bacteria" -dbtype nucl -out microbiome -title "microbiome"

I think this is the opposite of what you described, which was making aliases for subsets of one combined database?


Yes, you can do that too.
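
To make the separate-databases route concrete, here is a rough sketch, assuming the six FASTA files from the question sit in the current directory. As with any dry run, it only prints the commands for review. The helper names (`make_db_cmds`, `aggregate_alias_cmd`, `search_cmd`) and the file names `query.fa` and `hits.tsv` are hypothetical. As far as I know, `blastn` also accepts a quoted, space-separated list of databases for `-db`, so an alias is a convenience rather than a requirement.

```shell
#!/bin/sh
# Dry-run sketch: each helper prints command lines instead of executing them.

# Print one makeblastdb command per FASTA file (virus.fa -> database "virus").
# Usage: make_db_cmds FASTA...
make_db_cmds() {
    for fasta in "$@"; do
        name=${fasta%.fa}   # strip the .fa suffix to get the database name
        printf 'makeblastdb -in %s -dbtype nucl -out %s -title %s\n' \
            "$fasta" "$name" "$name"
    done
}

# Print the command that creates an alias aggregating several databases.
# Usage: aggregate_alias_cmd ALIAS_NAME DB...
aggregate_alias_cmd() {
    alias_name=$1
    shift
    printf 'blastdb_aliastool -dblist "%s" -dbtype nucl -out %s -title %s\n' \
        "$*" "$alias_name" "$alias_name"
}

# Print a blastn search against one or more databases (or aliases),
# writing tabular output (-outfmt 6).
# Usage: search_cmd QUERY_FASTA HITS_FILE DB...
search_cmd() {
    query=$1
    hits=$2
    shift 2
    printf 'blastn -query %s -db "%s" -outfmt 6 -out %s\n' "$query" "$*" "$hits"
}

make_db_cmds virus.fa bacteria.fa fungi.fa human.fa mouse.fa rat.fa
aggregate_alias_cmd microbiome virus fungi bacteria
search_cmd query.fa hits.tsv microbiome
```

With this layout each database is built once, and any combination is either a new alias or just a different list passed to `search_cmd`, so no permutations need to be precomputed.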

9.2 years ago
jeremy.cox.2 ▴ 130

I found this topic, which provides helpful answers:

How To Blast A Sequence Against Multiple Databases

Sorry for the duplicate question.
