Get Blast Database Size
10.6 years ago
PoGibas 5.1k

I have piped sequences of unknown length into makeblastdb. Now I want to know the total length of those sequences (the BLAST database size).

Example:

 # "cat" is used as an example
 # My "real" sequences are piped from "make random length sequences" command
 cat unknown_length_sequences
     >Seq1
     AA --//-- TG
     >Seq2
     GG --//-- TA
     >Seq3
     AC --//-- CC
     ...

 cat unknown_length_sequences | 
     makeblastdb \
         -in - \
         -dbtype 'nucl' \
         -parse_seqids \
         -out random_seq \
         -title "random_seq"

  Output files look like this:  
     random_seq.nhr
     random_seq.nin
     random_seq.nog
     random_seq.nsd
     random_seq.nsi
     random_seq.nsq

My question: how do I get the BLAST database size (the total length of all the sequences)?
The result should be the same as the character count from:
grep -v '>' INPUT | tr -d '\n' | wc -c

Edit
I want to achieve this without making intermediate files.

10.6 years ago
Michael 54k

How about parsing the output of blastdbcmd -info and taking the total residues from the second line?

$ blastdbcmd -info -db ~/blastdb/swissprot
Database: Non-redundant UniProtKB/SwissProt sequences
    455,621 sequences; 169,969,125 total residues

Date: Jul 29, 2013  6:13 AM    Longest sequence: 41,943 residues
[...]
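
The parsing step could be sketched roughly as below. The function name total_residues is made up for illustration, and the sketch assumes the summary line keeps the "N sequences; M total residues" layout shown above (nucleotide databases may print "total bases" instead, so both are matched). It reads the -info text from stdin, so no intermediate file is needed.

```shell
# Hypothetical helper (not part of BLAST+): reads `blastdbcmd -info`
# output on stdin and prints the total residue/base count as a plain integer.
# Assumes a summary line like "455,621 sequences; 169,969,125 total residues".
total_residues() {
    awk '/total (residues|bases)/ {
        for (i = 2; i <= NF; i++) {
            if ($i == "total") {
                n = $(i - 1)        # the number sits just before the word "total"
                gsub(/,/, "", n)    # drop thousands separators
                print n
                exit
            }
        }
    }'
}

# Intended use, with the database from the question:
#   blastdbcmd -info -db random_seq | total_residues
```

If the -info layout differs in your BLAST+ version, the pattern would need adjusting.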

Hm, I should update my blastdb...
