I've started using the tblastn tool to align protein sequences against a genomic reference. But when I compare the web and command-line results of the tool, the E-values are different.
Example:
Using the following query fasta:
>query
VPPPLGDQISVPLN
And this subject fasta:
>1
ACTACTTTACGGTACCCCTAAACTTGGGGGATCAGATCTCTAGTACTACT
>2
TCCTTAATTTTTACTAATGGAACCGGCCGCCTATGGGTTATTGGTTT
>3
TTAAAAGGTACCTACCCTGGGATTTTAACCTAGGGATTTTCC
Using the NCBI web interface I get this alignment:
Query 1 VPPPLGDQIS 10
VP LGDQIS
Sbjct 12 VPLNLGDQIS 41
With an Evalue of 6e-05.
Running the same query on my PC with the following command:
tblastn -query filewithmyquery.fa -subject filewithmysubject.fa
The alignment is identical, but the Evalue is 2e-04.
I tried changing the subject to just the first sequence:
>1
ACTACTTTACGGTACCCCTAAACTTGGGGGATCAGATCTCTAGTACTACT
And I got identical results in web and command line (Evalue of 6e-05).
The version of the command-line tool is 2.3.0+, the same as on the web. Do I have to change some parameter in the command-line version of the tool to get results identical to the web?
BLAST E-values depend on the database size. Read about the BLAST statistics here.
Also make sure that the parameters you use on the command line (or the defaults) are set to the same values used on the web.
I've saved the search strategy used in the web version (it's supposed to contain all the search parameters) and I've run it on my PC using:
And the results differ. Evalues depend on the database size, but I don't know if the web and command line are inferring the same value from the subject.
If I execute the first tblastn command of the example but set -dbsize to 50 (the length of the first entry of the subject fasta), I get the same Evalue as on the web.
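As a rough check of why -dbsize 50 reproduces the web result, here is a minimal Python sketch. It assumes the E-value scales linearly with the search space (as in the Karlin-Altschul formula E = K·m·n·e^(-λS)) and ignores the effective-length corrections that BLAST also applies, so it is only an approximation. The function name rescale_evalue is made up for this illustration, and the idea that the web treats each subject sequence as its own database is a guess based on the numbers above, not something I have confirmed.

```python
# Subject sequences from the example above.
subjects = {
    "1": "ACTACTTTACGGTACCCCTAAACTTGGGGGATCAGATCTCTAGTACTACT",
    "2": "TCCTTAATTTTTACTAATGGAACCGGCCGCCTATGGGTTATTGGTTT",
    "3": "TTAAAAGGTACCTACCCTGGGATTTTAACCTAGGGATTTTCC",
}


def rescale_evalue(evalue, old_size, new_size):
    """Rescale an E-value from one database size to another.

    Assumption: E is proportional to the database size, with no
    effective-length (edge) correction. Hypothetical helper, not part of BLAST.
    """
    return evalue * new_size / old_size


single = len(subjects["1"])                      # first entry only
total = sum(len(s) for s in subjects.values())   # all three entries together

web_evalue = 6e-05  # E-value reported by the web interface
e_all = rescale_evalue(web_evalue, single, total)

print(f"first entry: {single} nt, all entries: {total} nt")
print(f"E-value rescaled to the whole subject file: {e_all:.0e}")
```

Under these assumptions the rescaled value rounds to 2e-04, the same as the command-line result. That would fit with the command line using the total length of all subject sequences as the database size while the web uses only the length of the matching sequence.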