BLAST run: which set should be the query and which the subject?
8.2 years ago
pbigbig ▴ 250

Hi everyone,

I am confused about something and it would be great if you could clarify. Suppose I have a large set of scaffolds in FASTA format (from a genome assembly, for example, so it may contain assembly errors), and a small reference cDNA set (obtained from Ensembl, so it can be considered a high-quality reference). I was told that normally the larger set should be the subject for BLAST and the smaller one should be the query. So should I run makeblastdb on my large scaffold set and query the reference cDNA set against it, or the other way around? (Using the reference cDNA set as the query feels counter-intuitive to me: its role is to be the reference, so shouldn't it be the subject?)

Thank you very much for any suggestion and clarification!

8.2 years ago
Michael 54k

Using the small sequence set as the database would give misleadingly low E-values, because E-values depend on the database size and not on the total length or number of query sequences tested. (Run a given query sequence on its own, then together with a batch of other queries: the E-values are identical.) You would end up with a massive multiple-testing problem that is not corrected for. The assembly is also much closer to the complete set of sequences that occur in this organism, which makes it the more appropriate database.
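To make the database-size dependence concrete, here is a minimal sketch based on the standard Karlin-Altschul approximation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the total database length. It is a simplification: real BLAST uses edge-corrected "effective" lengths, and the function name and all sizes below are hypothetical values chosen only for illustration.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value: E = m * n * 2**(-S').

    Simplified on purpose: raw lengths are used instead of the
    effective (edge-corrected) lengths that BLAST computes.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes, for illustration only.
cdna_db_len = 50_000_000         # small cDNA set used as the database
assembly_db_len = 1_000_000_000  # large scaffold set used as the database
query_len = 1_500                # one query sequence
bit_score = 60.0                 # the same alignment score in both searches

e_small = expected_hits(bit_score, query_len, cdna_db_len)
e_large = expected_hits(bit_score, query_len, assembly_db_len)

# The same alignment looks more significant against the smaller database.
ratio = e_large / e_small
print(f"E vs cDNA database:     {e_small:.3e}")
print(f"E vs assembly database: {e_large:.3e}")
print(f"ratio (large / small):  {ratio:.1f}")
```

Note that the number of queries appears nowhere in the formula: running thousands of scaffolds against the small database does not change any individual E-value, which is why the multiple testing goes uncorrected.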


Hi Michael, thank you very much for your clarification.

