BLAST results show partial matches
3.1 years ago
langziv ▴ 50

Hello fellow researchers!

I got BLAST results showing that only a small part of each sequence in an assembly matches the BLAST database. Most of these matches are to bacteria. Bacterial contamination is expected, so the question is: why does, for instance, only half of a sequence in the assembly match bacteria according to the BLAST results, rather than the entire sequence or 80% or 90% of it?

Thanks!

blast BLAST assembly • 1.6k views
3.1 years ago

Well, to start: BLAST is a local aligner, so its purpose is to look for highly similar local stretches.

The likely reason only parts of your contigs match the DB is that you have chimeric contigs (i.e. a correct part fused to a 'contamination' part). This happens because some stretches of DNA (genes, for instance) are present in several species and can thus confuse the assembler, which results in chimeric contigs.

Which BLAST DB are you using? If it is only a subset DB (e.g. only bacteria), you will also introduce a bias, as the correct match might not be present in the DB used.
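
To see how much of each contig is actually covered by hits, the query coordinates of all hits for a contig can be merged and compared with the contig length. The following is a minimal sketch, not part of the original answer: the function name `query_coverage` and the input tuples are hypothetical, and it assumes the query start, end and length have already been extracted from the BLAST output.

```python
from collections import defaultdict


def query_coverage(hits):
    """Return {query_id: fraction of the query covered by at least one hit}.

    hits: iterable of (query_id, qstart, qend, qlen) tuples with
    1-based, inclusive coordinates (hypothetical, pre-parsed input).
    """
    intervals = defaultdict(list)
    lengths = {}
    for qid, qstart, qend, qlen in hits:
        lo, hi = sorted((qstart, qend))
        intervals[qid].append((lo, hi))
        lengths[qid] = qlen

    coverage = {}
    for qid, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_lo, cur_hi = spans[0]
        for lo, hi in spans[1:]:
            if lo <= cur_hi + 1:
                # Overlapping or adjacent hit: extend the current block.
                cur_hi = max(cur_hi, hi)
            else:
                # Gap between hits: close the current block, start a new one.
                covered += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
        covered += cur_hi - cur_lo + 1
        coverage[qid] = covered / lengths[qid]
    return coverage


if __name__ == "__main__":
    example = [("contig1", 1, 300, 1000), ("contig1", 200, 500, 1000)]
    print(query_coverage(example))
```

A contig whose coverage sits well below 1.0 while the covered part matches bacteria is a candidate for the chimeric case described above, though that would still need checking by eye.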


Thanks. I use the database for all species, not only bacteria. I'd love to hear your opinion: considering your answer, if I remove the sequences that align with contaminants such as bacteria, I'll be left with sequences that might be from the species I'm interested in. Those remaining sequences can be used for further analysis, such as alignment against the NCBI reference genome of the species of interest.

Does that sound reasonable?
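
The removal step proposed here could look roughly like the sketch below. It assumes the IDs of the contigs to discard have already been collected from the BLAST results; `filter_fasta` and `exclude_ids` are hypothetical names used only for illustration. Note that if contigs are chimeric, as suggested in the answer above, dropping a whole contig also drops its genuine part.

```python
def filter_fasta(lines, exclude_ids):
    """Yield the FASTA lines of every record whose ID is not in exclude_ids.

    lines: iterable of FASTA lines (e.g. an open file handle).
    exclude_ids: set of record IDs to drop (hypothetical input, e.g.
    contigs flagged as bacterial from the BLAST results).
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token
            # after the ">" of the header line.
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            keep = record_id not in exclude_ids
        if keep:
            yield line


if __name__ == "__main__":
    fasta = [">contig1 len=4\n", "ACGT\n", ">contig2\n", "GGCC\n"]
    print("".join(filter_fasta(fasta, {"contig2"})), end="")
```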

3.1 years ago

Hi, if you want to exclude these, you can set the parameters accordingly. We have a project here, https://github.com/colindaven/nf-blast, which only reports hits with 80% identity and above:

In main.nf, the BLAST call is defined like this:

blastn -db $db_path/$db_name -query query.fa -perc_identity 80 -max_target_seqs 10 -evalue 1 -num_threads 4 -outfmt 6 > blast_result

What you're referring to is likely this part:

-perc_identity 80

Though it is a valid parameter, I'm not sure it will help here. The OP is asking for full-length alignments, and even with this parameter set to 80 or so, a hit can still cover only a small(er) part of the whole sequence.

Without post-processing the output, it is not possible to filter on coverage between query and hit.
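
One way to do that post-processing is sketched below. It assumes BLAST was run with `-outfmt "6 std qlen"` rather than the plain `-outfmt 6` shown above, so that the query length is available as a 13th column; the function name `filter_by_query_coverage` and the 0.8 threshold are hypothetical. Coverage is computed per HSP here, not summed over all hits of a query.

```python
def filter_by_query_coverage(rows, min_cov=0.8):
    """Keep tabular BLAST rows whose HSP spans at least min_cov of the query.

    rows: iterable of tab-separated lines with 13 columns, i.e. the 12
    standard outfmt 6 columns followed by qlen (an assumption about how
    BLAST was run, see the note above).
    """
    kept = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # skip blank or malformed lines
        qstart, qend, qlen = int(fields[6]), int(fields[7]), int(fields[12])
        # Fraction of the query sequence covered by this single HSP.
        coverage = (abs(qend - qstart) + 1) / qlen
        if coverage >= min_cov:
            kept.append(row)
    return kept


if __name__ == "__main__":
    rows = [
        "c1\tsubj\t95.0\t900\t45\t0\t1\t900\t10\t909\t1e-50\t1500\t1000",
        "c2\tsubj\t99.0\t500\t5\t0\t1\t500\t1\t500\t1e-80\t900\t1000",
    ]
    for row in filter_by_query_coverage(rows):
        print(row)
```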
