BLAST: One read matches same region multiple times
8.1 years ago
godeludanu ▴ 30

I am aligning Nanopore reads to the C. elegans genome to measure coverage across the genome.

One region of the C. elegans genome has a very high number of matching reads (an order of magnitude more than other regions). I think this is because it's a repeat region with lots of homopolymers, so reads from this region contain many errors and their alignment here is ambiguous. As a result, a single read blasted against this annoying region ends up with multiple hits because BLAST can't figure out the best alignment.

Can you suggest any strategies to work around this? My current thought is to prevent BLAST finding multiple hits for a single read in the same region. Is this a good strategy and what is the best way to implement this?

Thanks for your time.

blast nanopore • 2.2k views

I have no experience with Nanopore, but I wonder whether BLAST is the right tool for read mapping in general. BLAST is tuned to find regions of similarity between possibly distant species, so it expects a sequence to align at multiple places, and I think it has no concept of 'mapping quality' (i.e. the probability that the mapping is wrong, as opposed to an alignment score or e-value). I would suggest trying bwa mem, which is designed to work with long reads, possibly split across large gaps.
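To make the 'mapping quality' idea concrete: in SAM output, MAPQ is defined as -10*log10 of the probability that the reported position is wrong. Below is a minimal Python sketch of that conversion. The function names and the cap of 60 are assumptions for illustration, not part of any aligner's API.

```python
import math


def mapq_to_error_prob(mapq: int) -> float:
    """Probability that the mapping position is wrong, given a Phred-scaled MAPQ."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong: float, cap: int = 60) -> int:
    """Phred-scale an error probability; the cap is an illustrative assumption."""
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


if __name__ == "__main__":
    for q in (0, 10, 20, 30):
        print(q, mapq_to_error_prob(q))
```

A read that fits equally well at several places in a repeat would get a low MAPQ, which gives you something to filter on that a BLAST e-value does not.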

8.1 years ago
abascalfederico ★ 1.2k

There is no simple solution for repetitive regions. If you are not interested in them, why not mask them in the genome? You can mask according to RepeatMasker, TRF (Tandem Repeats Finder) and/or DUST.
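As a rough illustration of what hard-masking means, here is a small Python sketch that replaces bases inside given intervals with N. It assumes the repeat annotations have already been converted to BED-style 0-based, half-open (start, end) pairs for one sequence; the function name is made up for this example, and for whole genomes an established tool would be the more practical route.

```python
def hard_mask(sequence: str, intervals, mask_char: str = "N") -> str:
    """Replace bases in 0-based, half-open [start, end) intervals with mask_char."""
    masked = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range annotations are harmless.
        start = max(0, start)
        end = min(len(masked), end)
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # Toy sequence with one assumed repeat interval covering positions 2..4.
    print(hard_mask("ACGTACGTAC", [(2, 5)]))
```

Reads coming from the masked region will then fail to align there, so the coverage spike disappears along with any real signal from that region.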

HTH


While this strategy can work, it sounds like @godeludanu is interested in finding and keeping the "best" alignment in this region. I don't know how long the reads are in this case, but trying a different aligner (e.g. LASTZ) may be a better option.
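If you stay with BLAST, one way to keep only the "best" alignment is to post-filter the tabular output. The sketch below assumes BLAST was run with -outfmt 6 and the default column order (query ID in column 1, bit score in column 12) and keeps the highest-scoring hit per read. The function name is hypothetical and ties are resolved by keeping the first hit seen.

```python
def best_hit_per_read(lines):
    """Keep the highest bit-score hit per query from default BLAST -outfmt 6 lines."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id = fields[0]
        bitscore = float(fields[11])  # column 12 in the default tabular layout
        # Strict '>' means the first hit wins when scores are tied.
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, fields)
    return {read_id: fields for read_id, (_, fields) in best.items()}


if __name__ == "__main__":
    example = [
        "r1\tchrI\t90.0\t100\t10\t0\t1\t100\t500\t599\t1e-20\t150",
        "r1\tchrI\t85.0\t100\t15\t0\t1\t100\t520\t619\t1e-10\t120",
        "r2\tchrII\t95.0\t200\t10\t0\t1\t200\t10\t209\t1e-50\t300",
    ]
    for read_id, fields in best_hit_per_read(example).items():
        print(read_id, fields[1], fields[8], fields[11])
```

BLAST+ also has a -max_hsps option that limits how many HSPs are reported per query-subject pair, which may reduce the duplicate hits before any filtering. Either way, in a repeat the "best" hit can be close to arbitrary.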

8.1 years ago
shwethacm ▴ 240

My first thought is: how long are your reads? If they are several kilobases long, use bwa mem with a long-read preset (-x ont2d for Nanopore, -x pacbio for PacBio), BLASR, or another mapper tuned for PacBio and PacBio-like reads. These are optimized for long reads. You will then have to filter the output file to keep the best alignment for each read.
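The filtering step could look something like the Python sketch below. It assumes the mapper wrote SAM, and it keeps only mapped, primary alignments above a MAPQ threshold by checking the standard flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name and the threshold of 20 are illustrative assumptions; in practice a tool such as samtools would usually do this job.

```python
def keep_primary_alignments(sam_lines, min_mapq=20):
    """Yield (read, reference, position, MAPQ) for confident primary alignments."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        mapq = int(fields[4])
        # Drop unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        if mapq < min_mapq:
            continue  # placement is too ambiguous, e.g. inside a repeat
        yield fields[0], fields[2], int(fields[3]), mapq


if __name__ == "__main__":
    example = [
        "@SQ\tSN:chrI\tLN:1000",
        "r1\t0\tchrI\t100\t60\t8M\t*\t0\t0\tACGTACGT\t*",
        "r1\t256\tchrI\t400\t0\t8M\t*\t0\t0\t*\t*",
        "r2\t0\tchrI\t410\t3\t8M\t*\t0\t0\tACGTACGT\t*",
    ]
    for record in keep_primary_alignments(example):
        print(record)
```

Counting coverage from the surviving records gives one alignment per read, and reads that cannot be placed confidently in the repeat drop out instead of inflating the count.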

8.1 years ago

LAST is an aligner which is used more often for Nanopore sequencing. Perhaps using NanoOK could tell you a lot about your data: https://github.com/TGAC/NanoOK
