Extracting the best reads that align multiple times
7.0 years ago
ioannis ▴ 50

Hello community,

I am using bowtie2 to align sequences to a reference genome. The results are quite disappointing: 48% of the reads align exactly once and 44% align more than once.

I have single-end reads 55-70bp long. The reference genome is the OreoNil2 (Oreochromis niloticus).

I am not sure about this, but I guess that each alignment of a multi-mapping read has a different score, depending on how well it matches the reference genome. I would like to extract into a new SAM file the reads that align only once (48%) plus, for the reads that align multiple times, the alignment with the best score.

Does anybody know if this is possible, and how to do it? Would I introduce any bias by picking those reads?

Thanks in advance!

alignment • 2.0k views

Might be worth trying bwa and comparing results. If these are paired-end reads I would expect a smaller proportion of multiple mappings.


Once you pick a subset of reads with higher scores, yes, you will introduce bias. What is your ultimate goal?


My goal is to get as many alignments as I can, but it seems I can use fewer than 50% of my total reads. I have hydroxymethylation data and I need as much coverage as I can get. I will try different aligners to see if I get better results. However, I think bowtie2 is quite a good aligner, so I am not hoping for a miracle. Thank you for your input!


You'll typically get a much higher alignment rate with BBMap compared to Bowtie2, when using data with low identity to the reference. Particularly, you can add the flag "slow" or "vslow", and use a shorter kmer length such as 11, to increase the alignment rate even more.
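
On the original question of pulling out the uniquely aligned reads plus the multi-mappers that have a clear best hit, here is a rough sketch in plain Python. It assumes bowtie2 was run in its default reporting mode (no -k or -a), so each read has a single SAM record, and it relies on the AS:i (score of the reported alignment) and XS:i (score of the next-best alignment, present only when one was found) tags that bowtie2 writes. The function names and the 'unique' / 'best' / 'tie' labels are my own, not part of any tool.

```python
def parse_tags(fields):
    """Return the optional SAM tags (columns 12+) as a dict, e.g. {'AS': -3}."""
    tags = {}
    for item in fields[11:]:
        name, typ, value = item.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return tags


def classify(sam_line):
    """Label one SAM record as 'unaligned', 'unique', 'best' or 'tie'.

    Assumption: every aligned record carries AS:i, as bowtie2 output does.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if int(fields[1]) & 4:          # FLAG bit 0x4: read is unmapped
        return "unaligned"
    tags = parse_tags(fields)
    if "XS" not in tags:            # no second alignment was found
        return "unique"
    # Multi-mapper: is the reported hit strictly better than the runner-up?
    return "best" if tags["AS"] > tags["XS"] else "tie"


def filter_sam(lines, keep=("unique", "best")):
    """Yield header lines plus the records whose label is in `keep`."""
    for line in lines:
        if line.startswith("@") or classify(line) in keep:
            yield line


if __name__ == "__main__":
    import sys
    # Usage (file names are placeholders): python filter_sam.py < in.sam > out.sam
    for record in filter_sam(sys.stdin):
        sys.stdout.write(record)
```

Reads labelled 'tie' have two or more equally good locations, and bowtie2 reports one of them at random, so dropping them avoids arbitrary placements at the cost of losing coverage in repetitive regions. That loss is exactly the kind of bias mentioned above. Filtering on the MAPQ column (for example with samtools view -q) is a common alternative that gets at the same idea.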
