Question

How can I best use -c in bwa mem for alignment and to find polymorphisms?

0

Entering edit mode

8.2 years ago

kirannbishwa01 ★ 1.6k

I am doing bwa mem alignment and want to use the -c in a best way. As, I understand -c INT is meant to discard a MEM reads if it has more than INT occurrences in a genome. The default -c value is 10000, so a read may be potentially aligned to at maximum of 10000 places in the genome. But, reporting of an alignment is also controlled by -T [default value 30], i.e. no alignment will be reported that has value below 30 (1st Question - so is this value of 30 referring to mapQ?).

I want to do a variant calling on my population samples using genome re-seq data that has coverage above 10x (10 to 70). The population should not vary more than 1%.

I previously did bwa mem alignment using default parameter, and I choose uniquely mapped reads using -f 2 flag in samtools, and/or used mapQ values to select for the reads that show good mapping quality (I think mapQ > 39, is good, since anything that maps to more places will have its mapQ value exponentially reduced).

So, if I chose to do -c 1, any reads that maps uniquely won't be affected in the report. But, the reads that potentially can map to multiple places will be selected for the best MEM, right? Since -c is set to 1.

But will this multiple mapping reads have the high mapQ values since its alignment to other places in the genome is restricted?

Additionally, what will happen if the reads can map equally to two different places with equal mismatches and gaps?

Finally, is the setting -c 1 as good way of identifying polymorhisms? - I am kind of in a grey area here and would appreciate any suggestions - I understand it all depends upon if the reads has repetitive bases in it and/or the amount of mismatches/gaps it may have compared to one region of the genome vs. the other one, But, my guess if I set -c 1 and restrict the multiple mapping reads to only one alignment the chances that the alignment and identified polymorphisms is right is high - since the reference genomes are generated using highly inbred lines which are mostly homozygous. My guess is setting -c 1 (or possibly 2) will help use capture valuable polymorphisms, but I also plan on using mapQ to supplement the selection.

Thanks

SNP bwa mapQ sequencing alignment • 4.1k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.2 years ago by kirannbishwa01 ★ 1.6k

score 1 · Answer 1 · 2016-02-21

1

Entering edit mode

8.2 years ago

Istvan Albert 100k

BWA mem will produce SAM flags for both secondary and supplementary alignments. The mapping quality will also be set to zero when the read is multimapping.

In general genomic variation should be discovered from all alignments. Trying to tweak the aligner is not a good strategy. For every alignment you know its mapping quality, you know if it is primary or secondary what good could come from willfully ignoring this information?

ADD COMMENT • link 8.2 years ago by Istvan Albert 100k

1

Entering edit mode

To add to Stefan's comment. I think most SNP callers will not take a multi-mapping read seriously and will give it poor quality. So there is no need to tweak the alignment, rather you can play with the quality of called SNPs.

ADD REPLY • link 8.2 years ago by apelin20 ▴ 480

score 0 · Answer 2 · 2016-02-22

0

Entering edit mode

8.2 years ago

kirannbishwa01 ★ 1.6k

Thanks for the information. I was thinking that tweaking would help but after much reading and reviewing your comments it looks like it may not be helpful.

Thanks much :)

ADD COMMENT • link 8.2 years ago by kirannbishwa01 ★ 1.6k