Question

How does BWA MEM know where to put X and Y origin reads correctly?

0

5.0 years ago

s1667153 • 0

Hi, I have recently been using BWA MEM to align 150 bp paired-end reads from two cell lines: one derived from a human male (XY) lineage and the other from a female (XX). In the XY case I have reads aligning to both sex chromosomes, and in the XX case only to X.

How does the algorithm know to align reads to X and Y correctly, especially in the pseudo-autosomal regions (PARs, i.e. the tips of the X and Y p/q arms), where X and Y "look" the same, as the two homologous copies of any autosome would?

Does BWA MEM just distribute reads evenly if they could go to the PAR of either X or Y? To my knowledge there is no option to supply the karyotype to BWA MEM, but I guess that if you know the sex of the sample, you could supply either an XY ref.fa or an X-only ref.fa to mitigate the issue I have outlined.

Do other labs or people use this approach?

Thanks.

BWA MEM alignment genetics genomics • 1.9k views

updated 5.0 years ago by WouterDeCoster 47k • written 5.0 years ago by s1667153 • 0

0

What is the percent sequence identity between the X-PAR and Y-PAR in hg19/hg38? I think bwa is agnostic to whatever you're aligning to, and that removes some bias, but others might have a better answer.

5.0 years ago by Ram 43k

0

It does not know. If reads map to multiple locations equally well (multimappers), they get a MAPQ of 0. If you can be sure that your samples do not contain chrY, you could remove it from the reference and build a new index. Still, I have never heard of anyone actually doing that. I personally align against the full genome, including unplaced and random contigs plus the EBV decoy, but excluding alternative (ALT) haplotypes.
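To see how this plays out in a given alignment, one could tally MAPQ-0 reads per chromosome. The following is a minimal sketch, not part of BWA: the function name `mapq_zero_by_contig` is hypothetical, and it assumes plain-text SAM records with the standard tab-separated columns (FLAG in column 2, RNAME in column 3, MAPQ in column 5).

```python
from collections import Counter


def mapq_zero_by_contig(sam_lines):
    """Return {contig: (n_mapq0, n_total)} for primary mapped alignments.

    Hypothetical helper for illustration; expects plain-text SAM lines.
    """
    total = Counter()
    zero = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        contig = fields[2]
        mapq = int(fields[4])
        total[contig] += 1
        if mapq == 0:
            zero[contig] += 1
    return {name: (zero[name], total[name]) for name in total}


if __name__ == "__main__":
    import sys

    # Reads SAM text from standard input and prints one line per contig.
    for name, (n_zero, n_total) in sorted(mapq_zero_by_contig(sys.stdin).items()):
        print(f"{name}\t{n_zero}\t{n_total}")
```

Piping the output of `samtools view` into the script should work, since that command emits SAM text. A high share of MAPQ-0 reads confined to the tips of chrX and chrY would be consistent with an unmasked PAR.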

5.0 years ago by ATpoint 81k

score 1 · Answer 1 · 2019-04-29

1

5.0 years ago

WouterDeCoster 47k

You should check this in the reference genome you are using, but most likely the PAR on chrY is hard-masked, i.e. converted to N nucleotides, so those reads will align to the corresponding chrX region.
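One way to run that check is to measure the fraction of N bases inside the chrY PAR intervals of your FASTA. This is a rough sketch under stated assumptions: the helper names are hypothetical, the FASTA is assumed to be uncompressed plain text, and the GRCh38 coordinates below are assumed values that should be verified against your own assembly before use.

```python
# Assumed GRCh38 chrY PAR coordinates (1-based, inclusive); verify for your assembly.
ASSUMED_GRCH38_CHRY_PARS = {
    "PAR1": (10001, 2781479),
    "PAR2": (56887903, 57217415),
}


def read_contig(fasta_path, contig):
    """Return the sequence of one contig from a plain-text FASTA ('' if absent)."""
    chunks = []
    keep = False
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if keep:
                    break  # reached the next record; the wanted contig is complete
                name = line[1:].split()
                keep = bool(name) and name[0] == contig
            elif keep:
                chunks.append(line.strip())
    return "".join(chunks)


def n_fraction(sequence, start, end):
    """Fraction of N/n bases in the 1-based inclusive interval [start, end]."""
    region = sequence[start - 1:end]
    if not region:
        raise ValueError("interval is empty or lies outside the sequence")
    return sum(base in "Nn" for base in region) / len(region)


def report_par_masking(fasta_path, contig="chrY", pars=ASSUMED_GRCH38_CHRY_PARS):
    """Print the N fraction of each PAR interval on the given contig."""
    sequence = read_contig(fasta_path, contig)
    for label, (start, end) in pars.items():
        print(f"{label}\t{n_fraction(sequence, start, end):.3f}")
```

A fraction close to 1.0 for both intervals would suggest the PAR on chrY is hard-masked in that file; a value near 0 would suggest it is not. The contig name (`chrY` versus `Y`) depends on the reference and is also an assumption here.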

5.0 years ago by WouterDeCoster 47k

0

~~Does not seem to~~ May not be hard-masked, based on: "In the XY case I have reads aligning to both chromosomes, and in the XX case, only to X."
Edit: Unless this statement does not cover the PAR regions. OP will have to confirm.

5.0 years ago by GenoMax 141k

2

This indeed seems to depend on which reference genome you use, as also discussed in this blog post from Heng Li. TL;DR: you should use one in which the region is hard-masked.
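If your reference turns out not to be masked, hard-masking amounts to overwriting the chrY PAR intervals with N. The following is only a sketch of that idea: `hard_mask` is a hypothetical helper operating on an in-memory sequence string, and the intervals to pass in (1-based, inclusive) are an assumption you would need to look up for your assembly.

```python
def hard_mask(sequence, intervals):
    """Return a copy of `sequence` with each 1-based inclusive interval set to N.

    Intervals that extend past the end of the sequence are clipped.
    """
    buf = bytearray(sequence, "ascii")
    for start, end in intervals:
        if start < 1 or end < start:
            raise ValueError(f"bad interval: {start}-{end}")
        lo = start - 1               # convert to 0-based, half-open
        hi = min(end, len(buf))
        if lo < hi:
            buf[lo:hi] = b"N" * (hi - lo)
    return buf.decode("ascii")


# Toy example with made-up intervals on a 10-base sequence.
masked = hard_mask("ACGTACGTAC", [(1, 3), (9, 10)])
print(masked)
```

Because the sequence length is unchanged, chrY coordinates stay the same after masking. The masked FASTA would still need a fresh index (`bwa index`) before aligning against it.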

5.0 years ago by WouterDeCoster 47k