Question: How does BWA MEM know where to put X and Y origin reads correctly?
0

Hi, I have recently been using BWA MEM to align 150 bp paired-end reads from two cell lines, one derived from a human male (XY) lineage and the other from a female (XX). In the XY case I have reads aligning to both chromosomes, and in the XX case, only to X.

How does the algorithm know to align reads across X and Y correctly, especially in the pseudo-autosomal regions (PARs, i.e. the tips of the X and Y p/q arms), which "look" the same on X and Y, as they would on any autosomal pair?

Does BWA MEM just distribute reads evenly if they could go to the PAR of either X or Y? To my knowledge there is no option to supply the karyotype to BWA MEM, but I guess if you know the sex of the sample you could supply either an XY ref.fa or an X-only ref.fa to mitigate what I have outlined.

Do other labs or people use this approach?

Thanks.

10 months ago s1667153 • 0 • updated 10 months ago WouterDeCoster 39k
0

What is the percent sequence identity between the X-PAR and Y-PAR in hg19/hg38? I think bwa is agnostic to whatever you're aligning to, and that removes some bias, but others might have a better answer.

10 months ago
RamRS
21k
0

It does not know. If reads map to multiple locations equally well (multimappers) they get a MAPQ of 0. If you can be sure that your samples do not contain chrY, you might remove it from the reference and build a new index. Still, I never heard of anyone actually doing that. I personally align against the full genome, including unplaced and random contigs plus the EBV decoy, but excluding alternative (ALT) haplotypes.
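To see how many alignments fall into the MAPQ 0 category per chromosome, one could tally the MAPQ column of the SAM output. The following is a minimal sketch, not part of BWA itself: the function name `tally_mapq0` and the toy records are made up for illustration, and the input is assumed to be plain SAM text lines (for example, what `samtools view` prints).

```python
from collections import Counter

def tally_mapq0(sam_lines):
    """Count mapped alignments and MAPQ-0 alignments per reference name."""
    total, mapq0 = Counter(), Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4:                    # read is unmapped
            continue
        total[rname] += 1
        if mapq == 0:
            mapq0[rname] += 1
    return total, mapq0

# Hypothetical toy records, only to show the expected input shape.
toy = [
    "@SQ\tSN:chrX\tLN:1000",
    "r1\t0\tchrX\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t0\tchrX\t20\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tchrY\t20\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]
total, mapq0 = tally_mapq0(toy)
for name in sorted(total):
    print(name, total[name], mapq0[name])
```

A high share of MAPQ 0 alignments on chrX and chrY compared with the autosomes would be consistent with reads that fit both PAR copies equally well, although other repeats also produce MAPQ 0.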

10 months ago
ATpoint
17k
1

You should check this in the reference genome you are using, but most likely the PAR on chrY is hard-masked (converted to N nucleotides), so those reads will align to the chrX region.
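One way to run that check is to measure the fraction of N bases in the PAR interval of each chromosome in the reference FASTA. This is only a sketch under assumptions: `read_fasta`, `n_fraction`, the toy sequences and the interval `TOY_PAR` are all hypothetical, and the real PAR coordinates for your assembly have to be looked up and substituted.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # keep only the first word of the header as the sequence name
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def n_fraction(seq, start, end):
    """Fraction of N/n bases in the 0-based half-open interval [start, end)."""
    region = seq[start:end]
    if not region:
        return 0.0
    return sum(base in "Nn" for base in region) / len(region)

# Toy reference; TOY_PAR is a placeholder, not a real PAR coordinate.
toy_fasta = [">chrX toy", "ACGTACGTAC", ">chrY toy", "NNNNACGTAC"]
TOY_PAR = (0, 4)
fractions = {name: n_fraction(seq, *TOY_PAR) for name, seq in read_fasta(toy_fasta)}
print(fractions)
```

A fraction close to 1.0 on chrY together with a low fraction on chrX over the same region would suggest that the chrY copy is hard-masked in that reference.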

10 months ago WouterDeCoster 39k
0

May not be hard-masked, based on: "In the XY case I have reads aligning to both chromosomes, and in the XX case, only to X."
Edit: Unless that statement does not cover the PAR. OP will have to confirm.

10 months ago
genomax
68k
2

This indeed seems to depend on which reference genome you use, as also discussed in this blog post from Heng Li. TLDR: you should use one in which the region is hard-masked.
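If the reference at hand is not masked, the masking itself is simple to sketch: replace the chrY PAR intervals with N and rebuild the index (e.g. with `bwa index`) on the result. The helper `hard_mask`, the toy sequence and the intervals in `TOY_PARS` below are hypothetical, chosen only to illustrate the idea; real PAR coordinates depend on the assembly and are not given here.

```python
def hard_mask(seq, intervals):
    """Return seq with each 0-based half-open interval [start, end) set to N."""
    bases = list(seq)
    for start, end in intervals:
        # clamp the interval to the sequence bounds
        start, end = max(start, 0), min(end, len(bases))
        for i in range(start, end):
            bases[i] = "N"
    return "".join(bases)

# Toy chrY with placeholder "PAR" intervals at both tips.
toy_chrY = "ACGTACGTACGT"
TOY_PARS = [(0, 3), (10, 12)]
masked = hard_mask(toy_chrY, TOY_PARS)
print(masked)
```

Masking keeps the chrY coordinates unchanged, which removing the region outright would not, so downstream annotation still lines up.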

ADD REPLYlink 10 months ago
WouterDeCoster
39k
