16 months ago
There certainly are tools available that correctly handle sex chromosomes. With the RTG suite of tools, your human reference can contain a configuration file that specifies the autosomes, sex chromosomes, PAR regions, etc., and this file is consulted during both mapping and variant calling. When processing a sample, you specify the sex (if known), and it Just Works.
For example, during mapping (rtg map), when processing a sample specified as female, the aligner will not attempt to map reads to the Y chromosome. Similarly, it will not map reads to the PAR region on the Y chromosome for males (although most human references already have this region masked out anyway). When performing variant calling (rtg snp / rtg family / rtg population / rtg somatic) on a sample specified as male, the caller automatically uses haploid calling for the X and Y chromosomes (except for the PAR regions, where diploid calling is carried out). When performing joint calling of multiple samples that form a pedigree, either a single family (rtg family) or a larger multi-generation pedigree (rtg population), the caller uses Mendelian inheritance in its Bayesian models to further inform the variant calls (including appropriate behaviour on the sex chromosomes). Of course, you can manually override any of this if you want.
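To make the sex-aware rules above concrete, here is a minimal Python sketch of the ploidy decision they imply. It is only an illustration, not RTG's implementation or its configuration format: the function names, the `chrX`/`chrY` naming, and the `PAR_REGIONS` table are assumptions, and the coordinates are the commonly cited GRCh37 PAR1 boundaries, which you should verify against your own reference build.

```python
# Illustrative sketch only: not RTG's code or its reference configuration format.
# Assumed PAR1 coordinates for GRCh37 (1-based, inclusive); verify for your build.
PAR_REGIONS = {
    "chrX": [(60001, 2699520)],
    "chrY": [(10001, 2649520)],
}


def in_par(chrom, pos):
    """Return True if the position falls inside a listed PAR interval."""
    return any(start <= pos <= end for start, end in PAR_REGIONS.get(chrom, []))


def expected_ploidy(sex, chrom, pos):
    """Return the number of copies (0, 1 or 2) expected at a site for a sample.

    0 means nothing should be mapped or called there.
    """
    if sex not in ("female", "male"):
        # Handling of unknown sex is left out of this sketch.
        raise ValueError("sex must be 'female' or 'male'")
    if chrom not in ("chrX", "chrY"):
        return 2  # autosomes are diploid for everyone
    if sex == "female":
        return 2 if chrom == "chrX" else 0  # no Y in a female sample
    if in_par(chrom, pos):
        # Male PAR: called as diploid on X, and the Y copy is skipped.
        return 2 if chrom == "chrX" else 0
    return 1  # male, non-PAR X or Y: haploid


if __name__ == "__main__":
    for query in [("female", "chrY", 5_000_000),
                  ("male", "chrX", 1_000_000),
                  ("male", "chrX", 50_000_000)]:
        print(query, expected_ploidy(*query))
```

In the RTG tools these rules come from the reference configuration file rather than being hard-coded, so a table like this is something you would only write yourself when working with a caller that lacks such support.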
For all of the modes, a validation tool runs automatically at the end of a mapping job. It can also be invoked explicitly (rtg chrstats) if you have split your mapping into multiple smaller jobs and want it to use aggregate statistics. The tool analyses the coverage of the autosomes and sex chromosomes to indicate whether the sample may have been mapped with a sex that does not match its actual sex (for example, due to sample mislabelling, incorrect pedigree information, or chromosomal abnormalities such as XXY, trisomy, etc.).
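The idea behind a coverage-based check like this can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how rtg chrstats is implemented: the function name `check_sex`, the expected coverage ratios, and the `tolerance` threshold are all hypothetical choices, and the input is assumed to be mean coverage measured outside the PAR regions.

```python
# Illustrative sketch only: not the algorithm used by rtg chrstats.
# Expected (X, Y) coverage relative to the autosomes for each declared sex.
EXPECTED_RATIOS = {
    "female": (1.0, 0.0),  # two X copies, no Y
    "male": (0.5, 0.5),    # one X copy, one Y copy
}


def check_sex(declared_sex, autosome_cov, x_cov, y_cov, tolerance=0.25):
    """Compare observed sex-chromosome coverage with the declared sex.

    Returns (consistent, x_ratio, y_ratio), where the ratios are coverage
    relative to the autosomal mean. The tolerance is an arbitrary assumption.
    """
    if autosome_cov <= 0:
        raise ValueError("autosomal coverage must be positive")
    expected_x, expected_y = EXPECTED_RATIOS[declared_sex]
    x_ratio = x_cov / autosome_cov
    y_ratio = y_cov / autosome_cov
    consistent = (abs(x_ratio - expected_x) <= tolerance
                  and abs(y_ratio - expected_y) <= tolerance)
    return consistent, x_ratio, y_ratio


if __name__ == "__main__":
    # Declared female, but coverage looks like one X and one Y: flag it.
    print(check_sex("female", autosome_cov=30.0, x_cov=15.0, y_cov=14.0))
    # Declared male, but X coverage matches the autosomes (e.g. XXY): flag it.
    print(check_sex("male", autosome_cov=30.0, x_cov=30.0, y_cov=15.0))
```

A failed check only says that the coverage pattern is unexpected for the declared sex. Telling a mislabelled sample apart from a genuine chromosomal abnormality still needs a closer look at the per-chromosome numbers.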
Give it a go: RTG Core (which contains all that goodness) is free for non-commercial use.