Find reference genome regions spanned by only mapping quality 0 reads in multiple WGS samples
8 months ago
William ★ 5.3k

To parallelize multi-sample variant calling, I am looking for reference genome regions to split on.

With the T2T reference genomes, there are few polyN regions left to split on.

Instead, I am thinking of splitting the multi-sample variant calling on mapping quality 0 regions.

I would like to find reference genome regions > 500 bp that are:

  1. covered by reads with mapping quality = 0
  2. not covered by any reads with mapping quality > 0
  3. showing this pattern in many or all WGS samples
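As a rough illustration of the three criteria, here is a minimal Python sketch for a single contig. It assumes the per-base depths have already been extracted from each BAM/CRAM twice (once counting all reads, once counting only reads with mapping quality > 0) by whatever tool is preferred; that extraction step is not shown. The function names `mapq0_only_runs` and `shared_regions` and the `min_samples` parameter are hypothetical, made up for this example.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open, 0-based coordinates, as in BED


def mapq0_only_runs(depth_all: Sequence[int],
                    depth_mapq_gt0: Sequence[int]) -> List[Interval]:
    """Per sample: runs of positions that are covered by reads (criterion 1)
    but not by any read with mapping quality > 0 (criterion 2)."""
    n = min(len(depth_all), len(depth_mapq_gt0))
    runs: List[Interval] = []
    start = None
    for pos in range(n):
        flagged = depth_all[pos] > 0 and depth_mapq_gt0[pos] == 0
        if flagged and start is None:
            start = pos
        elif not flagged and start is not None:
            runs.append((start, pos))
            start = None
    if start is not None:
        runs.append((start, n))
    return runs


def shared_regions(per_sample: List[List[Interval]],
                   min_samples: int,
                   min_len: int = 500) -> List[Interval]:
    """Across samples: stretches flagged in at least `min_samples` samples
    (criterion 3) that are longer than `min_len` bp.

    Assumes the intervals within one sample do not overlap, which holds
    for the output of mapq0_only_runs.
    """
    # Net change in the number of supporting samples at each boundary.
    deltas: Dict[int, int] = {}
    for runs in per_sample:
        for start, end in runs:
            deltas[start] = deltas.get(start, 0) + 1
            deltas[end] = deltas.get(end, 0) - 1

    out: List[Interval] = []
    count = 0
    region_start = None
    for pos in sorted(deltas):
        count += deltas[pos]
        if count >= min_samples and region_start is None:
            region_start = pos
        elif count < min_samples and region_start is not None:
            if pos - region_start > min_len:
                out.append((region_start, pos))
            region_start = None
    return out


if __name__ == "__main__":
    # Toy per-sample intervals on one contig (made-up numbers).
    samples = [[(100, 900)], [(200, 1000)], [(150, 800)]]
    print(shared_regions(samples, min_samples=3))  # [(200, 800)]
```

Setting `min_samples` to the number of samples corresponds to "all samples"; a lower value corresponds to "many samples".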

I think it is only safe to split multi-sample variant calling on these regions when all three points hold

(i.e. take the inverse regions as callable regions to process in parallel).
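Taking the inverse could look like the following sketch, which complements a set of split regions against the contig length (for example as listed in the FASTA index). The helper names `callable_regions` and `to_bed` are hypothetical, and the coordinates are assumed to be 0-based and half-open, as in BED.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open, 0-based coordinates, as in BED


def callable_regions(contig_length: int,
                     split_regions: Iterable[Interval]) -> List[Interval]:
    """Return the parts of [0, contig_length) not covered by split_regions."""
    out: List[Interval] = []
    cursor = 0  # first position not yet assigned to any region
    for start, end in sorted(split_regions):
        if start > cursor:
            out.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < contig_length:
        out.append((cursor, contig_length))
    return out


def to_bed(contig: str, regions: Iterable[Interval]) -> str:
    """Format intervals as tab-separated BED lines."""
    return "".join(f"{contig}\t{start}\t{end}\n" for start, end in regions)


if __name__ == "__main__":
    # Made-up contig length and split regions.
    chunks = callable_regions(1000, [(200, 400), (700, 1000)])
    print(to_bed("chr1", chunks), end="")
```

Each resulting interval is one chunk that could be passed to the variant caller independently.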

Input would be one FASTA file and many BAM/CRAM files; output would be a BED file.

mapping-quality BAM FASTA • 283 views