Question

Calculate genome regions to exclude for Structural Variant calling

2.4 years ago

William ★ 5.3k

Structural variant calling with short Illumina reads still takes a significant amount of time when there are many samples. It also produces a relatively high number of false-positive structural variants (compared with SNP calling).

Is there a reliable way to calculate genome regions to exclude from structural variant calling, without losing informative structural variants that are potentially related to a phenotype?

For example, the literature reports that SVs are found mostly in centromeric, pericentromeric, subtelomeric and other low-complexity/repetitive regions.

But the literature also reports that SVs in complex/repetitive regions might be related to phenotypes.

See e.g. https://www.nature.com/articles/s41467-019-08992-7

Is there a way to calculate low-complexity/repetitive genome regions that can be excluded from structural variant calling, without losing true-positive structural variant calls?

These regions might already be available for the human genome. I am looking for a method that can be applied to any species/reference genome.
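
To make the question concrete, the kind of species-independent calculation I have in mind could look like the sketch below. It is only an illustration, not an established method: it flags sliding windows whose k-mer Shannon entropy is low and merges them into BED-style intervals. The function names, window size, step, k and entropy threshold are all arbitrary assumptions of mine, and whether such a mask would keep true-positive SVs is exactly what I am unsure about.

```python
import math
from collections import Counter


def window_entropy(seq, k=3):
    """Shannon entropy (in bits) of the k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_regions(seq, window=100, step=50, k=3, min_entropy=3.0):
    """Return merged half-open (start, end) intervals, 0-based as in BED,
    covering windows whose k-mer entropy is below min_entropy.

    All default values are placeholders chosen for illustration only.
    """
    seq = seq.upper()
    regions = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(seq))
        if window_entropy(seq[start:end], k) < min_entropy:
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent to the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy contig: a dinucleotide repeat followed by a more complex stretch.
    complex_part = "".join(a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT")
    contig = "AT" * 100 + complex_part
    for start, end in low_complexity_regions(contig):
        print(f"contig1\t{start}\t{end}")
```

For a real genome the sequence would be read per chromosome from the reference FASTA, and the resulting intervals written to a BED file that the SV caller can take as an exclusion list, assuming the caller supports one.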
