Remove duplicate indels from merged VCFs
14 months ago
sbberes • 0

I am merging polymorphism calls from multiple bacterial strains using bcftools merge, and I am getting multiple minor variants of the same insertion at the same position from different strains. From 80 strains I may get 30 different interpretations of the same indel, so in the merged VCF the indel is treated as 30 different variants instead of a single insertion.

I have used bcftools norm to left-align the indels, and vcfcreatemulti to reduce the indels to a single record in the VCF file. I have also tried bcftools --collapse indels to collapse the different interpretations of the insertion down to a single common indel, but so far none of these steps has done the trick.

I am looking for any suggestions on how to deconvolute/condense these multiple indel calls down to a single common call.

Thanks for any assistance, SBB
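To make the question concrete, here is a minimal sketch of what left-alignment is expected to achieve. It is not the bcftools implementation. It follows the commonly described trim-and-shift procedure, and the reference string `GENOME` and the helper `normalize` are made up for illustration. Equivalent spellings of one insertion in a repeat collapse to a single (POS, REF, ALT) tuple. Calls whose inserted sequence really differs stay distinct after normalization, which may be why the merged file still shows many alleles.

```python
# Hypothetical toy reference; positions are 1-based, as in VCF.
GENOME = "GCACACACT"


def normalize(pos, ref, alt, genome):
    """Left-align and trim one biallelic variant (illustrative sketch only)."""
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pad both with the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot pad left of position 1 in this sketch")
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Three spellings of the same 2 bp insertion in the CA repeat.
calls = [(8, "C", "CAC"), (4, "CAC", "CACAC"), (1, "G", "GCA")]
normalized = {normalize(p, r, a, GENOME) for p, r, a in calls}

# A call with a different inserted sequence remains a separate allele.
other = normalize(8, "C", "CAT", GENOME)
```

In this toy case `normalized` holds a single tuple, while `other` does not match it. If the 30 interpretations in the merged VCF are of the second kind, left-alignment alone would not be expected to merge them.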

