Hello,
I am trying to use the GATK4 germline CNV calling pipeline. I successfully obtained 57 VCFs from my sample batch, called on segments (obtained by merging contiguous intervals), in the usual VCF layout:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2046745451-1006_S4
M 3288 CNV_M_3288_15907 N <DEL>,<DUP> . . END=15907 GT:CN:NP:QA:QS:QSE:QSS 2:5:9:17:6:21:21
1 69071 CNV_1_69071_70028 N <DEL>,<DUP> . . END=70028 GT:CN:NP:QA:QS:QSE:QSS 1:0:1:204:204:204:204
However, I get far too many of these segments, more than 10k. Is there an existing tool that takes the segments in one VCF and, for each of them, counts how many other samples in my batch have a segment overlapping it (treating two segments as the same when they share at least 75% of their length)? By identifying the most recurrent segments, I could determine which ones are background noise and then filter them out to reduce the number of variants in my VCF.
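To make the request concrete, here is a minimal Python sketch of the counting I have in mind. It is not an existing tool: the function names (`parse_segments`, `reciprocal_overlap`, `count_recurrence`) are made up for illustration. It assumes each record carries `END=` in the INFO column as in the lines above, measures the 75% overlap relative to the longer of the two segments, and ignores the GT/CN fields, so a deletion and a duplication at the same position would count as the same segment.

```python
def parse_segments(vcf_lines):
    """Yield (chrom, start, end) for each record of a segments VCF."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        chrom, start, info = fields[0], int(fields[1]), fields[7]
        end = None
        for item in info.split(";"):
            if item.startswith("END="):
                end = int(item[4:])
        if end is not None:
            yield chrom, start, end


def reciprocal_overlap(a, b):
    """Shared length divided by the length of the longer segment."""
    if a[0] != b[0]:
        return 0.0  # different contigs never overlap
    shared = min(a[2], b[2]) - max(a[1], b[1]) + 1
    if shared <= 0:
        return 0.0
    longest = max(a[2] - a[1] + 1, b[2] - b[1] + 1)
    return shared / longest


def count_recurrence(samples, min_overlap=0.75):
    """For each sample and segment, count the other samples with a matching segment.

    `samples` maps a sample name to its list of (chrom, start, end) segments.
    """
    counts = {}
    for name, segments in samples.items():
        per_segment = {}
        for seg in segments:
            per_segment[seg] = sum(
                1
                for other, other_segments in samples.items()
                if other != name
                and any(reciprocal_overlap(seg, o) >= min_overlap for o in other_segments)
            )
        counts[name] = per_segment
    return counts
```

Segments with a count close to the number of other samples (56 here) would be the candidates for background noise. This all-against-all comparison is only meant to illustrate the logic; with more than 10k segments per sample it would need sorting or an interval index to run in reasonable time.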
Thank you.