I have a few cancer samples that were analyzed with the GATK germline pipeline (calling SNVs per sample, not in the cohort/joint-calling setting). We recently had the corresponding normal samples sequenced, and I ran the same GATK pipeline on them as well.
I obtained one set of somatic calls by subtracting each normal sample's germline calls from the corresponding cancer sample's germline calls. I then ran Strelka on each cancer/normal pair. Finally, for each pair, I compared the Strelka somatic calls to the calls obtained by germline subtraction.
To my surprise, they are very different: only 20% to 40% of positions match, depending on the sample. To my knowledge, a match of 75% or more is expected. This level of inconsistency makes me hesitant to go further with this project. Any thoughts on this? (Default settings were used for all calling, and all my samples are covered at 30X or more.)
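Since "match" can be measured against either call set, it may help to pin the percentage down explicitly. The sketch below assumes each call set has already been reduced to (chrom, pos, ref, alt) tuples; the helper name `concordance` and the toy calls are hypothetical, not taken from GATK or Strelka.

```python
from typing import Dict, Set, Tuple

# A variant key: (chromosome, 1-based position, REF allele, ALT allele).
Variant = Tuple[str, int, str, str]


def concordance(strelka: Set[Variant], subtracted: Set[Variant]) -> Dict[str, float]:
    """Overlap statistics between two somatic call sets (hypothetical helper)."""
    shared = strelka & subtracted
    union = strelka | subtracted
    return {
        "shared": len(shared),
        # Fraction of Strelka calls also found by germline subtraction.
        "frac_of_strelka": len(shared) / len(strelka) if strelka else 0.0,
        # Fraction of subtraction calls also found by Strelka.
        "frac_of_subtracted": len(shared) / len(subtracted) if subtracted else 0.0,
        # Jaccard index: shared calls over all distinct calls.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Toy call sets for illustration only, not real data.
    strelka_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
                     ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")}
    subtracted_calls = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"),
                        ("chr5", 5, "A", "T")}
    print(concordance(strelka_calls, subtracted_calls))
```

The two directional fractions can differ a lot (here 0.5 versus about 0.67), so stating which denominator was used makes a 20% to 40% figure easier to compare with a 75% expectation. Matching on (chrom, pos) alone, as "positions" suggests, would be a looser criterion than matching on the alleles too.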
[A little more detail about how I did the subtraction, in case it's relevant: I know that, unlike a gVCF, a regular VCF does not record positions that are not sequenced well, so I ignored the mismatched positions between the two germline VCF files (a very small portion anyway) and only looked at the change in genotype (zygosity) at each position.]
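A minimal sketch of a genotype-change comparison along these lines, assuming uncompressed, single-sample VCF text with a GT field in FORMAT. The helpers `read_genotypes` and `genotype_changes` are hypothetical. Because the post does not define "mismatched positions" exactly, the sketch lists tumor sites absent from the normal VCF separately instead of dropping them: in a plain VCF such a site could be homozygous reference in the normal or simply not covered, and only a gVCF would tell the two apart.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def read_genotypes(vcf_lines: List[str]) -> Dict[Site, str]:
    """Map (chrom, pos) -> normalized GT string from single-sample VCF lines."""
    genotypes: Dict[Site, str] = {}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        fmt = dict(zip(fields[8].split(":"), fields[9].split(":")))
        alleles = fmt.get("GT", "./.").replace("|", "/").split("/")
        # Normalize phasing and allele order so 1|0 compares equal to 0/1.
        genotypes[(chrom, pos)] = "/".join(sorted(alleles))
    return genotypes


def genotype_changes(tumor: Dict[Site, str], normal: Dict[Site, str]):
    """Split tumor sites into genotype changes and sites missing from the normal."""
    changed, missing_in_normal = [], []
    for site, tumor_gt in sorted(tumor.items()):
        if site not in normal:
            # Plain VCF: hom-ref and "not covered" look the same here.
            missing_in_normal.append(site)
        elif normal[site] != tumor_gt:
            changed.append((site, normal[site], tumor_gt))
    return changed, missing_in_normal


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE"
    tumor_vcf = [header,
                 "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:20,18",
                 "chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT:AD\t1/1:0,30",
                 "chr2\t50\t.\tG\tA\t40\tPASS\t.\tGT:AD\t0/1:25,10"]
    normal_vcf = [header,
                  "chr1\t100\t.\tA\tG\t55\tPASS\t.\tGT:AD\t0/1:19,21",
                  "chr1\t200\t.\tC\tT\t58\tPASS\t.\tGT:AD\t0|1:15,14"]
    changed, missing = genotype_changes(read_genotypes(tumor_vcf),
                                        read_genotypes(normal_vcf))
    print(changed)  # [(('chr1', 200), '0/1', '1/1')]
    print(missing)  # [('chr2', 50)]
```

In this toy example the chr2:50 site, a plausible somatic candidate, never appears in the genotype-change list because the normal VCF has no record for it, which shows how much the handling of such sites can shift the resulting call set.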
Thank you for the reply. By low allele frequency, do you mean that one of the alleles is supported by only a limited number of reads, but the germline caller captured it anyway? Out of curiosity, have you done such a comparison with your own data?
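To make "low allele frequency" concrete: the variant allele fraction (VAF) at a site is the ALT read count divided by the total read count, which in a single-sample VCF can be read from the AD field (REF count first, then ALT). A heterozygous germline variant is expected near 0.5. The function names and the 0.25 cutoff below are illustrative assumptions, not GATK or Strelka defaults.

```python
from typing import List


def vaf_from_ad(ad: str) -> float:
    """Variant allele fraction from a biallelic AD string such as '30,6'."""
    counts: List[int] = [int(x) for x in ad.split(",")]
    total = sum(counts)
    return counts[1] / total if total else 0.0


def is_low_vaf(ad: str, cutoff: float = 0.25) -> bool:
    """Flag sites whose VAF is well below the ~0.5 expected for a germline het.

    The 0.25 cutoff is an arbitrary illustration, not a caller default.
    """
    return vaf_from_ad(ad) < cutoff


if __name__ == "__main__":
    print(round(vaf_from_ad("20,18"), 2))  # 0.47, consistent with a germline het
    print(vaf_from_ad("36,4"))             # 0.1, only 4 of 40 reads carry the ALT
    print(is_low_vaf("36,4"))              # True
```

Sites like the second one are where a diploid germline genotyping model and a somatic caller are most likely to disagree, so tabulating the VAF of the discordant calls may help show which situation applies.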