Question

Somatic calls from a somatic SNV caller differ a lot from cancer-minus-normal germline calls

1

6.7 years ago

DVA ▴ 630

I have a few cancer samples that were analyzed with the GATK germline pipeline (SNVs called per sample, not in a cohort setting). Recently we got the corresponding normal samples sequenced, and I ran GATK on them as well.

I obtained one set of somatic calls by subtracting the normal germline calls from the corresponding cancer germline calls. Then I ran Strelka on each cancer-normal pair. Finally, for each pair, I compared the Strelka somatic calls to the subtraction-based calls.

To my surprise, they are very different. Only 20%-40% of positions match, depending on the sample. To my knowledge, a match of 75%+ is expected. The level of inconsistency makes me hesitate to move further in this project. Any thoughts on this? (Default settings were used for all calling; my samples are all covered at 30X+.)

[A little more detail about how I did the subtraction, in case it's relevant: I know that, unlike a gVCF, a regular VCF does not record positions that are not sequenced well, so I ignored the positions that did not match between the two germline VCF files (a very small portion anyway) and only looked at the change in zygosity at each position.]
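For readers trying to follow the comparison, here is a minimal Python sketch of one way to read the procedure described above. It is an interpretation, not the exact code that was run: the function names, the toy genotypes, and the choice of overlap measure (shared sites divided by the union of both call sets) are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical key choice


def genotype_changes(tumour_gt: Dict[Site, str], normal_gt: Dict[Site, str]) -> Set[Site]:
    """Sites present in BOTH germline call sets whose genotype differs."""
    shared = tumour_gt.keys() & normal_gt.keys()
    return {site for site in shared if tumour_gt[site] != normal_gt[site]}


def overlap_fraction(a: Set[Site], b: Set[Site]) -> float:
    """Shared sites divided by the union of both sets (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy data, invented for illustration only.
tumour = {("chr1", 100): "0/1", ("chr1", 200): "1/1",
          ("chr2", 50): "0/1", ("chr2", 75): "0/1"}
normal = {("chr1", 100): "0/1", ("chr1", 200): "0/1", ("chr2", 50): "1/1"}
strelka_sites = {("chr1", 200), ("chr3", 10)}

subtracted = genotype_changes(tumour, normal)
print(sorted(subtracted))
print(overlap_fraction(subtracted, strelka_sites))
```

Under this reading, a site called only in the tumour VCF (chr2:75 in the toy data) is dropped from the comparison, even though that is where a somatic SNV that is absent from the normal would typically show up in a regular VCF.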

somatic SNV SNP • 2.5k views

updated 6.7 years ago by d-cameron ★ 2.9k • written 6.7 years ago by DVA ▴ 630

score 3 · Answer 1 · 2017-08-03

3

6.7 years ago

szilveszter.juhos ▴ 30

Hi, AFAIK running a germline caller (i.e. HaplotypeCaller) on both the normal and the tumour sample and subtracting the calls to get somatic variants is suboptimal: somatic variants usually have low allele frequencies, so you need a somatic caller like MuTect2, Strelka, or similar. We use the germline caller only for QC, to be sure we are not mixing up matching samples (and to get germline variants, of course). If you have tissue samples with high heterogeneity (a high percentage of normal cells in the tumour tissue, or multiple clones), it is not surprising to get low concordance with a somatic caller.
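To make the effect of normal-cell content and multiple clones concrete, here is a small Python sketch of the commonly used expectation for the variant allele frequency (VAF) of a somatic SNV. The function and parameter names are made up for this example, and the model ignores sequencing noise and mapping bias, so treat it as a back-of-the-envelope aid rather than anything a caller actually computes.

```python
def expected_vaf(purity: float,
                 cancer_cell_fraction: float = 1.0,
                 multiplicity: int = 1,
                 tumour_copy_number: int = 2,
                 normal_copy_number: int = 2) -> float:
    """Expected fraction of reads carrying a somatic variant.

    purity: fraction of cells in the sample that are tumour cells.
    cancer_cell_fraction: fraction of tumour cells carrying the variant
        (1.0 for a clonal variant, lower for a subclone).
    multiplicity: number of mutated copies per variant-carrying cell.
    """
    mutated_copies = purity * cancer_cell_fraction * multiplicity
    total_copies = (purity * tumour_copy_number
                    + (1.0 - purity) * normal_copy_number)
    return mutated_copies / total_copies


# Pure, diploid, clonal heterozygous variant: the germline-like case.
print(expected_vaf(purity=1.0))
# 40% tumour cells, clonal variant.
print(expected_vaf(purity=0.4))
# 40% tumour cells, variant present in half of the tumour cells.
print(expected_vaf(purity=0.4, cancer_cell_fraction=0.5))
```

With these assumptions, lower purity or a subclonal variant pushes the expected VAF well below the 0.5 that a germline caller expects for a heterozygous site.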

0

Thank you for the reply. By low allele frequency, do you mean that one of the alleles is supported by only a limited number of reads, but the germline caller somehow captured it? Out of curiosity, have you done such a comparison using your data?

6.7 years ago by DVA ▴ 630

score 2 · Answer 2 · 2017-08-03

Every caller will only call variants above a certain threshold. If you do independent variant calling on your germline and tumour samples, then a (possibly large) portion of your germline variants will be near that threshold and just happen to fall below it in one sample and above it in the other. This will result in a number of false positive somatic/somatic LOH calls.
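The threshold effect can be illustrated with a toy calculation in Python. This is only a sketch: it assumes a caller that emits a variant whenever the alt-read count reaches a fixed minimum, which is much cruder than what real callers do, and the function names and the example depths and thresholds are invented for illustration.

```python
from math import comb


def prob_called(depth: int, min_alt_reads: int, vaf: float = 0.5) -> float:
    """P(alt reads >= min_alt_reads), with alt reads ~ Binomial(depth, vaf)."""
    return sum(comb(depth, k) * vaf**k * (1.0 - vaf)**(depth - k)
               for k in range(min_alt_reads, depth + 1))


def prob_discordant(depth_a: int, depth_b: int,
                    min_alt_reads: int, vaf: float = 0.5) -> float:
    """P(the same germline variant is called in exactly one of two samples)."""
    p_a = prob_called(depth_a, min_alt_reads, vaf)
    p_b = prob_called(depth_b, min_alt_reads, vaf)
    return p_a * (1.0 - p_b) + p_b * (1.0 - p_a)


# A heterozygous germline site at a locus with poor local coverage
# versus one at the nominal 30x (toy threshold of 5 alt reads).
for depth in (10, 30):
    print(depth, prob_discordant(depth, depth, min_alt_reads=5))
```

Even if the per-site probability of a discordant call is small, it applies to every germline variant in the genome, so subtraction can plausibly produce a large number of spurious differences compared with the much smaller number of true somatic variants.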

By joint calling using a somatic caller, your effective germline coverage is the combined coverage (i.e. 60x), and your somatic call set will contain fewer germline variants that are incorrectly called as somatic.

Purity, aneuploidy, and sub-clonality all affect the allele frequency of the somatic calls such that, unlike germline calls where an AF of 0, 0.5 or 1 is expected, somatic variant allele frequencies can take a range of values. As such, it is not surprising that a germline caller such as GATK, which genotypes variants using a diploid model, does not perform well on somatic samples.
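As a rough illustration of why a diploid model struggles with low-AF variants, here is a deliberately simplified Python sketch. It scores only the three diploid genotypes with a plain binomial likelihood and a flat assumed error rate; this is far simpler than what GATK actually does, and the function names and the 1% error rate are assumptions chosen for the example.

```python
from math import comb
from typing import Dict


def diploid_genotype_likelihoods(alt_reads: int, depth: int,
                                 error_rate: float = 0.01) -> Dict[str, float]:
    """Binomial likelihood of the read counts under each diploid genotype."""
    # Expected alt-read fraction per genotype (assumed flat error rate).
    alt_fraction = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    return {
        genotype: comb(depth, alt_reads)
        * p**alt_reads * (1.0 - p)**(depth - alt_reads)
        for genotype, p in alt_fraction.items()
    }


def most_likely_genotype(alt_reads: int, depth: int,
                         error_rate: float = 0.01) -> str:
    """Genotype with the highest likelihood (no priors applied)."""
    likelihoods = diploid_genotype_likelihoods(alt_reads, depth, error_rate)
    return max(likelihoods, key=likelihoods.get)


# 15/30 alt reads looks like a germline het; 3/30 (10% AF) does not.
for alt in (3, 15, 30):
    print(alt, most_likely_genotype(alt, depth=30))
```

In this toy model a variant supported by 3 of 30 reads is best explained as homozygous reference plus errors, so it would not be reported, whereas a somatic caller that compares tumour against normal does not need the variant to fit an AF of 0.5 or 1.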