8.9 years ago
simono101
In my VCF file I have two variants at consecutive positions. The first was called as a single base pair deletion event. The second was called as a point mutation...
Supercontig_1.1 270 . TG T 239.77 . <SNIP>;VariantType=DELETION.NumRepetitions_2.EventLength_1.RepeatExpansion_G GT:AD:DP:GQ:PGT:PID:PL 0/1:142,17:159:99:0|1:249_G_A:268,0,6225
Supercontig_1.1 271 . G A 1317.77 . <SNIP>;VariantType=SNP GT:AB:AD:DP:GQ:PL 0/1:0.720:113,43:156:99:1346,0,4123
In this example, shouldn't there be a single call for the G => A mutation at position 271? I thought that tools such as bcftools norm were designed to handle these cases. How does one go about handling them?
For completeness' sake, the positions surrounding these two variant calls are:
Supercontig_1.1 269 . T . . . DP=323;GC=42.86;VariantType=NO_VARIATION GT:AD:DP 0/0:321:323
Supercontig_1.1 270 . TG T 239.77 . <SNIP> GT:AD:DP:GQ:PGT:PID:PL 0/1:142,17:159:99:0|1:249_G_A:268,0,6225
Supercontig_1.1 271 . G A 1317.77 . <SNIP> GT:AB:AD:DP:GQ:PL 0/1:0.720:113,43:156:99:1346,0,4123
Supercontig_1.1 272 . G . . . <SNIP> GT:AD:DP 0/0:294:332
Supercontig_1.1 273 . A . . . <SNIP> GT:AD:DP 0/0:327:338
Are you sure that the deletion is actually equivalent to the G->A variant? That isn't necessarily the case. Let's suppose that the original sequence starting at position 270 is TGT. The two calls would then indicate two different alternate alleles: TT (from the deletion) and TAT (from the SNP).
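To see why normalization does not merge these two records, here is a minimal Python sketch of the usual trim-and-left-align procedure applied to both calls. It is an illustration only, not bcftools' actual implementation. The function name `normalize` and its arguments are hypothetical, and the reference context TTGGA is read off the REF column of the records at positions 269 to 273 in the question.

```python
def normalize(pos, ref, alt, context, context_start):
    """Trim and left-align one biallelic record (simplified sketch).

    pos is 1-based; context[pos - context_start] is the reference base at pos.
    Assumes ref != alt and that context reaches far enough to the left.
    """
    while True:
        changed = False
        # Drop a trailing base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos <= context_start:
                raise ValueError("reference context does not extend far enough left")
            pos -= 1
            base = context[pos - context_start]
            ref, alt = base + ref, base + alt
            changed = True
        if not changed:
            break
    # Drop leading bases shared by both alleles, keeping at least one base each.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


CONTEXT = "TTGGA"      # reference bases at positions 269..273 (from the question)
CONTEXT_START = 269

# The two records from the question are already in their minimal form.
print(normalize(270, "TG", "T", CONTEXT, CONTEXT_START))
print(normalize(271, "G", "A", CONTEXT, CONTEXT_START))

# A differently written version of the same deletion (hypothetical record)
# collapses onto the 270 TG>T representation.
print(normalize(271, "GG", "G", CONTEXT, CONTEXT_START))
```

Both original records come back unchanged. Normalization only makes equivalent representations of the same allele agree; it does not combine a deletion and a SNP that describe different sequences.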
@Devon, thanks. I'm not sure, and I'm not sure I'm interpreting it correctly! If you look at the surrounding positions (which I added to the question), the reference sequence is TTGGA (positions 269 to 273).
The alternate alleles at positions 270 and 271 are both heterozygous calls, so what you are saying is that I could potentially have TTGGA (reference), TT-GA (deletion), and TTAGA (SNP)? And both alternates could exist?
You would have TTGA and TTAGA.
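As a quick check of those two sequences, the following Python sketch applies each record separately to the reference bases at positions 269 to 273 (TTGGA, taken from the REF column in the question). The helper `apply_variant` is a hypothetical name used only for illustration.

```python
REFERENCE = "TTGGA"   # reference bases at positions 269..273 (from the question)
START = 269           # 1-based position of REFERENCE[0]


def apply_variant(pos, ref, alt):
    """Return the local sequence after substituting ALT for REF at pos."""
    i = pos - START
    # Sanity check: the record's REF allele must match the reference.
    if REFERENCE[i:i + len(ref)] != ref:
        raise ValueError(f"REF {ref!r} does not match the reference at {pos}")
    return REFERENCE[:i] + alt + REFERENCE[i + len(ref):]


deletion_allele = apply_variant(270, "TG", "T")   # TTGA
snp_allele = apply_variant(271, "G", "A")         # TTAGA

print("reference:", REFERENCE)
print("deletion: ", deletion_allele)
print("SNP:      ", snp_allele)
```

The two records produce different sequences, so they describe distinct alleles and not one event called twice.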