Potentially a very stupid question, but it wouldn't be my first ever :)
So, looking at the sample VCF files from this Biostars (subset_hg19.vcf), I see a bunch of lines with 0|0 in them.
While I realize this means it has identified that position as matching REF|REF, why would reference-matching positions be in a variants file at all? I thought only variants (positions that DID NOT match the REF in one or both alleles) showed up.
When I run bcftools call, I DON'T get any 0/0 or 0|0. Does one have to specifically tell bcftools call to report positions that match the reference, and why would one?
Thanks!
19 416254 rs192385198 T C 100 PASS AC=0;AF=0.000599042;AN=12;NS=2504;DP=18252;EAS_AF=0;AMR_AF=0.0043;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP GT 0|0 0|0 0|0 0|0 0|0 0|0
19 416335 rs545360745 G C 100 PASS AC=0;AF=0.000199681;AN=12;NS=2504;DP=18554;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP GT 0|0 0|0 0|0 0|0 0|0 0|0
19 416389 rs564036507 G C 100 PASS AC=0;AF=0.000199681;AN=12;NS=2504;DP=17931;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP GT 0|0 0|0 0|0 0|0 0|0 0|0
19 416406 rs185752424 G A 100 PASS AC=0;AF=0.000199681;AN=12;NS=2504;DP=17410;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP GT 0|0 0|0 0|0 0|0 0|0 0|0
19 416449 rs929834 C T 100 PASS AC=0;AF=0.0924521;AN=12;NS=2504;DP=16305;EAS_AF=0;AMR_AF=0.0216;AFR_AF=0.3336;EUR_AF=0.002;SAS_AF=0.0051;AA=C|||;VT=SNP GT 0|0 0|0 0|0 0|0 0|0 0|0
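These rows can be checked directly. AN=12 means 12 called alleles, i.e. six diploid samples, and counting non-reference alleles in the GT columns gives 0, matching AC=0. AF and NS=2504, by contrast, look like values carried over from the full cohort (2504 is the sample count of the 1000 Genomes phase 3 release). That would explain how a site can be polymorphic in the population yet 0|0 in every sample of this subset. This is an assumption about how subset_hg19.vcf was made; the sketch below only parses the last row (rejoined with tabs, as in a real VCF) to show the arithmetic.

```python
# Last data row from the post, rejoined with tabs as in a real VCF.
row = "\t".join([
    "19", "416449", "rs929834", "C", "T", "100", "PASS",
    "AC=0;AF=0.0924521;AN=12;NS=2504;DP=16305;EAS_AF=0;AMR_AF=0.0216;"
    "AFR_AF=0.3336;EUR_AF=0.002;SAS_AF=0.0051;AA=C|||;VT=SNP",
    "GT", "0|0", "0|0", "0|0", "0|0", "0|0", "0|0",
])

fields = row.split("\t")
info = dict(kv.split("=", 1) for kv in fields[7].split(";"))
genotypes = fields[9:]  # one column per sample after FORMAT

# Count called alleles (AN) and non-reference alleles (AC) in this subset.
alleles = [a for gt in genotypes for a in gt.replace("|", "/").split("/")]
an = sum(a != "." for a in alleles)
ac = sum(a not in ("0", ".") for a in alleles)

print(len(genotypes), an, ac)         # 6 12 0
print(info["AN"], info["AC"])         # 12 0, consistent with the GT columns
print(info["NS"], float(info["AF"]))  # 2504 0.0924521, assumed cohort-wide values
```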
Valid point. I wonder why my files have ./.:. instead of "0/0" in the merged file? Maybe because bcftools didn't know if the SNPs were 0/0 or just missing data?
0/0 means "this is twice the reference allele" (homozygous reference).
./. means "I don't know what this is".
Yes, Wouter is correct about 0/0, whilst ./. means that no genotype could be called at this position: it is a missing value, for example because there were no reads or too few reads. I have also explained this in your other question earlier today: Unusual reports from "bcftools stats" (making me question my data)
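To make the distinction concrete, here is a small helper that labels a sample column the way the answers above describe it: 0/0 or 0|0 is homozygous reference, while a '.' allele means no genotype was called. In ./.:. the trailing ':.' is just a missing value for the next FORMAT field. `classify_gt` is a hypothetical name, not part of bcftools or any library.

```python
def classify_gt(sample: str) -> str:
    """Label one sample column, e.g. '0|0', '0/1' or './.:.'.

    Only the GT subfield (before the first ':') is inspected.
    """
    gt = sample.split(":", 1)[0]
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing (no genotype called)"
    if all(a == "0" for a in alleles):
        return "homozygous reference"
    if len(set(alleles)) == 1:
        return "homozygous alternate"
    return "heterozygous"


for s in ["0|0", "0/0", "./.:.", "0/1", "1|1"]:
    print(s, "->", classify_gt(s))
```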
BCFTools could easily call 0/0 genotypes, but why would it? The VCF file would then grow by orders of magnitude, because it would be reporting each and every reference base.
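A back-of-the-envelope sketch of the size argument for a single sample. The inputs are rough assumptions, not measurements from this thread: about 3.1 billion positions in the hg19 reference, and roughly 4 to 5 million variant sites in a typical human genome (the range reported by the 1000 Genomes Project). With one VCF line per site, emitting every reference position multiplies the line count by about the ratio of the two.

```python
# Rough, assumed inputs (see lead-in); adjust them to your own data.
GENOME_POSITIONS = 3_100_000_000  # approx. length of the hg19 reference
VARIANT_SITES = 4_500_000         # approx. variant sites in one genome


def growth_factor(all_sites: int, variant_sites: int) -> float:
    """How many times more lines an all-sites VCF would have."""
    return all_sites / variant_sites


factor = growth_factor(GENOME_POSITIONS, VARIANT_SITES)
print(f"about {factor:.0f}x more lines")  # about 689x more lines
```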
"BCFTools could easily call 0/0 genotypes, but why would it?" Well, exactly my thought actually :D
All my BCF files have "./.:." for samples that are REF/REF. But this one had "0|0" meaning it's not a variant, but a reference.
So I couldn't figure out why they were there :D
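A possible explanation for the ./.:. entries, assuming the merged file came from bcftools merge run on per-sample files that contained variant sites only: when a sample has no record at a position, merge cannot tell whether that sample was reference there or simply not covered, so by default it writes a missing genotype. bcftools merge has a `-0` / `--missing-to-ref` option that instead assumes 0/0 at such sites; that assumption is only safe if those positions were actually covered, so check the documentation for your version. The sketch below imitates the two behaviours in plain Python; `merge_sites` and its `missing_to_ref` flag are hypothetical names, not bcftools code.

```python
def merge_sites(per_sample: dict[str, dict[int, str]],
                missing_to_ref: bool = False) -> dict[int, dict[str, str]]:
    """Combine per-sample {position: GT} records into one table.

    A sample with no record at a position gets './.' (unknown) by default,
    or '0/0' when missing_to_ref is True ("absent means reference").
    """
    positions = sorted({pos for calls in per_sample.values() for pos in calls})
    filler = "0/0" if missing_to_ref else "./."
    return {
        pos: {name: calls.get(pos, filler) for name, calls in per_sample.items()}
        for pos in positions
    }


# Hypothetical variants-only calls for two samples.
calls = {
    "sampleA": {100: "0/1"},
    "sampleB": {200: "1/1"},
}
print(merge_sites(calls))
print(merge_sites(calls, missing_to_ref=True))
```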