I am making a table of family-wise variant statistics. I have a multisample VCF file of biallelic SNPs, on which I ran BCFtools v1.6 using the following command:
bcftools stats -s - <multisample VCF file>
This gives me the following output (edited for brevity):
sample nRefHom nNonRefHom nHets
family_1_sample1 191929 159 24424
family_1_sample2 185432 522 30505
family_2_sample1 186873 538 29132
family_2_sample2 189493 632 26333
So, when I report the number of variants per family, should I sum the non-reference counts of all samples in that family?
For family 1: no. of variants = nNonRefHom + nHets over both samples, i.e. 159 + 522 + 24424 + 30505 = 55610
The same would apply to family 2.
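To make the calculation explicit, here is a minimal Python sketch of what I have in mind. The counts are copied from the table above; the helper `family_of` is hypothetical and simply assumes sample names follow the `family_<n>_sample<m>` pattern. Note that this sum counts a site once per sample that carries it, so a site shared by both family members is counted twice.

```python
# Per-sample counts copied from the bcftools stats table above
# (nRefHom is left out because it does not enter the sum).
counts = {
    "family_1_sample1": {"nNonRefHom": 159, "nHets": 24424},
    "family_1_sample2": {"nNonRefHom": 522, "nHets": 30505},
    "family_2_sample1": {"nNonRefHom": 538, "nHets": 29132},
    "family_2_sample2": {"nNonRefHom": 632, "nHets": 26333},
}


def family_of(sample):
    # Hypothetical helper: assumes names look like "family_<n>_sample<m>".
    return sample.rsplit("_", 1)[0]


# Sum nNonRefHom + nHets over all samples of each family.
totals = {}
for sample, c in counts.items():
    fam = family_of(sample)
    totals[fam] = totals.get(fam, 0) + c["nNonRefHom"] + c["nHets"]

print(totals)  # {'family_1': 55610, 'family_2': 56635}
```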
The summary numbers (SN) section of the same output reads:
SN id key value
SN 0 number of samples: 2
SN 0 number of records: 216734
SN 0 number of no-ALTs: 0
SN 0 number of SNPs: 216734
When should we specifically report the number of records or the number of SNPs? Do we report these when we are not interested in sample-specific information, for example the number of SNPs discovered by a multisample variant-calling pipeline?
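To illustrate the distinction I am asking about, here is a toy Python sketch. The genotypes are made up and are not from my VCF. As far as I understand, a site-level count such as the number of records counts each site once, whereas summing per-sample non-reference genotypes counts a site once for every sample that carries it.

```python
# Made-up genotypes for two samples at five biallelic sites.
# "0/0" = hom-ref, "0/1" = het, "1/1" = hom-alt.
genotypes = {
    "sample1": ["0/1", "0/0", "1/1", "0/1", "0/0"],
    "sample2": ["0/1", "0/1", "0/0", "1/1", "0/0"],
}


def is_non_ref(gt):
    # True for het or hom-alt genotypes at a biallelic site.
    return "1" in gt


# Per-sample counts, analogous to nNonRefHom + nHets for each sample.
per_sample = {s: sum(is_non_ref(g) for g in gts) for s, gts in genotypes.items()}
summed = sum(per_sample.values())

# Site-level counts: every site is counted once, however many samples carry it.
n_sites = len(genotypes["sample1"])
sites_with_variant = sum(
    any(is_non_ref(gts[i]) for gts in genotypes.values()) for i in range(n_sites)
)

print(per_sample)          # {'sample1': 3, 'sample2': 3}
print(summed)              # 6
print(sites_with_variant)  # 4
```

In this toy case the per-sample sum (6) is larger than the number of sites where at least one sample is non-reference (4), because two sites are shared by both samples.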