Correct way to report variant statistics from bcftools stats report
13 months ago
prasundutta87 • 330


I am making a table of family-wise variant statistics. I have a multisample VCF file of biallelic SNPs, on which I ran BCFtools v1.6 with the following command:

bcftools stats -s - <multisample VCF file>

This gives me the following output (edited for brevity):

sample              nRefHom   nNonRefHom   nHets
family_1_sample1    191929    159          24424
family_1_sample2    185432    522          30505
family_2_sample1    186873    538          29132
family_2_sample2    189493    632          26333

So, when I report the number of variants per family, should I be doing this calculation?

family 1:

no. of variants = nNonRefHom + nHets, summed over both samples in the family, i.e. 159 + 522 + 24424 + 30505 = 55610

The same goes for family 2.
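
To make the calculation above concrete, here is a minimal Python sketch of it. The counts are copied from the table in this post; the `family_of` helper and the assumption that the family label is everything before `_sample` in the sample name are mine, for illustration only. Note that this sums non-reference genotype calls per sample, so a site that is non-reference in both samples of a family is counted twice; the total is therefore not the same thing as the number of distinct variant sites in that family.

```python
from collections import defaultdict

# Per-sample counts copied from the table in the post:
# sample -> (nRefHom, nNonRefHom, nHets)
counts = {
    "family_1_sample1": (191929, 159, 24424),
    "family_1_sample2": (185432, 522, 30505),
    "family_2_sample1": (186873, 538, 29132),
    "family_2_sample2": (189493, 632, 26333),
}


def family_of(sample):
    # Assumption: the family label is everything before "_sample".
    return sample.rsplit("_sample", 1)[0]


# Sum nNonRefHom + nHets over all samples of each family.
# This counts genotype calls, so shared sites are counted once per sample.
per_family = defaultdict(int)
for sample, (ref_hom, nonref_hom, hets) in counts.items():
    per_family[family_of(sample)] += nonref_hom + hets

for family, total in sorted(per_family.items()):
    print(family, total)
```

For family 1 this reproduces the 55610 figure worked out by hand above.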

Second question

SN  id  key value
SN  0   number of samples:  2
SN  0   number of records:  216734
SN  0   number of no-ALTs:  0
SN  0   number of SNPs: 216734

When should we specifically report the number of records or the number of SNPs? Do we report these when we are not interested in sample-specific information, for example the number of SNPs discovered by a multisample variant calling pipeline?
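
For reference, the SN block above can be pulled into a small dictionary with a few lines of Python. This is only a sketch: it assumes the SN lines are tab-separated with four fields (section, id, key, value), and `parse_sn` and `SAMPLE` are hypothetical names used for illustration. The values it extracts are per-record (site-level) counts for the whole file, not per-sample counts.

```python
def parse_sn(text):
    """Collect 'SN' summary lines into a {key: int} dictionary."""
    summary = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # Assumption: SN lines have exactly four tab-separated fields.
        if len(fields) != 4 or fields[0] != "SN":
            continue
        key = fields[2].strip().rstrip(":")
        try:
            summary[key] = int(fields[3])
        except ValueError:
            # Skip header-like or non-integer rows.
            continue
    return summary


# SN lines as shown in the post, written out with explicit tabs.
SAMPLE = (
    "SN\tid\tkey\tvalue\n"
    "SN\t0\tnumber of samples:\t2\n"
    "SN\t0\tnumber of records:\t216734\n"
    "SN\t0\tnumber of no-ALTs:\t0\n"
    "SN\t0\tnumber of SNPs:\t216734\n"
)

summary = parse_sn(SAMPLE)
print(summary["number of records"], summary["number of SNPs"])
```

In this file the two numbers agree, which is what one would expect when every record is a biallelic SNP.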


