Calling a haploid consensus sequence from a VCF
14 months ago

Hi there,

I have a filtered VCF file of mtDNA genotypes for multiple individuals. The calls are haploid, so the GT field in the VCF is either 0 or 1. I would like to generate a consensus FASTA file for each individual. However, my understanding is that tools like bcftools consensus, vcf-consensus and GATK's FastaAlternateReferenceMaker apply all ALT variants to the reference FASTA to obtain the consensus.

How can I generate consensus sequences in which the ALT allele is used where the GT field is 1 and the REF allele where it is 0?

Thanks in advance.
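As a tool-independent illustration of the behaviour asked for here, the following is a minimal Python sketch, not the method of any particular tool. It assumes an uncompressed VCF covering a single contig (as with mtDNA), haploid GT values such as `0`, `1` or `.`, and no overlapping records. The names `parse_vcf`, `haploid_consensus`, `REF_SEQ` and the toy VCF rows are hypothetical and invented for illustration.

```python
# Minimal sketch: per-sample haploid consensus from an uncompressed VCF.
# Assumptions (not from any specific tool): one contig, haploid GT values
# such as "0", "1" or ".", and no overlapping records.


def parse_vcf(lines):
    """Return (sample names, records) from plain-text VCF lines.

    Each record is (chrom, pos, ref, [alts], {sample: gt}).
    """
    samples, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow FORMAT
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_idx = fields[8].split(":").index("GT")
        gts = {s: v.split(":")[gt_idx] for s, v in zip(samples, fields[9:])}
        records.append((chrom, int(pos), ref, alt.split(","), gts))
    return samples, records


def haploid_consensus(ref_seq, records, sample):
    """Apply one sample's haploid calls to a single-contig reference.

    GT "0" or "." keeps REF; GT k >= 1 substitutes ALT number k.
    Records are applied right to left so that length-changing
    alleles do not shift the coordinates of records still to come.
    """
    seq = ref_seq
    for _chrom, pos, ref, alts, gts in sorted(records, key=lambda r: r[1], reverse=True):
        gt = gts[sample]
        if gt in ("0", "."):
            continue  # REF call or missing: leave the reference base(s)
        allele = alts[int(gt) - 1]
        start = pos - 1  # VCF POS is 1-based
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF mismatch at position {pos}")
        seq = seq[:start] + allele + seq[start + len(ref):]
    return seq


def _row(*fields):
    return "\t".join(fields)


# Toy data (hypothetical), invented for illustration only.
REF_SEQ = "ACGTACGTAC"
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    _row("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "ind1", "ind2"),
    _row("chrM", "2", ".", "C", "T", ".", "PASS", ".", "GT", "1", "0"),
    _row("chrM", "5", ".", "A", "G", ".", "PASS", ".", "GT", "0", "1"),
    _row("chrM", "8", ".", "T", "TAA", ".", "PASS", ".", "GT", "1", "."),
]
SAMPLES, RECORDS = parse_vcf(EXAMPLE_VCF)

if __name__ == "__main__":
    # Write one FASTA record per individual.
    for name in SAMPLES:
        print(f">{name}\n{haploid_consensus(REF_SEQ, RECORDS, name)}")
```

On the toy data, `ind1` gets `ATGTACGTAAAC` (ALT at positions 2 and 8, REF at 5) and `ind2` gets `ACGTGCGTAC` (ALT only at 5; the missing call at 8 keeps REF). For real data you would read the reference from the mtDNA FASTA and the records from your filtered VCF instead of the hard-coded toy values.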

13 months ago
Republic of Ireland

You may need two passes per sample. First, split your VCF by genotype into records with GT 0 (REF calls) and records with GT 1 (ALT calls); then run bcftools consensus twice per sample, making use of the following parameter:

-H, --haplotype <which>    choose which allele to use from the FORMAT/GT field, note
                           the codes are case-insensitive:
                           1: first allele from GT
                           2: second allele
                           R: REF allele in het genotypes
                           A: ALT allele
                           LR,LA: longer allele and REF/ALT if equal length
                           SR,SA: shorter allele and REF/ALT if equal length
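The splitting step described above could look roughly like this minimal Python sketch. It assumes an uncompressed, tab-delimited VCF with haploid GT values; `split_vcf_by_gt` and the toy rows are hypothetical names and data, not part of bcftools. For one sample it keeps the fixed columns plus that sample's column and sends GT 0 records to one list and GT 1 records to another (missing calls are dropped). Each list could then be written out, compressed and indexed before being passed to bcftools consensus.

```python
# Minimal sketch: split one sample's records by haploid GT.
# Assumptions: plain-text VCF lines, haploid GT ("0", "1" or "."),
# and a GT key present in every FORMAT column.


def split_vcf_by_gt(vcf_lines, sample):
    """Return (ref_vcf, alt_vcf) as lists of lines for a single sample."""
    header, ref_calls, alt_calls = [], [], []
    col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            header.append(line)  # keep meta-information lines unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)
            # keep the nine fixed columns plus this sample's column
            header.append("\t".join(fields[:9] + [sample]))
            continue
        gt_idx = fields[8].split(":").index("GT")
        gt = fields[col].split(":")[gt_idx]
        row = "\t".join(fields[:9] + [fields[col]])
        if gt == "0":
            ref_calls.append(row)
        elif gt == "1":
            alt_calls.append(row)
        # anything else (e.g. a missing call ".") is dropped
    return header + ref_calls, header + alt_calls


def _row(*fields):
    return "\t".join(fields)


# Toy data (hypothetical), invented for illustration only.
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    _row("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "ind1", "ind2"),
    _row("chrM", "2", ".", "C", "T", ".", "PASS", ".", "GT", "1", "0"),
    _row("chrM", "5", ".", "A", "G", ".", "PASS", ".", "GT", "0", "1"),
    _row("chrM", "8", ".", "T", "TAA", ".", "PASS", ".", "GT", "1", "."),
]

if __name__ == "__main__":
    ref_vcf, alt_vcf = split_vcf_by_gt(EXAMPLE_VCF, "ind1")
    print("\n".join(alt_vcf))
```

Keeping only the single sample column means each output file describes exactly one individual, which avoids any ambiguity about whose genotype a downstream tool reads.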



