Hi all,
I am using freebayes to call variants in a diploid organism. So far, my understanding of the program's behavior is as follows:
Assume that there are two SNPs at consecutive nucleotide positions, and that there are multiple haplotypes such as:
- haplotype_1: XXXAGXXX
- haplotype_2: XXXTGXXX
- haplotype_3: XXXACXXX
case 1
If haplotype_1 is used as the reference, the variations at the two nucleotide positions are called as two separate SNPs in the output VCF.
ref 1234 . A T ...
ref 1235 . G C ...
case 2
On the other hand, if haplotype_2 is used as the reference, the two variant sites are treated as a single MNP in the output VCF.
ref 1234 . TG AC,AG ...
This behavior is understandable. However, I would sometimes like the output to always look like case 2, even in case 1, because this simplifies some downstream analyses, such as investigating amino-acid changes and their genotypes.
That is, I want a VCF line like the one below for case 1.
ref 1234 . AG AC,TG ...
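To make the desired output concrete, here is a minimal Python sketch of the merge I have in mind. It assumes the haplotypes over the two positions are already known (phased); the function name `merge_adjacent_snps` and its inputs are hypothetical and are not something freebayes provides.

```python
def merge_adjacent_snps(ref_seq, start_pos, haplotypes):
    """Collapse observed haplotypes over a short window into one record.

    ref_seq    -- reference bases spanning the window, e.g. "AG"
    start_pos  -- 1-based position of the first base in the window
    haplotypes -- observed haplotype strings over the same window
    Returns (pos, ref, alts), with alts sorted alphabetically.
    """
    for hap in haplotypes:
        if len(hap) != len(ref_seq):
            raise ValueError("haplotype and reference lengths differ")
    # Every distinct haplotype that is not the reference becomes an ALT allele.
    alts = sorted({hap for hap in haplotypes if hap != ref_seq})
    return start_pos, ref_seq, alts


# Case 1: haplotype_1 (AG) is the reference.
pos, ref, alts = merge_adjacent_snps("AG", 1234, ["AG", "TG", "AC"])
print("ref", pos, ".", ref, ",".join(alts), sep="\t")
```

With haplotype_1 as the reference this gives REF `AG` and ALT `AC,TG`, which is the single-record form shown above. It only illustrates the target representation; it does not handle genotypes or phasing.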
Is there any option in freebayes to obtain output like this? Alternatively, I would appreciate it if you could point me to any other tool or method to get rid of this problem.
Thanks