Reason for Freebayes Calling Multi-base Variants
1
0
Entering edit mode
8.0 years ago
L. A. Liggett ▴ 120

The question I have is why freebayes reports variants in an output vcf file as multiple bases and an incorrect locus. For instance, below the variant is the same, and taken from the same originating sequence DNA. However, when I create a fastq file containing only reads that span the chr4:11054232 locus freebayes report the variant at 11054232 and a C to T change. If I call variants from the original fastq file containing all sequenced reads, the variant is reported at 110541228 and a CAGCC to CAGCT change.

So my question is why this occurs. What is the purpose of adding other bases into the call. And can I control this somehow, and only get the single base that is altered.

chr4 110541232 . C T 6083.73 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=262;CIGAR=1X;DP=262;DPB=262;DPRA=0;EPP=563.283;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;

chr4 110541228 . CAGCC CAGCT 133177 . AB=0;ABP=0;AC=2;AF=1;AN=2;AO=5693;CIGAR=4M1X;DP=5743;DPB=5743;DPRA=0;EPP=12365.2;EPPR=92.0407;GTI=0;LEN=1;MEANALT=5;MQM=60;MQMR=60;

sequencing alignment • 3.4k views
ADD COMMENT
0
Entering edit mode

Freebayes gives you extra local phasing information. At the beginning, HaplotypeCaller reported variants in this way, too, but later the GATK developers thought the conventional way is more convenient to users.

ADD REPLY
1
Entering edit mode
8.0 years ago
User 59 13k

If you only call at a single base, you're only going to get a single base change reported. By default Freebayes reports multinucleotide polymorphisms and 'complex events' which span multiple nucleotides. These reflect cases where you see linked variants, phased by their reads. This is surely a desirable thing to know.

If you have adjacent base changes, that span a single codon, then there are situations where calling these as two separate SNPs from the same read and then trying to deduce their independent effects on the protein sequence is just going to be wrong. In this case you need to count the dinucleotide variant as a linked unit.

If you really must do this, check out vcfallelicprimitives : https://github.com/vcflib/vcflib#vcfallelicprimitives

EDIT: Oops I think I misinterpreted your question. Take a look at : https://github.com/ekg/freebayes/issues/161

ADD COMMENT
0
Entering edit mode

Thanks for the edit. That seems to be the same issue that I am having, although it is unclear what the solution to the problem is.

ADD REPLY

Login before adding your answer.

Traffic: 1861 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6