VCFs appear inconsistent with view of alignment
0
0
Entering edit mode
3.6 years ago
mark.rose ▴ 50

Hi All

I've aligned pacbio ccs reads to a small reference sequence using minimap2 and viewed the result in IGV. The alignments look good, with generally very few discrepancies with the reference. There is one region, however where more variations are observed and that is at a string of 13 Cs. This is not unexpected but when viewing it in IGV I'm actually impressed by how minimal it appears to be.

enter image description here

Note that the reads aligned in this snapshot are typical of the rest of the alignment as well and that the center line marks the beginning of the polyC string

Subsequently I attempted to call variants on this bam file using a variety of variant caller, callvariants (BBMAP), freebayes, and HaplotypeCaller (gatk). All call a 1bp deletion of a C at the poly C string.

callvariants

   #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  MIR604-F2-M2-1A.ccs.10pct.fastq-vs-MIR604.5901-10938.fa.minimap2.sort
MIR604.5901-10938       1401    .       Tc      T       23.72   PASS    SN=0;STA=1401;STO=1402;TYP=DEL;R1P=340;R1M=281;R2P=0;R2M=0;AD=621;DP=1000;MCOV=-1;PPC=0;AF=0.6210;RAF=0.6210;LS=3013780;MQS=37260;MQM=60;BQS=21508;BQM=41;EDS=960160;EDM=4471;IDS=619759;IDM=999;NVC=0;FLG=0;CED=601;HMP=6;SB=0.9997  GT:DP:AD:AF:RAF:NVC:FLG:SB:SC:PF        1:1000:621:0.6210:0.6210:0:0:0.9997:23.72:PASS


freebayes

   #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  unknown
MIR604.5901-10938       1401    .       TC      T       20562.3 .       AB=0.578755;ABP=61.8389;AC=2;AF=0.666667;AN=3;AO=632;CIGAR=1M1D;DP=1092;DPRA=0;EPP=10.9266;EPPR=20.4059;GTI=0;HWE=-0;LEN=1;MEANALT=20;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=371.527;PAIRED=0;PAIREDR=0;RO=364;RPP=626.539;RPPR=393.971;RUN=1;SAP=16.2178;SRP=20.4059;TYPE=del;XAI=0.00135563;XAM=0.00142991;XAS=7.42841e-05;XRI=0.00134239;XRM=0.00140067;XRS=5.82842e-05;BVAR  GT:DP:RO:QR:AO:QA       0/1/1:1092:364:32027:632:23740

HaplotypeCaller

   #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  MIR604-F2-M2-1A
MIR604.5901-10938       1401    .       TC      T       2899.01 .       AC=1;AF=1.00;AN=1;BaseQRankSum=0.922;DP=342;FS=5.456;MLEAC=1;MLEAF=1.00;MQ=60.00;MQRankSum=0.000;QD=8.48;ReadPosRankSum=-0.229;SOR=0.463      GT:AD:DP:GQ:PL  1:119,223:342:99:2909,0

All three variant callers call a 1 bp deletion at the homopolymer. What is curious though is the apparent frequency of the alternate allele each of these tools perceive compared to what is visible in IGV. IGV indicates that variants at this position are relatively rare (1-2%) where as each of these callers is reporting the variant is observed in the majority of reads (~60%). Any ideas why?

Thanks

vcf gatk freebayes bbmap igv • 960 views
ADD COMMENT
0
Entering edit mode

Please use the code button (101010) to highlight code and data examples and make sure you use the image button to embed images. You have to post the full image link including the suffix, here that is https://i.ibb.co/DpTp5Vr/poly-C-variants.png which needs to be pasted into the image button field. I did it for you this time. Thanks!

ADD REPLY
0
Entering edit mode

Actually I tried doing both but they never appeared correctly in the preview window so I must be doing something wrong

ADD REPLY

Login before adding your answer.

Traffic: 2272 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6