Deletions are missed from consensus sequence even when they are predominant. Is this common to vcf2fq of vcfutils?
0
0
Entering edit mode
3.1 years ago
malaya77 • 0

After alignment of reads and conversion to BAM I can visualize the existence of a 9-base deletion.

This deletion region is also called correctly by mpileup and bcftools

bcftools mpileup -Ou -f $ref xxx.bam -o newbcfMPILE_xxx
bcftools call newbcfMPILE_xxx --ploidy 1 -mv -Ov -o newbcfMPILE_xxx_haploid.vcf
bcftools call newbcfMPILE_xxx --ploidy 1 -c -Ov | vcfutils vcf2fq > cns_xxx.fq

In consensus sequence this portion is:

ctagtttgtctAgtttGaagcta  <--consensus from vcf2fq
ctagtttg---------aagcta  <--Expect this because reads with deletions is predominant
...........A....G......  <--mutations in other reads without deletion, which fill in the gaps in the consensus

ctagtttgtctGgtttTaagcta  <--REF

In the vcf file I do see these indel mutations with more mutant reads than non-indels.

#CHROM      POS     REF        ALT  QUAL    INFO
SARSCOV2    11287   GTCTGGTTTT  G   228.344 DP=224; DP4=27,1,167,29;MQ=54
SARSCOV2    11288   TCTGGTTTTA  T   228.325 DP=205; DP4=15,4,159,27;MQ=54

167+29 = 196 reads out of 224 total show the deletion. Other deletion overlaps except one base at either end, with a similar dominant proportion.

Visualization of the region of BAM file in Tablet. The asterisks are deletions.

Is there a way I can generate consensus with the deleted portion removed (or filled in with ---------) instead of nucleotides from the minority reads?

I have an alternative python script to count bases at each position and call consensus, but it does not have the statistics as samtools/bcftools/vcfutils.

vcf2fq consensus bcftools indel • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 2668 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6