Bcftools stats error: Incorrect allele ("1") at ...
0
0
Entering edit mode
7.0 years ago
msf • 0

Hello,

I've been running into an error lately when I try to run bcftools stats but that I believe is probably an error produced in bcftools call.

I have a set of .bcf files generated with samtools mpileup. Original bams were mapped with BWA-mem, treated with AddOrReplaceReadGroups, MarkDuplicates and RealignerTargetCreator.

This is my call for the files; I run them inside a lop

samtools mpileup -Q 20 -q 20 -go %s_raw.bcf -f %s %s

where the second %s is a path to the reference and the third the name of the input bam.

Then I run bcftools call, again inside a loop:

cat chromosome.list | xargs -I {} -n 1 -P 20 sh -c 'bcftools call -cO z -r {} --format-fields GQ -o %s/%s.{}.vcf.gz %s'

where the call would look something like this:

cat chromosome.list | xargs -I {} -n 1 -P 20 sh -c 'bcftools call -cO z -r {} --format-fields GQ -o LER_ALM_1580_raw/LER_ALM_1580_raw.{}.vcf.gz LER_ALM_1580_raw.bcf'

then I concatenate all the chromosome files using bcftools concat.

Finally, I try to run bcftools stats, like so (example from one sample):

bcftools stats -F /scratch/3/msf/Reference/LER_ALM1580/LER_ALM1580.gatk.iteration4.consensus.FINAL.fa -s - LER_ALM_1580_raw.vcf.gz > LER_ALM_1580_raw_v2.vcf.gz.stats

And get this error:

[E::bcf_calc_ac] Incorrect allele ("1") in LER_ALM_1580 at 1:461

If I look it up with bcftools view:

bcftools view -r 1:461 LER_ALM_1580_raw.vcf.gz

This is what I see:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT LER_ALM_1580

1 461 . C . 3.53256 . DP=1;SGB=-0.379885;MQ0F=0;AF1=1;AC1=2;DP4=0,0,0,1;MQ=60;FQ=-29.9956 GT:PL:GQ 0/1:29:5

Which means that there is a discordance between the GT field and REF/ALT information.

I have gone back and tried different things with only one sample and one chromosome. I've realized that if I repeat the analysis without using format-fields, the output vcf file is different (REF/ALT and GT are concordant!):

bcftools call -cO z -r 1 -o LER_ALM_1580_raw_chr1.vcf.gz LER_ALM_1580_raw.bcf
bcftools index LER_ALM_1580_raw_chr1.vcf.gz
bcftools stats -F /scratch/3/msf/SpeciesForReference/LER_ALM1580/LER_ALM1580.gatk.iteration4.consensus.FINAL.fa -s - LER_ALM_1580_raw_chr1.vcf.gz > LER_ALM_1580_raw_chr1.vcf.gz.stats
bcftools view -r 1:461 LER_ALM_1580_raw_chr1.vcf.gz

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT LER_ALM_1580 1 461 . C . 3.53256 . DP=1;SGB=-0.379885;MQ0F=0;AF1=1;AC1=2;DP4=0,0,0,1;MQ=60;FQ=-29.9956 GT:PL 0/0:29

But if I use --format-fields, then the same as before occurs.

Any idea why this happens? Any advice on how to proceed?

Thank you

bcftools variant call SNP vcf genotype quality • 2.8k views
ADD COMMENT

Login before adding your answer.

Traffic: 2566 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6