vcftools does not filter by GQ
18 months ago
AP • 90

Hello,

I am trying to filter out genotypes with GQ < 15, using the following command:

vcftools --vcf infile.vcf --minGQ 15 --recode --out filtered

However, the filter does not seem to work; nothing is removed:

After filtering, kept 1287174 out of a possible 1287174 Site

I have confirmed that the GQ tag is present in my VCF file. Other filters, such as min/maxDP or minQ, work just fine. I am using VCFtools v0.1.13.

Any thoughts on this would be greatly appreciated.

Thanks!

P.S. This is a cross-post from SEQanswers, where I did not receive any answers: http://seqanswers.com/forums/showthread.php?t=69468


What's the definition of GQ in the VCF header? Please show us a genotype and its FORMAT.


Thanks for your answer Pierre. In the VCF header, GQ stands for Genotype Quality. Here is a copy of the header containing the FORMAT fields:

##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">

Here is an example of a genotype:

GT:PL:DP:SP:GQ  1/1:83,33,0:11:0:40

FYI, the VCF file was generated this way:

samtools mpileup -C 50 -E -t SP -t DP -u -I -f genome -b bam_list.txt > out.bcf
bcftools call -v -c -f gq out.bcf > out.vcf
18 months ago
AP • 90

Here is an explanation:

GT is simply replaced by ./. when GQ is below the threshold. I had expected the genotype to be removed entirely. That is why the unfiltered and filtered files have the same number of lines, and why the GQ values can still be seen even after filtering.
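One way to check this is to count missing genotypes rather than lines. Below is a minimal Python sketch, not part of vcftools: the function name `count_missing_genotypes` is hypothetical, and the example record is made up around the genotype posted above (the position, alleles, and second sample are assumptions for illustration).

```python
def count_missing_genotypes(vcf_lines):
    """Return (n_sites, n_missing_genotypes) for an iterable of VCF text lines."""
    n_sites = 0
    n_missing = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        n_sites += 1
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # record has no genotype columns
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample in fields[9:]:
            gt = sample.split(":")[gt_idx]
            alleles = gt.replace("|", "/").split("/")
            # A genotype counts as missing when every allele is "."
            if all(a == "." for a in alleles):
                n_missing += 1
    return n_sites, n_missing


# Hypothetical record: the first sample is the genotype from this thread,
# the second sample (GQ = 10) is invented for the example.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2"
before_lines = [
    "##fileformat=VCFv4.2",
    header,
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:PL:DP:SP:GQ\t1/1:83,33,0:11:0:40\t0/1:20,0,30:8:0:10",
]
after_lines = [
    "##fileformat=VCFv4.2",
    header,
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:PL:DP:SP:GQ\t1/1:83,33,0:11:0:40\t./.:20,0,30:8:0:10",
]

print(count_missing_genotypes(before_lines))
print(count_missing_genotypes(after_lines))
```

On real data you could pass an open file handle instead of a list, for example the input VCF and the recoded output (which should be `filtered.recode.vcf` given `--out filtered`). If the filter took effect, the site count stays the same while the missing-genotype count goes up.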

This is hard to tell from the documentation, though. For --minGQ, the current manual says: "Exclude all genotypes with a quality below the threshold specified. This option requires that the "GQ" FORMAT tag is specified for all sites". It doesn't really say whether the data are removed or not (as most filters do).

An older manual version states: "These options are used to exclude genotypes from any analysis being performed by the program. If excluded, these values will be treated as missing. ... Exclude all genotypes with a quality below the threshold specified. This option requires that the "GQ" FORMAT tag is specified for all sites."

So every genotype with GQ below the threshold is changed to "./.", without any lines actually being removed or filtered out.
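To make the behaviour concrete, here is a rough Python sketch of what this kind of genotype-level masking does to a single VCF record. It is only an illustration of the behaviour described above, not vcftools' actual implementation. The function name `mask_low_gq` is hypothetical, and two things are my own assumptions: genotypes are diploid (so the missing value is written as "./."), and a genotype whose GQ is itself missing (".") is left untouched.

```python
def mask_low_gq(vcf_line, min_gq):
    """Set GT to './.' for every sample whose GQ is below min_gq.

    The record itself is always returned, so the number of lines never changes.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return "\t".join(fields)  # no genotype columns, nothing to mask
    fmt = fields[8].split(":")
    if "GT" not in fmt or "GQ" not in fmt:
        return "\t".join(fields)
    gt_idx = fmt.index("GT")
    gq_idx = fmt.index("GQ")
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if gq_idx >= len(values):
            continue  # trailing FORMAT fields were dropped for this sample
        gq = values[gq_idx]
        # Assumption: a missing GQ (".") is not masked
        if gq != "." and float(gq) < min_gq:
            values[gt_idx] = "./."  # assumption: diploid genotype
            fields[i] = ":".join(values)
    return "\t".join(fields)


# Hypothetical record built around the genotype posted in this thread (GQ = 40)
record = "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:PL:DP:SP:GQ\t1/1:83,33,0:11:0:40"

print(mask_low_gq(record, 15))  # GQ 40 >= 15, genotype kept as is
print(mask_low_gq(record, 50))  # GQ 40 < 50, GT becomes ./.
```

Note that the other FORMAT values (PL, DP, SP, GQ) stay in place after masking, which matches the observation that the GQ information can still be seen in the filtered file.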


Thank you for explaining this, AP. I was troubled by the same situation.
