GATK VariantFiltration does not filter [SOLVED]
3.6 years ago

Hi everyone,

I'm trying to hard-filter variants with GATK but, although I'm following the tutorial on the GATK website, I haven't been able to filter out any variants. I want to filter on the VQSLOD annotation.

I tried the filtering with both GATK 3.8 and 4.1, but in every case no variant is filtered. I get no error output.

GATK 3.8 version:

java -jar /home/maintenance-gg/Téléchargements/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef/GenomeAnalysisTK.jar \
-T VariantFiltration \
-R /home/maintenance-gg/Documents/Reference_genome/Pfalciparum.genome.fasta \
--filterName LowQualVQ -filter "VQSLOD <= 0.0" \
--variant /home/maintenance-gg/Documents/VCF2/SNPs.vcf \
-log /home/maintenance-gg/Documents/VCF2/filtration.txt \
-o /home/maintenance-gg/Documents/VCF2/SNP_filtered5.vcf

GATK 4.1 version:

gatk VariantFiltration \
-R /home/maintenance-gg/Documents/Reference_genome/Pfalciparum.genome.fasta \
-V /home/maintenance-gg/Documents/VCF2/calling_GVCF.vcf \
--filter-name LowQualVQ -filter "VQSLOD <= 0.0" \
-O /home/maintenance-gg/Documents/VCF2/SNP_filtered5.vcf

Can anyone help me out? I have searched for a solution on Biostars and the GATK support forum, but I haven't found one. All I know is that GATK's filter expressions cannot take integers and need doubles.

Here are the INFO header line for VQSLOD and an example SNP line from my VCF before filtration.

INFO:

##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="Log odds of being a true variant versus being false under the trained gaussian mixture model">

SNP example:

Pf3D7_01_v3 176 .   G   A   107.14  PASS    AC=2;AF=1.00;AN=2;DP=5;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=34.35;QD=26.79;SOR=3.258;VQSLOD=3.39;culprit=SOR GT:AD:DP:GQ:PL  1/1:0,4:4:12:121,12,0   ./.:0,0 ./.:6,0:6   ./.:0,0 ./.:5,0:5   ./.:0,0 ./.:0,0 ./.:3,0:3   ./.:20,0:20 ./.:0,0 ./.:8,0:8   ./.:0,0 ./.:5,0:5   ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0 ./.:0,0

Please let me know if you need any additional information.

Thanks!

3.6 years ago

Thank you for your reply ;)

Harold, the PASS was due to another filter.

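However, I have figured out my problem. VariantFiltration did annotate the variants correctly: it only writes the filter name into the FILTER column of failing records and keeps them in the output. The next step, which I had not performed, is to actually remove the flagged records.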
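A quick way to check that the annotation step worked is to tally the FILTER column (column 7) of the VariantFiltration output. The sketch below is plain awk, not a GATK command; the helper name count_filters and the three-record VCF are made up for illustration.

```shell
#!/usr/bin/env bash
# Tally the values found in the FILTER column (7th tab-separated field)
# of a VCF. Header lines (starting with '#') are skipped.
# count_filters is a hypothetical helper name, not part of GATK.
count_filters() {
    awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Made-up three-record VCF, for illustration only.
demo_vcf=$(mktemp)
printf '%s\n' '##fileformat=VCFv4.2' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$demo_vcf"
printf 'Pf3D7_01_v3\t176\t.\tG\tA\t107.14\tPASS\tVQSLOD=3.39\n' >> "$demo_vcf"
printf 'Pf3D7_01_v3\t201\t.\tC\tT\t35.10\tLowQualVQ\tVQSLOD=-1.20\n' >> "$demo_vcf"
printf 'Pf3D7_01_v3\t350\t.\tT\tC\t88.00\tPASS\tVQSLOD=2.10\n' >> "$demo_vcf"

# Prints one line per FILTER value followed by its count.
count_filters "$demo_vcf"
```

If some records carry the filter name, the annotation worked and only the removal step is missing. If every record still reads PASS, no record matched the expression.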

To remove the flagged variants, I used the following GATK command:

gatk SelectVariants \
-R /home/maintenance-gg/Documents/Reference_genome/Pfalciparum.genome.fasta \
-V /home/maintenance-gg/Documents/VCF2/SNP_filtered.vcf \
-O /home/maintenance-gg/Documents/VCF2/SNP_filtered2.vcf \
-select 'vc.isNotFiltered()'

The -select expression keeps only the variants that are not flagged by any filter, i.e. those that passed my criteria.

Thanks a lot for your help, guys ;)
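For anyone who wants to see what that selection amounts to, here is a rough awk sketch, not GATK itself. It keeps header lines and records whose FILTER column is PASS or ".", on the assumption that this is what vc.isNotFiltered() accepts. The helper name keep_unfiltered and the sample records are hypothetical.

```shell
#!/usr/bin/env bash
# Keep header lines plus records whose FILTER column (field 7) is PASS or '.'.
# Assumption: this mirrors vc.isNotFiltered(); keep_unfiltered is a
# hypothetical helper, not a GATK tool.
keep_unfiltered() {
    awk -F'\t' '/^#/ || $7 == "PASS" || $7 == "."' "$1"
}

# Made-up records: one PASS, one flagged, one never filtered ('.').
sample_vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$sample_vcf"
printf 'Pf3D7_01_v3\t176\t.\tG\tA\t107.14\tPASS\tVQSLOD=3.39\n' >> "$sample_vcf"
printf 'Pf3D7_01_v3\t201\t.\tC\tT\t35.10\tLowQualVQ\tVQSLOD=-1.20\n' >> "$sample_vcf"
printf 'Pf3D7_01_v3\t350\t.\tT\tC\t88.00\t.\tVQSLOD=2.10\n' >> "$sample_vcf"

# The flagged record at position 201 is dropped; the other two are kept.
keep_unfiltered "$sample_vcf"
```

The GATK 4 SelectVariants documentation also lists an --exclude-filtered option that should have the same effect as the -select expression; check the tool documentation for your version before relying on it.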

Best,

3.6 years ago

Isn't the logic reversed? The variant will be flagged as LowQualVQ if "VQSLOD <= 0.0". Can you please try with "VQSLOD > 0.0"?


Hi, thanks for your reply. You're right, the name of my filter is wrong; but even when I run the command as you propose, there is no difference: no variants were filtered.

3.6 years ago

From the command description:

"A filtered VCF in which passing variants are annotated as PASS and failing variants are annotated with the name(s) of the filter(s) they failed."

In the SNP example you posted, it's annotated as 'PASS' - as it should be.
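To make that behaviour concrete, the awk sketch below imitates, outside of GATK, how an expression such as "VQSLOD <= 0.0" is applied: a record that matches the expression gets the filter name, every other record keeps its FILTER value, and no record is dropped. The helper flag_low_vqslod and the second record are invented for illustration. Real VariantFiltration evaluates JEXL expressions and handles more cases, for example records that already carry a filter.

```shell
#!/usr/bin/env bash
# Imitate a "VQSLOD <= 0.0" filter expression with filter name LowQualVQ.
# flag_low_vqslod is a hypothetical helper, not a GATK tool.
flag_low_vqslod() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { print; next }
        {
            # Pull the VQSLOD value out of the INFO column (field 8).
            v = ""
            n = split($8, kv, ";")
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^VQSLOD=/) v = substr(kv[i], 8)
            # Matching the expression means failing the filter.
            if (v != "" && v + 0 <= 0.0) $7 = "LowQualVQ"
            print
        }' "$1"
}

# First record uses values from the posted SNP; the second is made up.
expr_vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$expr_vcf"
printf 'Pf3D7_01_v3\t176\t.\tG\tA\t107.14\tPASS\tSOR=3.258;VQSLOD=3.39;culprit=SOR\n' >> "$expr_vcf"
printf 'Pf3D7_01_v3\t201\t.\tC\tT\t35.10\tPASS\tSOR=1.100;VQSLOD=-1.20;culprit=QD\n' >> "$expr_vcf"

# Both records are printed; only the one at position 201 is relabelled.
flag_low_vqslod "$expr_vcf"
```

With VQSLOD=3.39 the posted SNP does not match "VQSLOD <= 0.0", so it stays PASS, which is consistent with the example line above.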
