Biostar Beta. Not for public use.
Question: How to filter vcf file on minimum genotype depth and quality for each sample
5
Entering edit mode

How can I filter a vcf filter a VCF file on minimum genotype depth and genotype quality for each sample.

I am looking for a way to filter variants from a VCF file by checking that all samples for a site pass 2 critera

sample.DP > 10
sample.GQ > 15

I thought I could do this using bcftools

bcftools view  -i  'MIN(FMT/DP>10) && MIN(FMT/GQ>15)'   my.vcf.gz

Somehow this include expression does not seem to be applied.

In my output are still lots of variants with genotypes likes

GT:AO:DP:GQ:PL:QA:QR:RO
0|0:0:16:3:0,48,502:0:554:16   (=depth 16, quality 3)
1|1:1:1:3:21,3,0:37:0:0        (=depth 1, quality 3)

I tried putting the include commands in two different bcftools commands

bcftools view  -i  'MIN(FMT/DP>10)' | bcftools view -i 'MIN(FMT/GQ>15)'   my.vcf.gz

And I tried without the FMT prefix for the genotype quality

bcftools view -i  'MIN(FMT/DP>10)  &&  MIN(GQ>15)'  my.vcf.gz

Still I get variants back with genotypes that don't match the criteria.

Am I misunderstanding the bcftools include expression documentation?

https://samtools.github.io/bcftools/bcftools.html#expressions

Or is there another way that I can achieve this filtering step?

ADD COMMENTlink 2.9 years ago William ♦ 4.4k
4
Entering edit mode

The problem was wrong place of the parentheses:

Does not work as expected:

bcftools view  -i  'MIN(FMT/DP>10) & MIN(FMT/GQ>15)'   my.vcf.gz

Works as expected:

bcftools view  -i  'MIN(FMT/DP)>10 & MIN(FMT/GQ)>15'   my.vcf.gz
ADD COMMENTlink 2.9 years ago William ♦ 4.4k
2
Entering edit mode

I use vcftools for basic vcf filtering which I find to be very fast and simple to use.

An example would be

vcftools --vcf myfile.vcf --out output_prefix --minGQ 15 --minDP 10 --recode --recode-INFO-all

Manual can be found here https://vcftools.github.io/man_latest.html

ADD COMMENTlink 3.6 years ago artolkamp • 30
Entering edit mode
0

Cools seems to do the job. Thanks.

ADD REPLYlink 3.6 years ago
William
♦ 4.4k
Entering edit mode
0

At closer look it's not exactly what I was looking for. This command sets the genotypes that don't match the criteria to ./. . I am looking to remove the whole variant if not all genotypes pass the criteria. .

ADD REPLYlink 3.6 years ago
William
♦ 4.4k
Entering edit mode
0

Does anyone know if there is a way to do this (set genotypes that don't match the criteria to ./.) in bcftools? I'd like to set low-quality genotypes to missing in a multi-sample vcf but keep the sites that contain them, and can't seem to work out how. The vcftools solution works fine, but I'm curious to know if it can be done in bcftools too.

ADD REPLYlink 13 months ago
J.R.
• 0
Entering edit mode
0

from the above script, --minGQ and --minDP take the value of 15 and 10 respectively for filtering genotype quality and depth. Is there any criteria for choosing 15 and 10.. Just to know if 15 and 10 are standard values to use for those flags.

ADD REPLYlink 13 months ago
mab658
• 30
1
Entering edit mode

Filtering not on the minimum but on the average GQ and sample DP seems to work for me and the result is close enough to what I was looking for.

bcftools view -i 'AVG(GQ)>20 & AVG(FMT/DP)>10'
ADD COMMENTlink 3.6 years ago William ♦ 4.4k
Entering edit mode
0

Hello, Excuse me could you please let me know that the FMT stands for what? Thanks, Omid

ADD REPLYlink 12 months ago
jaafari.omid
• 40
1
Entering edit mode

In case you have a vcfutils.pl in your pipeline, especially in the famous consensus sequence calling one with mpileup, you can use the -d flag to set the minimum coverage. It also should work alone when converting vcf to fq, for a minimum depth coverage of 10 for example:

  • vcfutils.pl vcf2fq -d 10

vcfutils.pl script exists in the bcftools/bin btw

ADD COMMENTlink 3.0 years ago FatihSarigol • 120
Entering edit mode
1

Thanks for your helpful answer.

ADD REPLYlink 11 months ago
jaafari.omid
• 40

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0