GATK VariantFiltration multiple filters including genotype for multi-sample VCF
3.9 years ago
graeme.thorn ▴ 100

I have a multisample VCF and I want to filter it based on:

  1. The sequencing depth from the first sample (which is a germline sample) and
  2. The genotype of the first (and/or the third) samples (which are both germline samples)

I've been investigating gatk VariantFiltration for this. For the first sample's sequencing depth I use -filter='vc.getGenotype("SAMPLE").getDP()>=<N>' -filter-name="germline.depth", but I can't find an expression involving the sample genotype that works for the second condition.

Ideally I'd use an expression of the same form, -filter='<something>' -filter-name='something.else', because it writes the filter name into the FILTER column of the VCF. The variants that pass can then be extracted with gatk SelectVariants by picking those that haven't been marked as filtered.

Is there an expression like the depth one for the individual sample genotypes?
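One point worth checking before writing the expressions: VariantFiltration marks a record as filtered when the expression evaluates to true, so the expression should describe the records to reject. If the intent is to drop low-coverage sites, the depth test should be `< N` rather than `>= N`; as written, the `>=<N>` expression flags the well-covered sites. The plain-Python sketch below only emulates that behaviour and the later SelectVariants `--exclude-filtered` step. It is not GATK code, and the record layout, sample name and threshold are hypothetical.

```python
# Plain-Python emulation (not GATK) of VariantFiltration followed by
# SelectVariants --exclude-filtered. Record layout, sample name and
# threshold are hypothetical.
from typing import Callable, Dict, List

Record = Dict[str, object]          # {"ID": ..., "FILTER": ..., "samples": {name: {"GT": ..., "DP": ...}}}
Expr = Callable[[Record], bool]

def variant_filtration(records: List[Record], filters: Dict[str, Expr]) -> List[Record]:
    """Write every matching filter name into FILTER; records matching none get PASS."""
    result = []
    for rec in records:
        failed = [name for name, expr in filters.items() if expr(rec)]
        # Several failing filters are joined with ';' as in the VCF FILTER column.
        result.append({**rec, "FILTER": ";".join(failed) if failed else "PASS"})
    return result

def exclude_filtered(records: List[Record]) -> List[Record]:
    """Rough analogue of SelectVariants --exclude-filtered: keep PASS or '.' records."""
    return [r for r in records if r["FILTER"] in ("PASS", ".")]

MIN_DP = 10  # hypothetical depth threshold

# The expression describes what to REJECT: germline depth below the threshold.
filters = {"germline.depth": lambda r: r["samples"]["SAMPLE1"]["DP"] < MIN_DP}

records = [
    {"ID": "var1", "FILTER": ".", "samples": {"SAMPLE1": {"GT": "0/0", "DP": 30}}},
    {"ID": "var2", "FILTER": ".", "samples": {"SAMPLE1": {"GT": "0/0", "DP": 4}}},
]

marked = variant_filtration(records, filters)
for r in marked:
    print(r["ID"], r["FILTER"])      # var1 PASS / var2 germline.depth
print([r["ID"] for r in exclude_filtered(marked)])  # ['var1']
```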


Not tested; this one also requires SAMPLE2 to be HET:

vc.getGenotype("SAMPLE1").getDP()>=12345 && vc.getGenotype("SAMPLE2").isHet()
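To see how the two parts of this expression combine, here is a plain-Python analogue evaluated on two made-up sites. It is only an emulation of the JEXL logic; the dictionary layout, genotypes and depths are hypothetical. In GATK, `vc.getGenotype("SAMPLE1").getDP()` reads that sample's FORMAT/DP value, and `&&` requires both conditions to hold for the record to match (and so be flagged).

```python
# Plain-Python analogue (not GATK) of the JEXL expression
#   vc.getGenotype("SAMPLE1").getDP()>=12345 && vc.getGenotype("SAMPLE2").isHet()
# The site dictionaries below are hypothetical.
from typing import Dict

Samples = Dict[str, Dict[str, object]]

def is_het(gt: str) -> bool:
    """Fully called genotype with at least two different alleles, e.g. 0/1 or 1|2."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) > 1

def expression(samples: Samples) -> bool:
    # JEXL '&&' -> Python 'and': the record matches only if both parts hold.
    return samples["SAMPLE1"]["DP"] >= 12345 and is_het(samples["SAMPLE2"]["GT"])

site_a = {"SAMPLE1": {"GT": "0/0", "DP": 20000}, "SAMPLE2": {"GT": "0/1", "DP": 18000}}
site_b = {"SAMPLE1": {"GT": "0/0", "DP": 20000}, "SAMPLE2": {"GT": "1/1", "DP": 18000}}

print(expression(site_a))  # True: deep enough and SAMPLE2 is heterozygous
print(expression(site_b))  # False: SAMPLE2 is homozygous ALT
```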

Thanks! A quick test has shown that variations of this (see the answer) are what I required. The GATK help pages don't seem that helpful.

3.9 years ago
graeme.thorn ▴ 100

As per @Pierre Lindenbaum's comment above, the correct filter for the genotype is

vc.getGenotype("SAMPLEn").isHomRef() to match the "0/0" genotype, and !vc.getGenotype("SAMPLEn").isHet() && !vc.getGenotype("SAMPLEn").isHomVar() to match the "0/0" or "./." genotypes.

This covered all the use cases I needed.
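To check what each genotype expression actually matches, the sketch below re-implements in Python the genotype classification that, as I understand it, htsjdk uses behind isHomRef(), isHet(), isHomVar() and isNoCall(). It is an illustration, not htsjdk itself. One consequence worth knowing: !isHet() && !isHomVar() also matches a partially called genotype such as ./0 (classified as MIXED), not only 0/0 and ./.; vc.getGenotype("SAMPLEn").isHomRef() || vc.getGenotype("SAMPLEn").isNoCall() would leave that case out.

```python
# Python re-implementation (an assumption-based sketch, not htsjdk) of how a
# GT string maps onto the genotype classes behind isHomRef(), isHet(),
# isHomVar() and isNoCall().
import re

def genotype_type(gt: str) -> str:
    alleles = re.split(r"[/|]", gt)
    called = [a for a in alleles if a != "."]
    if not called:
        return "NO_CALL"                 # e.g. ./.
    if len(called) < len(alleles):
        return "MIXED"                   # partly called, e.g. ./0
    if len(set(called)) > 1:
        return "HET"                     # e.g. 0/1, 1|2
    return "HOM_REF" if called[0] == "0" else "HOM_VAR"

def is_hom_ref(gt: str) -> bool:
    # isHomRef()
    return genotype_type(gt) == "HOM_REF"

def not_het_not_homvar(gt: str) -> bool:
    # !isHet() && !isHomVar()
    return genotype_type(gt) not in ("HET", "HOM_VAR")

def hom_ref_or_no_call(gt: str) -> bool:
    # isHomRef() || isNoCall()
    return genotype_type(gt) in ("HOM_REF", "NO_CALL")

for gt in ["0/0", "./.", "0/1", "1/1", "./0"]:
    print(gt, genotype_type(gt), is_hom_ref(gt), not_het_not_homvar(gt), hom_ref_or_no_call(gt))
# 0/0 HOM_REF True True True
# ./. NO_CALL False True True
# 0/1 HET False False False
# 1/1 HOM_VAR False False False
# ./0 MIXED False True False
```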

