How to filter a VCF on a set of samples having genotypes containing the minor allele?
7.2 years ago
William ★ 5.3k

Is there a way to filter a VCF (input stream) on variants where at least a specific subset of the samples in the VCF have a heterozygous or homozygous genotype containing the minor allele?

Or the same filter but then for genotypes containing a non-major allele?
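
To make the two definitions concrete, here is a minimal sketch. It assumes that "minor allele" means the least frequent allele observed among the called genotypes at a site, and that "non-major allele" means any allele other than the most frequent one. The helper names and the plain-array genotype model are hypothetical and are not part of bcftools, SnpSift, or any VCF library.

```javascript
// Hypothetical helper: count alleles over all called genotypes at one site.
// Each genotype is an array of allele strings; "." marks a missing call.
function countAlleles(genotypes) {
    const counts = {};
    for (const gt of genotypes) {
        for (const allele of gt) {
            if (allele === ".") continue; // skip no-calls
            counts[allele] = (counts[allele] || 0) + 1;
        }
    }
    return counts;
}

// Hypothetical helper: observed alleles sorted from most to least frequent.
function rankAlleles(genotypes) {
    const counts = countAlleles(genotypes);
    return Object.keys(counts).sort((a, b) => counts[b] - counts[a]);
}

// A tri-allelic example site: A is seen 4 times, C 3 times, T once.
const exampleSite = [["A", "A"], ["A", "C"], ["A", "T"], ["C", "C"], [".", "."]];
const ranked = rankAlleles(exampleSite);

const major = ranked[0];                 // "A"
const minor = ranked[ranked.length - 1]; // "T"
const nonMajor = ranked.slice(1);        // ["C", "T"]

console.log(major, minor, nonMajor.join(","));
```

At a bi-allelic site the two definitions coincide; they only differ when three or more alleles are observed, as with C in this example.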

I looked at the bcftools and SnpSift filter documentation and I could not find how to do that in one of these tools. Maybe I overlooked the option or combination of options that I should use?

The closest options that I found in SnpSift are:

isHom( GEN[0] )
isHet( GEN[0] )
isVariant( GEN[0] )
isRef( GEN[0] )

But SnpSift does not have an isMinorAlleleGenotype( GEN[0] ) function.

Is there another tool that can do this?

Or am I best off implementing this myself using a VCF library?

vcf bcftools snpsift • 2.8k views
7.2 years ago

GATK VariantAnnotator https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.php

with

 --resource:foo listOfMajorAleleles.vcf  -resourceAlleleConcordance

and then use VariantFiltration https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_filters_VariantFiltration.php

to keep/exclude those variants that have been flagged in the previous step.

UPDATE:

Using the following script:

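// Samples of interest: the variant is kept if at least one of them
// carries the minor allele. Edit the names to match your VCF.
var samples=["S1","S2"];

function accept(v)
    {
    // Start every allele declared in the record at a count of zero.
    var allele2count={};
    var alleles = v.getAlleles();
    if(alleles.size()==0) return false;
    for(var i=0;i< alleles.size();++i)
        {
        allele2count[alleles.get(i).getDisplayString()]=0;
        }

    // Count each allele over the called genotypes of all samples.
    for(var i=0;i< v.getNSamples();++i)
        {
        var g = v.getGenotype(i);
        if(!g.isCalled()) continue;
        alleles = g.getAlleles();
        for(var j=0;j< alleles.size();++j)
            {
            // skip the no-call half of a partially called genotype such as ./1
            if(alleles.get(j).isNoCall()) continue;
            allele2count[alleles.get(j).getDisplayString()]++;
            }
        }

    // The minor allele is the one with the lowest count (first one wins on ties).
    var minor=null;
    for(var a in allele2count)
        {
        if(minor == null || allele2count[minor]>allele2count[a])
            {
            minor=a;
            }
        }
    if(minor==null) return false;

    // Accept the variant as soon as one selected sample carries the minor allele.
    for(var i in samples)
        {
        var g = v.getGenotype(samples[i]);
        // g is null when the sample is not present in the VCF
        if(g==null || !g.isCalled()) continue;
        alleles = g.getAlleles();
        for(var j=0;j< alleles.size();++j)
            {
            if(alleles.get(j).getDisplayString().equals(minor)) return true;
            }
        }
    return false;
    }

// 'variant' is the current record injected by VCFFilterJS.
accept(variant);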

and VCFfilterjs: https://github.com/lindenb/jvarkit/wiki/VCFFilterJS

e.g.:

 curl -L "https://raw.githubusercontent.com/lindenb/gatk-ui/master/testdata/mutations.vcf" | java -jar  jvarkit-git/dist/vcffilterjs.jar  -f filter.js
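
The script above keeps a variant as soon as any one of the listed samples carries the minor allele. If the requirement is that every sample in the subset carries it, the same logic can be sketched as below. This is a standalone illustration on plain JavaScript objects, not the htsjdk API that VCFFilterJS exposes; the function names, the `requireAll` flag, and the data layout are assumptions made for the example. Unlike the script above, it counts only alleles that are actually observed, so an ALT allele that appears in no genotype is never picked as the minor allele.

```javascript
// Sketch only: a site is modelled as {sampleName: [allele, allele]},
// with "." for a missing call. This is not the htsjdk data model.
function minorAllele(site) {
    const counts = {};
    for (const name of Object.keys(site)) {
        for (const allele of site[name]) {
            if (allele === ".") continue; // skip no-calls
            counts[allele] = (counts[allele] || 0) + 1;
        }
    }
    // Pick the observed allele with the lowest count (first one wins on ties).
    let minor = null;
    for (const allele of Object.keys(counts)) {
        if (minor === null || counts[allele] < counts[minor]) minor = allele;
    }
    return minor;
}

// requireAll = false: at least one listed sample carries the minor allele.
// requireAll = true:  every listed sample carries the minor allele.
function carriesMinor(site, samples, requireAll) {
    const minor = minorAllele(site);
    if (minor === null) return false;
    // A sample missing from the site is treated as not carrying the allele.
    const carries = (name) => (site[name] || []).includes(minor);
    return requireAll ? samples.every(carries) : samples.some(carries);
}

// Example site: A is seen 5 times and G 3 times, so G is the minor allele.
const site = {
    S1: ["A", "G"],
    S2: ["A", "A"],
    S3: ["A", "A"],
    S4: ["G", "G"]
};

console.log(carriesMinor(site, ["S1", "S2"], false)); // true: S1 carries G
console.log(carriesMinor(site, ["S1", "S2"], true));  // false: S2 does not carry G
console.log(carriesMinor(site, ["S1", "S4"], true));  // true: both carry G
```

In the VCFFilterJS script, the equivalent change would be to return false as soon as one listed sample lacks the minor allele, and true only after the loop over `samples` completes.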

Thank you Pierre, but I was hoping to do this filter on the fly; pre-computing the "listOfMajorAleleles.vcf" is not possible in my situation. Also, this does not take any specific set of samples into account.


Ah, I see. Please clarify what you mean by: "where at least a certain set of samples (in the VCF?) have a genotype (what kind?) containing the minor allele?"


I have updated the question description; I hope it is clearer now.
