Biostar Beta. Not for public use.
Selecting Variants based on annotations in a VCF file
0
Entering edit mode
3.8 years ago
petersam • 0

I have a VCF file generated with GATK and annotated with SNPeff. There is an annotation field described as

> ##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length |  AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO'"

An example of a variant:

AC=5;AF=0.200;AN=10;ANN=C|intron_variant|MODIFIER|GAPDH|4214|transcript|GAPDH_transcript|sim4cc|n.1964-49T>C||||||

I have been using GATK SelectVariants to generate variant lists based on sample genotype and quality metrics but would also like to isolate all variants with different Annotation Impact. In the example above, I would want to get all variants described as "MODIFIER" I am not getting the format correct for the JEXL commands. Can someone suggest a command?

Thank you, S

ADD COMMENTlink
0
Entering edit mode

Just curious--do you want to produce a valid VCF as output, or just find the records of interest?

ADD REPLYlink
3
Entering edit mode
16 months ago
Brice Sarver ♦ 2.6k
United States

The JEXL stuff in the GATK is wonky at best; I avoid it whenever possible. It is much faster to parse your VCF using grep than it is to use SelectVariants in this case. Grep the lines that have MODIFIER and append them to your VCF. You'll probably need to do standard VCF manipulation (sorting, etc.) after the fact. Something like:

grep MODIFIER yourfile.vcf > my_results.vcf

You will, of course, need to modify your regex to avoid the lines with MODIFIER in the header.

ADD COMMENTlink
0
Entering edit mode

Thank you. That was incredibly easy.

ADD REPLYlink
2
Entering edit mode
2.4 years ago
Mitch Bekritsky ♦ 1.1k
London, England

bcftools view is your friend here. I'd suggest reading the documentation on the linked page for more details, but something along the lines of

bcftools view -i "ANN~MODIFIER" -o modifier.vcf -Ov init.vcf

should get you a VCF file with the variants you're looking for.

ADD COMMENTlink
1
Entering edit mode
16 months ago
France/Nantes/Institut du Thorax - INSE…

using vcffilterjs and the following script: https://github.com/lindenb/jvarkit/wiki/VCFFilterJS

function accept(v) {
var ann =   v.getAttributeAsList("ANN");
for(var i in ann)
    {
    var tokens = ann[i].split(/\|/);
    if( tokens.length>2 && tokens[2] == "MODIFIER" ) return true;
    }
return false;
}
accept(variant);
ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3.1