Biostar Beta. Not for public use.
Question: Filtering Vcf File
10

I was wondering how to filter a VCF file based on a few input arguments (DP>10, MQ>30 and QD>20, or GT = "1/1", etc.). I'm planning to use a simple command on the command line to extract the info and create a new, filtered VCF file. I want to keep the 20 lines of VCF header INFO in the new file as well. I can do it with Perl, but is there an easier way? Last time I extracted the info I needed from a VCF file using vcftools, but I couldn't get a filtered VCF file.

My command

vcftools --vcf GMM_homo.vcf --depth --FILTER-summary --TsTv-by-count --site-mean-depth --SNPdensity 1000 --site-pi --minQ 30 --min-meanDP 5 --out homo_GMM

7.5 years ago by bioinfo • 700 • updated 2.3 years ago by DataFanatic • 130
0

I just tried

egrep '^#|"GT =1/1" | "DP>10","MQ>30"' my.vcf > filtered.vcf

Didn't work though.
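
egrep matches text patterns and cannot evaluate numeric comparisons such as DP>10, which is why the attempt above does not select anything useful. A minimal awk sketch of the same idea follows. It assumes that DP, MQ and QD are keys in the INFO column (column 8) and that the file is tab-delimited; demo.vcf and its records are invented for illustration.

```shell
# Build a tiny demo VCF (hypothetical data) so the sketch is self-contained.
printf '##fileformat=VCFv4.1\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n' >> demo.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=25;MQ=60;QD=22.5\tGT\t1/1\n' >> demo.vcf
printf 'chr1\t200\t.\tC\tT\t50\tPASS\tDP=8;MQ=60;QD=25.0\tGT\t0/1\n' >> demo.vcf
printf 'chr1\t300\t.\tG\tA\t50\tPASS\tDP=40;MQ=20;QD=30.1\tGT\t1/1\n' >> demo.vcf

awk -F '\t' '
/^#/ { print; next }                 # keep every header line unchanged
{
    dp = mq = qd = ""                # reset per record; a missing key fails the test
    n = split($8, kv, ";")           # INFO is a semicolon-separated key=value list
    for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^DP=/) dp = substr(kv[i], 4)
        if (kv[i] ~ /^MQ=/) mq = substr(kv[i], 4)
        if (kv[i] ~ /^QD=/) qd = substr(kv[i], 4)
    }
    # "+ 0" forces a numeric comparison instead of a string comparison
    if (dp + 0 > 10 && mq + 0 > 30 && qd + 0 > 20) print
}' demo.vcf > demo.filtered.vcf
```

The header lines pass through untouched, so the result is still a VCF file with its original header.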

7.5 years ago by bioinfo • 700 • updated 7.5 years ago by Sukhdeep Singh • 9.8k
0

I need to filter my vcf file to include variants with at least 30 individuals in each of the possible groups: major allele homozygote, heterozygote, and minor allele homozygotes; would be grateful for any input. Thanks!
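
One way this per-site genotype count could be sketched with awk alone is shown below. It assumes biallelic sites, tab-delimited records, samples starting at column 10, and GT as the first FORMAT subfield (which the VCF specification requires when GT is present). The file names, the demo records and the min variable are made up for illustration; min would be set to 30 for the case described above.

```shell
# Tiny demo VCF with four samples (hypothetical data).
printf '##fileformat=VCFv4.1\n' > groups.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n' >> groups.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/1\n' >> groups.vcf
printf 'chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t0/0\t0/1\t./.\n' >> groups.vcf

awk -F '\t' -v min=1 '
/^#/ { print; next }                      # keep header lines
{
    hom_ref = het = hom_alt = 0
    for (i = 10; i <= NF; i++) {          # sample columns start at 10
        split($i, f, ":")
        gt = f[1]                         # GT is assumed to be the first subfield
        gsub(/\|/, "/", gt)               # treat phased and unphased calls alike
        if (gt == "0/0") hom_ref++
        else if (gt == "0/1" || gt == "1/0") het++
        else if (gt == "1/1") hom_alt++   # missing calls (./.) are not counted
    }
    # keep the site only if all three genotype classes reach the threshold
    if (hom_ref >= min && het >= min && hom_alt >= min) print
}' groups.vcf > groups.filtered.vcf
```

Because the same threshold applies to both homozygote classes, the sketch does not need to know which allele is the major one.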

2.3 years ago by DataFanatic • 130
0

ask this as a new question please.

2.3 years ago by Pierre Lindenbaum • 120k
32

You can do exactly this with vcffilter in vcflib!

Here's how to select all variants with depth greater than 10, mapping quality greater than 30, and QD greater than 20:

vcffilter -f "DP > 10 & MQ > 30 & QD > 20" file.vcf >filtered.vcf

Now, to select only variants with homozygotes, you can strip every genotype that's not homozygous, fix up the file's AC and AF fields using the genotypes with vcffixup, and then remove all the AC = 0 sites (again, using vcffilter).

cat filtered.vcf | vcffilter -g "GT = 1/1" | vcffixup - | vcffilter -f "AC > 0" >results.vcf

The expression language is clunky (you have to put spaces in between the tokens, and parenthetical expressions also have to have spaces). There is also no != symbol, but as a workaround you can do ! ( expression ).

For instance, to pick up non-homozygous genotypes, you'd use:

vcffilter -g "! ( GT = 1/1 )"

I'd like to fix some of these things (and also add regex matching for strings), but so far it more than does the job for quick filtering operations, allowing me to do virtually any kind of filtering from the command line without having to write a custom script.

These are the supported operations: > < = | & !, and symbols: ( ). Strings are interpreted literally. There is some type checking using the VCF header, so you have to have a valid VCF file. The output is a valid VCF file, so you can stream the filter results into another filtering operation.
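
For a single-sample file, the homozygous-genotype selection can also be approximated without vcflib. This is only a sketch under assumptions: GT is the first FORMAT subfield, the sample sits in column 10, and the file names and records are invented. Unlike the vcffixup step above, it does not recompute AC or AF, and unlike the literal string match in vcffilter it also accepts the phased form 1|1.

```shell
# Tiny single-sample demo VCF (hypothetical data).
printf '##fileformat=VCFv4.1\n' > gt.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n' >> gt.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=30\tGT:DP\t1/1:30\n' >> gt.vcf
printf 'chr1\t200\t.\tC\tT\t50\tPASS\tDP=12\tGT:DP\t0/1:12\n' >> gt.vcf
printf 'chr1\t300\t.\tG\tA\t50\tPASS\tDP=20\tGT:DP\t1|1:20\n' >> gt.vcf

awk -F '\t' '
/^#/ { print; next }                       # keep header lines
{
    split($10, f, ":")                     # per-sample subfields are colon-separated
    if (f[1] == "1/1" || f[1] == "1|1") print
}' gt.vcf > gt.hom.vcf
```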

7.2 years ago by Erik Garrison ♦ 2.1k
0

Note that this will work for _any_ values in the INFO field or per-sample fields.

7.2 years ago by Erik Garrison ♦ 2.1k
0

Does vcffilter -f work with MuTect VCF output? I tried it, but it does not seem to work. The MuTect VCF output has a FILTER column, and I want to keep only the variants with the value PASS in that column. Ideally it would look like this:

vcffilter -f "FILTER = PASS" file.vcf > filt_out.vcf

But this does not seem to work. Can anyone tell me where I am going wrong?

5.2 years ago by ivivek_ngs ♦ 4.8k
0

Problem solved: it works well with the egrep command. Thanks.

5.2 years ago by ivivek_ngs ♦ 4.8k
0

Hi, I have a MuTect VCF file with the same FILTER column and PASS value. I tried to run the vcffilter command, but as you said, it does not work. I saw that you solved the problem with grep. Could you please give me more information? Thanks.
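
One possible reading of the problem: FILTER is the seventh fixed column of a VCF record rather than a key in the INFO column, which may be why an INFO-style expression matched nothing. A minimal sketch of the grep/awk route, assuming a tab-delimited VCF; the file names and demo records are made up for illustration.

```shell
# Tiny demo VCF with one PASS and one non-PASS record (hypothetical data).
printf '##fileformat=VCFv4.1\n' > pass.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> pass.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=25\n' >> pass.vcf
printf 'chr1\t200\t.\tC\tT\t50\tREJECT\tDP=30\n' >> pass.vcf

# Keep every header line plus the records whose FILTER column (7) is exactly PASS.
awk -F '\t' '/^#/ || $7 == "PASS"' pass.vcf > pass.only.vcf
```

Testing column 7 explicitly avoids keeping records that merely contain the text PASS somewhere else on the line, which a plain pattern such as egrep '^#|PASS' would do.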

3.9 years ago by marcoabbestia • 0
0

I have a problem with vcffilter. When I use it, it removes the variant annotation (Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type| ...). Here is my command:

vcffilter -k -f "( TYPE = ins | TYPE = del ) & FDP > 10 & HRUN < 6" -f "QUAL > 20" -g "FAO > 4 & GQ > 5" file.vcf | vcf-annotate --fill-AC-AN | vcffilter -f "AC > 0" > file.vcf.indelfilter.vcf

Any idea where the mistake is and how to fix it?

3.4 years ago by siabadaba • 50
4

I wrote some tools to extract the fields from INFO and FORMAT. See: https://code.google.com/p/variationtoolkit/wiki/ExtractInfo and https://code.google.com/p/variationtoolkit/wiki/ExtractFormat

$ cat data.vcf.gz |\
   extractformat -t GT |\
   awk -F '\t' '($11=="1/1")' |\
   extractinfo -t DP |\
   awk -F '\t' '(int($12)>10)'

7.5 years ago by Pierre Lindenbaum • 120k
1

3.5 years later: this is wrong. Just filter the VCF using https://github.com/lindenb/jvarkit/wiki/VCFFilterJS or extract the fields using GATK VariantsToTable.

4.1 years ago by Pierre Lindenbaum • 120k
0

@ Pierre Lindenbaum

Can we convert the fpfilter output file, which filters VarScan output for false positives, into VCF 4.0 format? I tried vcf-annotate, but to no avail. I also tried writing a script, but that did not help. Do you have any custom tool designed for this?

5.3 years ago by ivivek_ngs ♦ 4.8k
3

SnpSift, a utility associated with SnpEff, has several options for filtering VCF files and for converting them to tab-delimited text.

7.2 years ago by Sean Davis • 25k
2

Don't you just need to add --recode to your command?

7.2 years ago by Adam • 990
