Comparing VCF Files
9.7 years ago
Daniel E Cook ▴ 280

Does anyone have a good way of comparing vcf files? Essentially, we have called variants and we want to compare our set with another that we consider to be 'correct.' We want to adjust our filters so that the concordance between the two sets is maximized.
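As a minimal sketch of what "concordance" could mean here, the Python below (not taken from any tool mentioned in this thread) treats each ALT allele as a (CHROM, POS, REF, ALT) key and reports how many calls agree with the truth set, along with precision and recall. It assumes both files are uncompressed VCF text and that variants are represented identically in both (same normalization and left-alignment), and it ignores genotypes. The names `variant_keys` and `concordance` and the example records are hypothetical.

```python
def variant_keys(vcf_text):
    """Return the set of (CHROM, POS, REF, ALT) keys in VCF text, one key per ALT allele."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):  # skip blank lines and header lines
            continue
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # split multi-allelic records into one key per ALT
            keys.add((chrom, pos, ref, alt))
    return keys


def concordance(calls, truth):
    """Compare two key sets: shared keys are true positives (TP)."""
    tp = len(calls & truth)
    fp = len(calls - truth)  # called, but absent from the truth set
    fn = len(truth - calls)  # in the truth set, but not called
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Hypothetical example data.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
CALLS = "\n".join([
    "##fileformat=VCFv4.2", HEADER,
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t20\tPASS\t.",
    "chr1\t300\t.\tG\tA\t60\tPASS\t.",
])
TRUTH = "\n".join([
    "##fileformat=VCFv4.2", HEADER,
    "chr1\t100\t.\tA\tG\t99\tPASS\t.",
    "chr1\t300\t.\tG\tA\t99\tPASS\t.",
    "chr1\t400\t.\tT\tC\t99\tPASS\t.",
])

result = concordance(variant_keys(CALLS), variant_keys(TRUTH))
print(result)
```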

vcf variant calling • 7.7k views

Also bedtools will work for simple operations.

9.7 years ago

vcf-compare (from VCFtools)?


Unfortunately, this doesn't quite get us where we want. Here is a better question: how does one determine thresholds for depth and quality when filtering variants? Our idea was to compare with another set and see where concordance was maximized. The problem with vcf-compare is that it examines the intersection of variants, not the absolute set.
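One way to act on "see where concordance was maximized" is a grid search: filter the calls at each candidate minimum QUAL and minimum INFO/DP, score the surviving calls against the truth set, and keep the best pair. The sketch below uses F1 (the harmonic mean of precision and recall) as the objective, which is an assumption rather than something from this thread; the function names, example records, and threshold grids are hypothetical, and a missing DP is treated as 0.

```python
from itertools import product


def parse_calls(vcf_text):
    """Return (key, qual, depth) per ALT allele; depth comes from INFO/DP (0 if absent)."""
    records = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        qual = float(f[5]) if f[5] != "." else 0.0
        # INFO is ';'-separated; keep only KEY=VALUE entries (flags are skipped).
        info = dict(kv.split("=", 1) for kv in f[7].split(";") if "=" in kv)
        depth = int(info.get("DP", 0))
        for alt in f[4].split(","):
            records.append(((f[0], int(f[1]), f[3], alt), qual, depth))
    return records


def f1(calls, truth):
    """F1 score of a set of call keys against a set of truth keys."""
    tp = len(calls & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(calls)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def best_thresholds(records, truth, qual_grid, depth_grid):
    """Return (score, min_qual, min_depth) with the highest F1; ties keep the first in grid order."""
    best = None
    for min_qual, min_depth in product(qual_grid, depth_grid):
        kept = {key for key, qual, depth in records
                if qual >= min_qual and depth >= min_depth}
        score = f1(kept, truth)
        if best is None or score > best[0]:
            best = (score, min_qual, min_depth)
    return best


# Hypothetical calls: 200 and 500 are false positives with low depth.
CALLS = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30",
    "chr1\t200\t.\tC\tT\t20\t.\tDP=5",
    "chr1\t300\t.\tG\tA\t60\t.\tDP=25",
    "chr1\t400\t.\tT\tC\t15\t.\tDP=40",
    "chr1\t500\t.\tA\tC\t40\t.\tDP=8",
])
TRUTH = {("chr1", 100, "A", "G"), ("chr1", 300, "G", "A"), ("chr1", 400, "T", "C")}

best = best_thresholds(parse_calls(CALLS), TRUTH, [0, 10, 30, 50], [0, 10, 20])
print(best)
```

In this toy data a QUAL cutoff alone cannot separate the false positives (the true call at position 400 has the lowest QUAL), while a depth cutoff of 10 can, which is the kind of trade-off a joint sweep is meant to expose.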


So you could first filter your VCF and then compare it with the other set, couldn't you?

9.7 years ago
William ★ 5.3k

GATK has a tool that you can use to compute the precision and recall of a variant call set versus another variant call set (your benchmark / gold standard set). GATK GenotypeConcordance:

SiteConcordance_Summary

  • ALLELES_MATCH: counts of calls at the same site where the alleles match
  • ALLELES_DO_NOT_MATCH: counts of calls at the same location with different alleles, such as the EVAL set calling a 'G' alternate allele and the COMP set calling a 'T' alternate allele
  • EVAL_SUBSET_TRUTH: (multi-allelic sites only) ALT alleles for EVAL are a subset of ALT alleles for COMP.
  • EVAL_SUPERSET_TRUTH: (multi-allelic sites only) ALT alleles for COMP are a subset of ALT alleles for EVAL.
  • EVAL_ONLY: counts of sites present only in the EVAL set, not in the COMP set
  • TRUTH_ONLY: counts of sites present only in the COMP set, not in the EVAL set

http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_GenotypeConcordance.html
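To make the site-level categories concrete, here is a small Python sketch that assigns each site to one of them by comparing the sets of ALT alleles that EVAL and COMP report at the same position. It illustrates the definitions listed above and is not GATK's implementation: it ignores REF alleles, genotypes, and filters, and the helper names and example sites are hypothetical.

```python
from collections import Counter


def classify_site(eval_alts, truth_alts):
    """Classify one site; each argument is a set of ALT alleles, or None if the site is absent."""
    if eval_alts is None and truth_alts is None:
        raise ValueError("site is absent from both call sets")
    if truth_alts is None:
        return "EVAL_ONLY"
    if eval_alts is None:
        return "TRUTH_ONLY"
    if eval_alts == truth_alts:
        return "ALLELES_MATCH"
    if eval_alts < truth_alts:  # proper subset
        return "EVAL_SUBSET_TRUTH"
    if eval_alts > truth_alts:  # proper superset
        return "EVAL_SUPERSET_TRUTH"
    return "ALLELES_DO_NOT_MATCH"


def site_summary(eval_sites, truth_sites):
    """Count categories over the union of sites; each dict maps (chrom, pos) -> set of ALTs."""
    counts = Counter()
    for site in eval_sites.keys() | truth_sites.keys():
        counts[classify_site(eval_sites.get(site), truth_sites.get(site))] += 1
    return counts


# Hypothetical sites, one per category.
EVAL = {
    ("chr1", 100): {"G"},
    ("chr1", 200): {"T"},
    ("chr1", 300): {"A"},
    ("chr1", 500): {"C"},
    ("chr1", 600): {"A", "T"},
}
TRUTH = {
    ("chr1", 100): {"G"},
    ("chr1", 200): {"G"},
    ("chr1", 300): {"A", "C"},
    ("chr1", 400): {"T"},
    ("chr1", 600): {"A"},
}

summary = site_summary(EVAL, TRUTH)
print(summary)
```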

You can use bedtools subtract to get the variant positions that are in one VCF file and not in another VCF file. This does not take genotypes into account.

http://bedtools.readthedocs.org/en/latest/content/tools/subtract.html
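The position-only comparison described here can be sketched in plain Python: each VCF record becomes a 0-based, half-open interval spanning its REF allele, and any record in A that overlaps a record in B is dropped. This roughly corresponds to removing whole overlapping features (as `bedtools subtract -A` does) and is only an illustration, not bedtools' implementation; the function names and example records are hypothetical.

```python
def vcf_intervals(vcf_text):
    """Return (chrom, start, end, line) per record; [start, end) is 0-based and spans REF."""
    out = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        start = int(f[1]) - 1  # VCF POS is 1-based
        out.append((f[0], start, start + len(f[3]), line))
    return out


def subtract(a_text, b_text):
    """Return the record lines of A that do not overlap any record of B (alleles ignored)."""
    b_by_chrom = {}
    for chrom, start, end, _ in vcf_intervals(b_text):
        b_by_chrom.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end, line in vcf_intervals(a_text):
        # Two half-open intervals overlap when each starts before the other ends.
        if not any(s < end and start < e for s, e in b_by_chrom.get(chrom, [])):
            kept.append(line)
    return kept


HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
A_VCF = "\n".join([
    HEADER,
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t40\tPASS\t.",
    "chr2\t50\t.\tG\tA\t30\tPASS\t.",
])
B_VCF = "\n".join([
    HEADER,
    "chr1\t100\t.\tA\tT\t45\tPASS\t.",
    "chr2\t60\t.\tT\tC\t35\tPASS\t.",
])

for line in subtract(A_VCF, B_VCF):
    print(line)
```

Here chr1:100 is removed even though the two files report different ALT alleles at that position, which is the same kind of limitation noted above for a position-based comparison.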

9.7 years ago
smithtomsean ▴ 220

Is this what you're after? GATK's Variant Quality Score Recalibration (VQSR)

From the website:

Introduction

The purpose of variant recalibration is to assign a well-calibrated probability to each variant call in a call set. This enables you to generate highly accurate call sets by filtering based on this single estimate for the accuracy of each call.
