Question

How to choose variant filtering criteria to reduce false positives

3


8.1 years ago

nikkihathi ▴ 30

Hello!

First, an overview of my NGS analysis pipeline for exome amplicon sequencing (only 50 genes):

Mapping, recalibration and variant calling with GATK HaplotypeCaller.
Variant annotation: I used ANNOVAR, VEP, SnpEff and Vtools.
I combined the required annotations into one file (CSV format). The idea is to get a complete annotation, including KEGG, GO, Kaviar, ClinVar, 1000g2015aug, refGene, thousandGenomes and LOF, in order to perform knowledge-based functional filtering.

I am very confused about the output, in particular about the discrepancy between the allele frequencies from Kaviar, thousandGenomes and EUR_MAF. For example, one variant has Kaviar_AF=0.0001153, thousandGenomes_AF_INFO=0.69, EUR_MAF=G:0.9791. How should we decide which database and which annotations to rely on? As I understand it, there is no benchmark method for annotation, but there may be criteria for making the choice, or a statistical method on which to base annotation and filtering decisions.

Is there any discussion of the discrepancies between different databases, and are there suggested criteria for filtering on annotations?

Thanks in advance for any suggestions.

next-gen variant annotation • 2.1k views


0


Variant frequencies are population-specific, especially when the variant is rare.
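Since frequencies differ between populations and databases, one conservative option is to call a variant rare only if it is rare in every database that reports it. Below is a minimal sketch of that idea, not an established method. The field names and values are the ones from the question; the helper names, the 0.01 threshold and the handling of allele-labelled values such as "G:0.9791" are my own assumptions.

```python
# Minimal sketch: take the highest allele frequency across databases.
# Helper names and the 0.01 threshold are illustrative assumptions.

AF_FIELDS = ["Kaviar_AF", "thousandGenomes_AF_INFO", "EUR_MAF"]


def parse_af(value, alt=None):
    """Return a frequency as float, or None if it is missing or unusable.

    Accepts plain numbers ("0.69") and allele-labelled values ("G:0.9791").
    If `alt` is given, labelled values for a different allele are skipped,
    because they do not describe the allele being filtered.
    """
    if value is None:
        return None
    text = str(value).strip()
    if text in ("", ".", "NA"):
        return None
    if ":" in text:
        allele, text = text.split(":", 1)
        if alt is not None and allele.upper() != alt.upper():
            return None
    try:
        return float(text)
    except ValueError:
        return None


def max_af(record, fields=AF_FIELDS, alt=None):
    """Highest usable frequency across the given fields, or None."""
    freqs = [parse_af(record.get(name), alt) for name in fields]
    freqs = [f for f in freqs if f is not None]
    return max(freqs) if freqs else None


def is_rare(record, fields=AF_FIELDS, threshold=0.01, alt=None):
    """True if no database reports a frequency at or above the threshold."""
    top = max_af(record, fields, alt)
    return top is None or top < threshold


# The variant from the question.
record = {
    "Kaviar_AF": "0.0001153",
    "thousandGenomes_AF_INFO": "0.69",
    "EUR_MAF": "G:0.9791",
}
print(max_af(record))   # 0.9791
print(is_rare(record))  # False
```

Note that a value like "G:0.9791" names the allele it refers to, so it is worth checking that each database reports the frequency of the same allele before comparing the numbers.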

8.0 years ago by WouterDeCoster 47k

score 2 · Answer 1 · 2016-05-08

2


8.0 years ago

chen ★ 2.5k

Good question.

I usually filter variants on confidence + importance.

You were using GATK, so you were doing germline variant calling, right? That's relatively easy and stable.

Somatic mutation calling is trickier.
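As a rough illustration of the confidence + importance idea for germline calls, here is a minimal sketch. It is one possible reading only: "confidence" is taken to mean call quality and read depth, and "importance" the predicted impact category (HIGH/MODERATE/LOW/MODIFIER, as reported by SnpEff and VEP). The column names QUAL, DP and IMPACT, and the thresholds, are assumptions to adapt to your own combined CSV.

```python
# Minimal sketch: keep variants that are both confidently called and
# predicted to matter. Column names and thresholds are assumptions.

IMPORTANT_IMPACTS = {"HIGH", "MODERATE"}


def is_confident(variant, min_qual=30.0, min_depth=20):
    """Confidence: call quality and read depth above assumed cutoffs."""
    return (float(variant["QUAL"]) >= min_qual
            and int(variant["DP"]) >= min_depth)


def is_important(variant):
    """Importance: predicted impact is in the set we care about."""
    return variant["IMPACT"].upper() in IMPORTANT_IMPACTS


def filter_variants(variants):
    """Return only variants passing both the confidence and importance checks."""
    return [v for v in variants if is_confident(v) and is_important(v)]


# Made-up rows, for illustration only.
variants = [
    {"ID": "v1", "QUAL": "812.5", "DP": "140", "IMPACT": "HIGH"},
    {"ID": "v2", "QUAL": "18.2", "DP": "150", "IMPACT": "HIGH"},      # low quality
    {"ID": "v3", "QUAL": "500.0", "DP": "90", "IMPACT": "MODIFIER"},  # low impact
    {"ID": "v4", "QUAL": "95.0", "DP": "12", "IMPACT": "MODERATE"},   # low depth
]
kept = filter_variants(variants)
print([v["ID"] for v in kept])  # ['v1']
```

With rows read from the combined CSV via `csv.DictReader`, the same functions apply once the column names match.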
