How can I increase variant calling accuracy?
5.5 years ago
atiqueulalam ▴ 10

I want to increase the accuracy of my variant calls by using multiple variant-calling tools such as VarScan, GATK, samtools (vcftools), and BreakDancer. Is there any pipeline that produces output by combining the results of the individual tools?

snp assembly alignment Variants

Hello atiqueulalam,

each variant caller has its strengths and weaknesses. If you create a pipeline with the rule "only a variant that is found by x variant callers is a true variant", the price will be sensitivity: you will lose a number of true variants.

fin swimmer

5.5 years ago

From my experience in the clinical genetics scene in the UK, sampling reads at 'random' from your BAM file (Picard DownsampleSam) and then calling variants on each 'sub-BAM' with samtools mpileup piped into bcftools call (and then obtaining a consensus listing of all variants) is enough to find all true positives that can possibly be found. On many occasions, GATK and other tools will 'miss' variants, for whatever reason. It is erroneous to believe that simply running multiple tools on the same sample, or repeating the same sample in the lab, is enough.
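As a rough sketch, assuming a coordinate-sorted and indexed sample.bam, a reference ref.fa, and Picard, samtools and bcftools on the PATH (the file names, the 25% fraction, the number of sub-BAMs and the seeds are placeholders, and exact flags vary between tool versions), the approach could look like this:

    #!/usr/bin/env bash
    set -euo pipefail

    # Create four random sub-BAMs, each keeping ~25% of reads, using a different seed
    for seed in 1 2 3 4; do
        picard DownsampleSam \
            I=sample.bam \
            O=sub_${seed}.bam \
            P=0.25 \
            RANDOM_SEED=${seed}
        samtools index sub_${seed}.bam

        # Call variants on each sub-BAM (mpileup piped into bcftools call)
        bcftools mpileup -f ref.fa sub_${seed}.bam \
            | bcftools call -mv -Oz -o sub_${seed}.vcf.gz
        bcftools index sub_${seed}.vcf.gz
    done

    # Consensus listing: sites called in at least one sub-BAM
    # consensus/sites.txt records each site and which sub-BAMs it was found in
    bcftools isec -n+1 -p consensus sub_1.vcf.gz sub_2.vcf.gz sub_3.vcf.gz sub_4.vcf.gz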

A loose benchmark: the 1000 Genomes variants were identified by merging the calls from multiple variant callers. Using the method above, however, it was very easy to recover all variants that 1000 Genomes had already reported (and there was even evidence that the consortium had missed variants that should have been reported).


Could you comment on the principle behind the downsampling strategy? How does it increase confidence in the calls?


Virtually all variant callers only look at a certain number of reads for the purpose of variant calling and ignore all other reads; callers may also, or instead, apply a posterior probability of a variant being present or not.

By 'splitting' the BAM file into multiple BAM files of randomly-selected reads, the odds are shifted in favour of detecting a variant in at least one of the sub BAMs. It is like 'shuffling' the deck of cards.
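For illustration, such sub-BAMs can also be generated with samtools view -s, where the integer part of the value sets the random seed and the decimal part the fraction of reads kept (sample.bam and the 25% fraction are placeholders):

    # Three 'shuffles' of the same deck: different seeds (1, 2, 3), same 25% of reads
    samtools view -b -s 1.25 sample.bam > sub_1.bam
    samtools view -b -s 2.25 sample.bam > sub_2.bam
    samtools view -b -s 3.25 sample.bam > sub_3.bam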


Hi Kevin, I'm also interested in, and puzzled by, the idea of subsampling:

"the odds are shifted in favour of detecting a variant in at least one of the sub-BAMs."

Sure, but shouldn't this come at the expense of an increase in false positives? Taken to the extreme, one would call a variant wherever there is a read mismatching the reference.


Hey, that's a good point, dariober. However, we controlled for this by never making a variant call below a read depth of 18 (in any sub-BAM). Again, we had data to show that 18 was the absolute bare minimum at which anyone should be calling a variant. If a region of interest fell below 18, we had to send that region for Sanger sequencing.

Usually, targeted panels achieve 100x depth of coverage, so even when sub-sampling reads to 25%, most bases will still be above a read depth of 18.

This is all for germline variants, of course.
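A minimal sketch of such a depth cut-off with bcftools filter (sub_1.vcf.gz is a placeholder, and whether to test INFO/DP or FORMAT/DP depends on the caller used):

    # Drop calls where the site depth is below 18
    bcftools filter -e 'INFO/DP<18' sub_1.vcf.gz -Oz -o sub_1.dp18.vcf.gz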


My read depth of coverage is 30x; can I use this technique?


In that case, you may consider downsampling to a minimum of 60% of reads, chosen at random (60% of 30 is 18). Keep in mind that there is no publication for this; I have not yet published the methodology behind it.
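For example, with Picard (sample.bam is a placeholder; P is the probability of keeping each read):

    # Keep ~60% of reads, leaving ~18x from a 30x BAM
    picard DownsampleSam I=sample.bam O=sub_60pct.bam P=0.6 RANDOM_SEED=1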


OK, thank you for your reply.

5.5 years ago
ATpoint 82k

If you have unmatched samples, you might be interested in appreci8, an approach that combines 8 variant callers and introduces a custom filtering strategy that was trained on an extensive number of datasets to improve sensitivity.
