Biostar Beta. Not for public use.
How can I increase variant accuracy?
1
18 months ago
atiqueulalam • 10

I want to increase the accuracy of my variant calls by using multiple variant-calling tools such as VarScan, GATK, Samtools (vcftools), and BreakDancer. Is there a pipeline that produces output by combining the results of each individual tool?

0

Hello atiqueulalam ,

each variant caller has its strengths and weaknesses. If you create a pipeline with the rule "only a variant that is found by x variant callers is a true variant", the price will be sensitivity: you will lose a bunch of true variants.

fin swimmer

1
11 months ago
Republic of Ireland

From my experience in the clinical genetics scene in the UK, sampling reads at 'random' from your BAM file (picard DownsampleSam) and then calling variants on each 'sub-BAM' with samtools mpileup piped into bcftools call (and then obtaining a consensus listing of all variants) is enough to find all true positives that can possibly be found. On many occasions, GATK and other tools will 'miss' variants, for whatever reason. It is erroneous to believe that simply running multiple tools on the same sample, or repeating the same sample in the lab, is enough.

A loose benchmark: 1000 Genomes variants were identified by merging the calls from multiple variant callers. However, using the method above, it was very easy to find all variants in 1000 Genomes that had already been found (and there was even evidence that the consortium had missed variants that should have been reported).
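To make the workflow concrete, here is a minimal sketch (in Python, with made-up variant tuples) of the final consensus step: taking the union of the calls produced on each sub-BAM and recording which sub-BAMs supported each variant. This assumes each sub-BAM has already been called (e.g. with samtools mpileup piped into bcftools call) and its calls reduced to (chrom, pos, ref, alt) tuples; the function name and data are illustrative assumptions, not the actual clinical pipeline.

```python
def consensus_calls(per_subbam_calls):
    """Union of variant calls across sub-BAMs, keyed by
    (chrom, pos, ref, alt); records which sub-BAMs found each variant."""
    seen = {}
    for i, calls in enumerate(per_subbam_calls):
        for key in calls:
            seen.setdefault(key, set()).add(i)
    return seen

# Hypothetical call sets from two sub-BAMs of the same sample:
sub1 = {("chr1", 12345, "A", "G"), ("chr2", 777, "C", "T")}
sub2 = {("chr1", 12345, "A", "G"), ("chr3", 42, "G", "A")}

merged = consensus_calls([sub1, sub2])
print(len(merged))                                # 3 distinct variants
print(sorted(merged[("chr1", 12345, "A", "G")]))  # [0, 1] - found in both
```

In practice the same union could be obtained directly from the per-sub-BAM VCFs with bcftools; the point is that the consensus keeps every variant seen in at least one sub-BAM, rather than demanding agreement.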

0

Could you comment on the principle behind the downsampling strategy? How does it increase confidence?

0

Virtually all variant callers will only look at a certain number of reads at each position for the purpose of variant calling, and ignore all other reads. Other callers may do this and/or also apply a posterior probability of a variant being present or not.

By 'splitting' the BAM file into multiple BAM files of randomly-selected reads, the odds are shifted in favour of detecting a variant in at least one of the sub BAMs. It is like 'shuffling' the deck of cards.

0

Hi Kevin - I'm also interested in, and puzzled by, the idea of subsampling:

"the odds are shifted in favour of detecting a variant in at least one of the sub BAMs."

Sure, but this should come at the expense of increasing false positives, shouldn't it? At the extreme, one would call a variant wherever there is a read mismatching the reference.

0

Hey, that's a good point, dariober. However, we controlled for this by never making a variant call below a read depth of 18 (in any sub-BAM). Again, we had data to show that 18 was the absolute bare minimum at which anyone should be calling a variant. If a region of interest fell below 18, we had to send that region for Sanger sequencing.

Usually, targeted panels achieve 100x depth of coverage, so, even sub-sampling reads to 25%, most bases will still be at > 18 read depth.

This is all for germline variants, of course.
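The depth arithmetic can be sanity-checked with a simple binomial model: assume each read at a position survives sub-sampling independently with probability equal to the sampling fraction (an idealisation of random read selection). The chance that a 100x base keeps at least 18x in a 25% sub-BAM is then a binomial tail sum:

```python
from math import comb

def prob_depth_at_least(total_depth, keep_fraction, min_depth):
    """P(retained depth >= min_depth) when each of total_depth reads is
    kept independently with probability keep_fraction (binomial model)."""
    return sum(
        comb(total_depth, k)
        * keep_fraction ** k
        * (1 - keep_fraction) ** (total_depth - k)
        for k in range(min_depth, total_depth + 1)
    )

# 100x coverage, keep 25% of reads, require >= 18x in the sub-BAM:
print(round(prob_depth_at_least(100, 0.25, 18), 3))  # most bases keep >= 18x
```

Under this model the 18x floor is met at the vast majority of 100x bases, consistent with the comment above; real subsampling is not perfectly independent per read, so this is only a rough check.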

0

My read depth of coverage is 30X - can I use this technique?

1

You may consider just downsampling to at minimum 60% of reads chosen at random, in that case (60% of 30 is 18). Keep in mind that there is no publication for this - I have not yet published the methodology behind it.

0

OK, thank you for your reply.

1
6 weeks ago
ATpoint 17k
Germany

If you have unmatched samples, you might be interested in appreci8, an approach that combines 8 variant callers and introduces a custom filtering strategy that was trained on an extensive number of datasets to improve sensitivity.

