Filtering IonTorrent variant caller VCFs
5.2 years ago
graeme.thorn ▴ 100

I've got c. 140 VCF files generated by the IonTorrent variant caller pipeline (sequenced using AmpliSeq on the Comprehensive Cancer Panel) that I want to process further. However, I'm not sure which filters to apply to the VCFs before merging them into one VCF per group.

As far as I can tell, there are established GATK filtering thresholds, and thresholds for variants called with samtools mpileup, but the INFO and FORMAT fields in those VCFs do not match up exactly with the ones provided for variants called by IonTorrent.

Is it OK to translate the GATK thresholds into their IonTorrent equivalents (as I say, some thresholds have no exact match among the values IonTorrent provides), or is there a known set of hard thresholds somewhere that I could use as a starting point for my filtering?

iontorrent variants vcf • 1.8k views

Hey Graeme, which tags are present in the FORMAT and INFO fields? Usually one would filter by per-position read depth and QUAL score. If certain metrics are present, one can also filter by strand bias, read position bias, allelic fraction, etc.


I've got QUAL and AF in both FORMAT and INFO fields. I suspect that variants arising from damaged DNA have low allelic fraction (there'll be lots of reference reads and not many variant reads), so a filter on allelic fraction and on quality will remove most of the calls caused by damaged DNA. The other filters I applied were based on those for GATK, but transposed into the IonTorrent tags.

The raw data has a Ti/Tv ratio of >16, but filtering it as I did has given a value closer to 2.8, which is much better given that the data is likely exon-heavy (it's not a WGS or WXS run, but was specifically amplified using primers for certain genes). Also, once filtered, the distribution of allelic fraction and quality for C>T/G>A transitions is much closer to that for the opposite transitions T>C/A>G, suggesting the filters I ran were stringent enough.
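For anyone wanting to reproduce the Ti/Tv check, here is a minimal sketch of how the ratio can be computed from REF/ALT pairs. It is not the code used in this thread: the function name `titv_ratio` and the input format (a list of `(ref, alt)` tuples already extracted from the VCFs) are assumptions made for illustration.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def titv_ratio(snvs):
    """Return transitions/transversions for an iterable of (ref, alt) pairs.

    Anything that is not a single-base substitution (indels, MNPs,
    symbolic alleles) is skipped.
    """
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    # With no transversions the ratio is undefined; report infinity.
    return ti / tv if tv else float("inf")


example = [("C", "T"), ("G", "A"), ("A", "C"), ("AT", "A")]
ratio = titv_ratio(example)  # the indel ("AT", "A") is ignored
```

Running this on the unfiltered and filtered call sets separately gives the before/after comparison described above.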

5.1 years ago
graeme.thorn ▴ 100

For future reference, what I did was to filter the data using the GATK guidelines on strand bias and read depth, and additionally on allelic fraction and quality. Artefacts from DNA damaged during the fixation process are likely randomly distributed, so they will have low allelic fraction and low quality as estimated by the variant caller.
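A hard filter of this kind can be sketched in a few lines of Python. Everything specific here is an assumption rather than what was actually run: the thresholds (`min_qual`, `min_af`, `min_dp`) are placeholders to be tuned, and the INFO tag names `AF` and `DP` should be checked against the header of your own IonTorrent VCFs. The strand bias filter is left out because the tag used for it is not named in this thread.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'AF=0.12;DP=200;FLAG' into a dict."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags carry no value
    return out


def passes_filters(line, min_qual=30.0, min_af=0.05, min_dp=50):
    """Return True if a tab-separated VCF data line passes the hard filters.

    Thresholds are hypothetical defaults, not recommended values.
    Records with a missing QUAL, AF or DP are rejected.
    """
    fields = line.rstrip("\n").split("\t")
    qual, info = fields[5], parse_info(fields[7])
    try:
        qual_value = float(qual)
        # AF can be comma-separated for multi-allelic sites; use the largest.
        af = max(float(x) for x in str(info.get("AF", "")).split(","))
        dp = int(info["DP"])
    except (KeyError, ValueError):
        return False
    return qual_value >= min_qual and af >= min_af and dp >= min_dp


good = "chr1\t100\t.\tC\tT\t250.0\tPASS\tAF=0.31;DP=400"
low_af = "chr1\t200\t.\tC\tT\t250.0\tPASS\tAF=0.01;DP=400"
kept = [rec for rec in (good, low_af) if passes_filters(rec)]
```

Header lines (those starting with `#`) need to be passed through unchanged before applying this to a whole file.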

Filtering on quality and allelic fraction (along with everything else) proved stringent enough: the C>T/G>A transitions are now about as frequent as the T>C/A>G transitions in the filtered dataset, and the overall transition/transversion ratio has decreased from above 16 (in the unfiltered dataset) down to about 2.8.
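The comparison between C>T/G>A and T>C/A>G can be made by collapsing each substitution onto its pyrimidine reference base, so that strand-equivalent changes are counted together. The sketch below is illustrative only; `substitution_class` and `class_counts` are hypothetical helper names, and the input is assumed to be single-base `(ref, alt)` pairs.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref, alt):
    """Collapse a single-base change onto a pyrimidine (C or T) reference.

    G>A is reported as C>T and A>G as T>C, so each class covers both strands.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref in ("A", "G"):  # purine reference: take the complement of both
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def class_counts(snvs):
    """Count substitution classes for an iterable of (ref, alt) pairs."""
    return Counter(substitution_class(ref, alt) for ref, alt in snvs)


calls = [("C", "T"), ("G", "A"), ("T", "C"), ("A", "G"), ("A", "C")]
counts = class_counts(calls)
# counts["C>T"] covers C>T and G>A; counts["T>C"] covers T>C and A>G
```

A large excess of `C>T` over `T>C` in the unfiltered calls, shrinking towards parity after filtering, is the pattern described above.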
