Strelka Indel Allele Counts
Richard, 11.6 years ago

Hopefully there are some Strelka users out there who can help with this one. I'm looking for the allele counts of the ref and non-ref alleles at indels.

I see these FORMAT tags in the VCF header. Do they give me the information I need?

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for tier1">
##FORMAT=<ID=DP2,Number=1,Type=Integer,Description="Read depth for tier2">
##FORMAT=<ID=TAR,Number=2,Type=Integer,Description="Reads strongly supporting alternate allele for tiers 1,2">
##FORMAT=<ID=TIR,Number=2,Type=Integer,Description="Reads strongly supporting indel allele for tiers 1,2">
##FORMAT=<ID=TOR,Number=2,Type=Integer,Description="Other reads (weak support or insufficient indel breakpoint overlap) for tiers 1,2">

EDIT: I'm familiar with VCF files. What I'm asking is whether the fields listed above, or any other fields in the Strelka VCF output, give me the allele counts for ref and non-ref at indels.
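
(Editor's sketch.) The Number=2 on TAR, TIR and TOR means each sample value holds two comma-separated counts, tier1 then tier2. A minimal sketch that pulls ID, Number and Type out of FORMAT header lines, assuming they look exactly like the ones above (Description last and double-quoted); parse_format_header is a hypothetical helper name, and a real VCF library would handle the general case more robustly:

```python
import re

# Matches a ##FORMAT meta-information line of the shape shown above.
# Assumption: Description is the last key and is double-quoted; the
# greedy (.*) lets the description itself contain commas ("tiers 1,2").
FORMAT_RE = re.compile(
    r'^##FORMAT=<ID=([^,]+),Number=([^,]+),Type=([^,]+),Description="(.*)">$'
)


def parse_format_header(lines):
    """Map each FORMAT ID to a (Number, Type, Description) tuple."""
    fields = {}
    for line in lines:
        m = FORMAT_RE.match(line.strip())
        if m:
            fid, number, ftype, desc = m.groups()
            fields[fid] = (number, ftype, desc)
    return fields


header = [
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for tier1">',
    '##FORMAT=<ID=TAR,Number=2,Type=Integer,Description="Reads strongly supporting alternate allele for tiers 1,2">',
    '##FORMAT=<ID=TIR,Number=2,Type=Integer,Description="Reads strongly supporting indel allele for tiers 1,2">',
]
for fid, (number, ftype, _desc) in parse_format_header(header).items():
    # Number=2 means one comma-separated value per tier (tier1, tier2)
    print(fid, number, ftype)
```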

10.0 years ago

Per the Strelka paper, tier1 counts are simply more stringent than tier2, so I'd recommend using the tier1 counts. As in a typical VCF, DP is your total (tier1) depth at the site. A typical VCF would also carry an AD tag, a comma-delimited list of REF and ALT counts, but AD is not among Strelka's FORMAT tags. There is a caveat mentioned at this link, but a workaround is to take the first (tier1) entry of TIR as your ALT count and the remainder of DP (DP minus that TIR value) as your REF count.
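
(Editor's sketch.) The workaround above, written out. It assumes you already have one sample's FORMAT values as a dict of raw strings; indel_ref_alt_counts is a hypothetical name. Because REF is derived as DP minus TIR rather than counted directly, it may include reads that do not clearly support either allele. The example uses the TUMOR column of the Strelka line quoted further down this thread (DP=58, TIR=2,3):

```python
def indel_ref_alt_counts(sample):
    """Approximate (REF, ALT) counts for one Strelka indel sample column.

    ALT = tier1 TIR (first comma-separated entry);
    REF = DP (tier1 depth) minus that ALT count.
    `sample` maps FORMAT keys to raw string values.
    """
    depth = int(sample["DP"])
    alt = int(sample["TIR"].split(",")[0])  # first entry is tier1
    ref = depth - alt
    return ref, alt


tumor = {"DP": "58", "TIR": "2,3"}
ref, alt = indel_ref_alt_counts(tumor)
print(ref, alt)
print(alt / (ref + alt))  # ALT fraction, roughly 0.034 here
```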

11.6 years ago

You are going to have to look at the actual sample data lines, not just the header. Example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    965600  .       AGCGCG  A       .       QSI_ref IC=0;IHP=2;NT=ref;QSI=1;QSI_NT=1;RC=1;RU=GCGCG;SGT=ref->het;SOMATIC;TQSI=1;TQSI_NT=1    DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50   61:61:35,88:1,1:34,29:65.99:11.98:0.00  58:58:19,82:2,3:45,48:68.35:18.38:0.00

The above is some Strelka output showing a sample data line. You should read the VCF specification first, so you know how the FORMAT tags defined in the header describe the colon-delimited values in each sample column. You will then see that what you want is described by this FORMAT string:

DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50

and, for the NORMAL sample, the corresponding values are:

61:61:35,88:1,1:34,29:65.99:11.98:0.00
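
(Editor's sketch.) Pairing the FORMAT keys with each sample's values is just a split on ":" and a zip. A minimal sketch, assuming a standard tab-delimited VCF where column 9 is FORMAT and the sample names come from the #CHROM header line; parse_sample_columns is a hypothetical name, and the record is the example line above rebuilt with tabs:

```python
def parse_sample_columns(header_line, data_line):
    """Return {sample name: {FORMAT key: raw string value}} for one record.

    Assumes tab-delimited columns: column 9 is the FORMAT string and
    columns 10 onward are samples, named in the #CHROM header line.
    """
    header = header_line.rstrip("\n").split("\t")
    cols = data_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    return {
        name: dict(zip(keys, value.split(":")))
        for name, value in zip(header[9:], cols[9:])
    }


header_line = "\t".join(
    ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
     "FORMAT", "NORMAL", "TUMOR"]
)
data_line = "\t".join(
    ["chr1", "965600", ".", "AGCGCG", "A", ".", "QSI_ref",
     "IC=0;IHP=2;NT=ref;QSI=1;QSI_NT=1;RC=1;RU=GCGCG;SGT=ref->het;"
     "SOMATIC;TQSI=1;TQSI_NT=1",
     "DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50",
     "61:61:35,88:1,1:34,29:65.99:11.98:0.00",
     "58:58:19,82:2,3:45,48:68.35:18.38:0.00"]
)
samples = parse_sample_columns(header_line, data_line)
print(samples["NORMAL"]["TIR"])  # 1,1
print(samples["TUMOR"]["DP"])    # 58
```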

I would look at DP50 and FDP50. From the FORMAT lines in the header:

##FORMAT=<ID=DP50,Number=1,Type=Float,Description="Average tier1 read depth within 50 bases">
##FORMAT=<ID=FDP50,Number=1,Type=Float,Description="Average tier1 number of basecalls filtered from original read depth within 50 bases">

So, I believe DP50 is the average tier1 read depth within 50 bases of the indel (possibly of each end). Going by its description, FDP50 is the average number of tier1 basecalls filtered out of the original read depth within that same window, though exactly how that filtering is done isn't clear. Hope this helps.
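
(Editor's sketch.) Pulling those window fields out as numbers, since the header declares them Type=Float. Note that, per their descriptions, DP50, FDP50 and SUBDP50 are averages over a 50-base window rather than counts split by allele. window_depths is a hypothetical helper; the values are the NORMAL and TUMOR columns from the example line above:

```python
def window_depths(sample):
    """Return (DP50, FDP50, SUBDP50) for one sample as floats.

    `sample` maps FORMAT keys to raw string values; the header declares
    all three fields as Type=Float.
    """
    return tuple(float(sample[key]) for key in ("DP50", "FDP50", "SUBDP50"))


keys = "DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50".split(":")
normal = dict(zip(keys, "61:61:35,88:1,1:34,29:65.99:11.98:0.00".split(":")))
tumor = dict(zip(keys, "58:58:19,82:2,3:45,48:68.35:18.38:0.00".split(":")))

for name, sample in (("NORMAL", normal), ("TUMOR", tumor)):
    dp50, fdp50, subdp50 = window_depths(sample)
    print(name, dp50, fdp50, subdp50)
```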

11.6 years ago

For convenience, here is the complete block of FORMAT tags:

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for tier1">
##FORMAT=<ID=DP2,Number=1,Type=Integer,Description="Read depth for tier2">
##FORMAT=<ID=TAR,Number=2,Type=Integer,Description="Reads strongly supporting alternate allele for tiers 1,2">
##FORMAT=<ID=TIR,Number=2,Type=Integer,Description="Reads strongly supporting indel allele for tiers 1,2">
##FORMAT=<ID=TOR,Number=2,Type=Integer,Description="Other reads (weak support or insufficient indel breakpoint overlap) for tiers 1,2">
##FORMAT=<ID=DP50,Number=1,Type=Float,Description="Average tier1 read depth within 50 bases">
##FORMAT=<ID=FDP50,Number=1,Type=Float,Description="Average tier1 number of basecalls filtered from original read depth within 50 bases">
##FORMAT=<ID=SUBDP50,Number=1,Type=Float,Description="Average number of reads below tier1 mapping quality threshold aligned across sites within 50 bases">