VarScan somaticFilter treats all my variants as near indels and removes all of them
7.2 years ago
wes3985 ▴ 10

Dear All,

I am trying to filter false-positive variant calls with VarScan somaticFilter, but it treats all of my variants as being near indels and removes every one of them. I have run GATK realignment around indels (twice now), but this has not resolved the problem. Here is my variant-calling code:

samtools mpileup -f $BAMS/hg38.fa -q 1 $BAMS/$gBAM$ext1 $BAMS/$fcBAM$ext1 | \
varscan somatic -mpileup $fcBAM --min-coverage-normal 8 \
--min-coverage-tumor 8 --p-value 0.05 --min-var-freq 0.02 --strand-filter 1 --output-vcf 1

And here is my code to filter false-positive variants:

varscan somaticFilter $varscan_somatic/$i$ext1 --min-coverage 8 --min-reads2 2  --min-strands2 2 --min-var-freq 0.02 \
--indel-file $varscan_somatic/$i$ext1 --output-file $varscan_somatic/$i$filtered
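
One detail in the filter command above may be worth ruling out, offered as a guess rather than a confirmed cause: as written, the positional input and `--indel-file` both expand to `$varscan_somatic/$i$ext1`. If that path is the SNP file, every SNP position would presumably also be read as an indel position. The sketch below uses a hypothetical helper, `check_indel_file`, which is not part of VarScan; it only compares the two paths before the filter is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VarScan): report whether the SNP input
# and the file passed to --indel-file are the same path.
check_indel_file() {
    local snp_file="$1" indel_file="$2"
    if [ "$snp_file" = "$indel_file" ]; then
        echo "same file"      # the indel list would just be the SNP calls
        return 1
    fi
    echo "ok"
}
```

It could be called as `check_indel_file "$varscan_somatic/$i$ext1" "$indel_vcf"`, where `$indel_vcf` is an assumed variable holding the path to the indel calls.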

Here is the output of varscan somaticFilter:

Window size:    10
Window SNPs:    3
Indel margin:   3
Reading input from /home/rmhawwo/Scratch/varscan_somatic/fc25.snp.vcf
13955 cluster SNPs identified
Reading input from /home/rmhawwo/Scratch/varscan_somatic/fc25.snp.vcf
43758 variants in input stream
1066 failed to meet coverage requirement
260 failed to meet reads2 requirement
38 failed to meet varfreq requirement
5135 failed to meet p-value requirement
4847 in SNP clusters were removed
32412 were removed near indels
0 passed filters

Here are the first few lines of the SNP calls:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    13273   .       G       C       .       PASS    DP=158;SS=1;SSC=0;GPV=4.7946E-28;SPV=9.8322E-1  GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:71:30:39:56.52%:22,8,37,2 0/1:.:87:51:35:40.7%:44,7,30,5
chr1    14610   .       T       C       .       PASS    DP=45;SOMATIC;SS=2;SSC=3;GPV=1E0;SPV=4.101E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:16:16:0:0%:16,0,0,0       0/1:.:29:27:2:6.9%:27,0,2,0
chr1    14653   .       C       T       .       PASS    DP=131;SS=1;SSC=4;GPV=8.6909E-13;SPV=3.7801E-1  GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:37:28:9:24.32%:27,1,9,0   0/1:.:94:66:27:29.03%:63,3,27,0
chr1    14776   .       G       A       .       PASS    DP=192;SS=1;SSC=6;GPV=2.2557E-5;SPV=2.2983E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:63:55:3:5.17%:0,55,0,3    0/1:.:129:111:12:9.76%:0,111,0,12
chr1    14798   .       C       G       .       PASS    DP=181;SS=1;SSC=4;GPV=9.7008E-5;SPV=3.7571E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:57:52:3:5.45%:1,51,0,3    0/1:.:124:111:10:8.26%:1,110,0,10
chr1    16487   .       T       C       .       PASS    DP=52;SS=1;SSC=0;GPV=6.2398E-3;SPV=7.9924E-1    GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:18:15:3:16.67%:15,0,3,0   0/1:.:34:28:4:12.5%:28,0,3,1
chr1    16495   .       G       C       .       PASS    DP=51;SS=1;SSC=0;GPV=1.1021E-9;SPV=8.6543E-1    GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:18:7:10:58.82%:7,0,10,0   0/1:.:33:16:14:46.67%:15,1,14,0
chr1    17538   .       C       A       .       PASS    DP=92;SS=1;SSC=4;GPV=3.4872E-4;SPV=3.8557E-1    GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:48:42:5:10.64%:38,4,3,2   0/1:.:44:34:6:15%:15,19,2,4
chr1    65797   .       T       C       .       PASS    DP=23;SOMATIC;SS=2;SSC=12;GPV=1E0;SPV=5.4545E-2 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:13:13:0:0%:13,0,0,0       0/1:.:10:6:3:33.33%:6,0,3,0
chr1    65872   .       T       G       .       PASS    DP=68;SS=1;SSC=1;GPV=2.8856E-2;SPV=7.4755E-1    GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:37:33:3:8.33%:33,0,3,0    0/1:.:31:27:2:6.9%:27,0,2,0
chr1    69270   .       A       G       .       PASS    DP=52;SS=1;SSC=0;GPV=6.1512E-28;SPV=1E0 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:28:0:25:100%:0,0,25,0     1/1:.:24:0:22:100%:0,0,22,0

And here are the first few lines of indel calls:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    13417   .       C       CGAGA   .       PASS    DP=173;SS=1;SSC=0;GPV=2.0462E-24;SPV=8.5199E-1  GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:62:35:27:43.55%:0,35,0,27 0/1:.:111:69:40:36.7%:0,69,0,40
chr1    15903   .       G       GC      .       PASS    DP=53;SS=1;SSC=15;GPV=1E0;SPV=3.1275E-2 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:35:8:27:77.14%:6,2,15,12  1/1:.:18:0:17:100%:0,0,1,16
chr1    129010  .       AATG    A       .       PASS    DP=46;SS=1;SSC=4;GPV=1.3026E-2;SPV=3.3202E-1    GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:23:20:2:9.09%:20,0,2,0    0/1:.:23:18:4:18.18%:18,0,4,0
chr1    129148  .       G       GT      .       PASS    DP=239;SS=1;SSC=3;GPV=3.6603E-3;SPV=4.2833E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:114:100:3:2.91%:50,50,0,3 0/1:.:125:111:5:4.31%:45,66,4,1
chr1    186111  .       CCAAA   C       .       PASS    DP=37;SOMATIC;SS=2;SSC=7;GPV=1E0;SPV=1.982E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:15:15:0:0%:14,1,0,0       0/1:.:22:19:3:13.64%:19,0,3,0
chr1    188025  .       CT      C       .       PASS    DP=29;SS=3;SSC=14;GPV=1E0;SPV=3.2841E-2 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:10:7:3:30%:7,0,3,0        0/0:.:19:19:0:0%:17,2,0,0
chr1    189392  .       ACC     A       .       PASS    DP=141;SS=1;SSC=2;GPV=8.2532E-25;SPV=5.082E-1   GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:74:37:34:47.89%:37,0,34,0 0/1:.:67:32:31:49.21%:32,0,31,0
chr1    189713  .       GC      G       .       PASS    DP=45;SS=1;SSC=0;GPV=2.764E-2;SPV=8.4207E-1     GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:22:18:3:14.29%:0,18,0,3   0/1:.:23:20:2:9.09%:0,20,0,2
chr1    727679  .       T       TG      .       PASS    DP=114;SOMATIC;SS=2;SSC=3;GPV=1E0;SPV=4.0132E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:40:35:0:0%:31,4,0,0       0/1:.:74:59:2:3.28%:39,20,1,1
chr1    939436  .       C       CT      .       PASS    DP=48;SS=1;SSC=1;GPV=7.5306E-6;SPV=7.9348E-1    GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:13:8:5:38.46%:0,8,0,5     0/1:.:35:22:10:31.25%:1,21,1,9
chr1    956333  .       TG      T       .       PASS    DP=33;SOMATIC;SS=2;SSC=5;GPV=1E0;SPV=2.5862E-1  GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:14:14:0:0%:0,14,0,0       0/1:.:19:13:2:13.33%:0,13,0,2

The SNPs are in most cases at least a few hundred bases away from the nearest indel, and this is just the start of the list. I do not understand why they are being filtered, as I have read that SNPs are only removed if they lie within a few bases of an indel (the filter reports an indel margin of 3). Has anyone come across this before, or does anyone have a solution? I have tried several call files and hit the same problem with all of them. Perhaps there is something simple that I missed? Many thanks.
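
To put a number on the claim that the SNPs are not close to indels, something like the following could be run. It is a minimal sketch under stated assumptions: `count_snps_near_indels` is a hypothetical helper, an indel is represented by its POS column only (indel length is ignored), and the default margin of 3 is taken from the "Indel margin" line in the filter output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count SNP records whose position lies within
# MARGIN bases of any indel position on the same chromosome.
# Usage: count_snps_near_indels INDEL_VCF SNP_VCF [MARGIN]
count_snps_near_indels() {
    local indel_vcf="$1" snp_vcf="$2" margin="${3:-3}"
    awk -v m="$margin" '
        /^#/ { next }                    # skip VCF header lines
        NR == FNR {                      # first file: mark indel POS +/- margin
            for (p = $2 - m; p <= $2 + m; p++) near[$1 "\t" p] = 1
            next
        }
        ($1 "\t" $2) in near { n++ }     # second file: SNP inside a marked window
        END { print n + 0 }
    ' "$indel_vcf" "$snp_vcf"
}
```

Passing the SNP file as both arguments counts every SNP record, which is what one would expect the filter to see if its indel list were accidentally the SNP file itself; that reading of the filter's behaviour is an assumption, not something confirmed here.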
