hap.py output variation
5.1 years ago
ofonov ▴ 20

I ran hap.py twice on the same VCF file: the first time without a stratification file, and the second time with a stratification file for GC content and low-complexity regions. I am puzzled by the variation in the tool's output. I get different precision and recall values for the same stratification category, one that hap.py calculates by default (no additional stratification files are needed).

Why do I observe this variation?

Type   Subtype   Subset       Filter  Genotype  QQ.Field  QQ  METRIC.Recall  METRIC.Precision
INDEL  I16_PLUS  TS_boundary  PASS    *         QUAL      *   0.5            0.857143
INDEL  I16_PLUS  TS_boundary  PASS    *         QUAL      *   0.546392       0.928571
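
To see whether TS_boundary is the only row affected, the two result tables can be compared row by row. The following is a minimal sketch, assuming the rows above come from hap.py's extended CSV output and that each run's file was saved under its own name (both commands in this post write to the same `-o` prefix, so the second run would otherwise replace the first). The function names and the file names `run1.extended.csv` and `run2.extended.csv` are hypothetical.

```python
import csv
import sys

# Columns that together identify one row of the table (taken from the header above).
KEY_COLUMNS = ["Type", "Subtype", "Subset", "Filter", "Genotype", "QQ.Field", "QQ"]
METRIC_COLUMNS = ["METRIC.Recall", "METRIC.Precision"]


def index_rows(lines):
    """Map each row's key tuple to its metric values (kept as strings)."""
    reader = csv.DictReader(lines)
    return {
        tuple(row[k] for k in KEY_COLUMNS): {m: row[m] for m in METRIC_COLUMNS}
        for row in reader
    }


def _differs(x, y, tolerance):
    """Compare two metric strings numerically when possible, textually otherwise."""
    try:
        return abs(float(x) - float(y)) > tolerance
    except (TypeError, ValueError):
        return x != y


def diff_metrics(lines_a, lines_b, tolerance=1e-9):
    """Return (key, metrics_a, metrics_b) for rows in both runs whose metrics differ."""
    rows_a = index_rows(lines_a)
    rows_b = index_rows(lines_b)
    differences = []
    for key in sorted(rows_a.keys() & rows_b.keys()):
        a, b = rows_a[key], rows_b[key]
        if any(_differs(a[m], b[m], tolerance) for m in METRIC_COLUMNS):
            differences.append((key, a, b))
    return differences


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python compare_runs.py run1.extended.csv run2.extended.csv
    with open(sys.argv[1], newline="") as fa, open(sys.argv[2], newline="") as fb:
        for key, a, b in diff_metrics(fa, fb):
            print(",".join(key), a, b)
```

Rows that exist in only one run (for example, the extra strata added by the stratification file) are ignored on purpose, so only categories common to both runs are reported.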

The following command was used to run hap.py the first time:

sudo docker run -it \
-v `pwd`:/data \
pkrusche/hap.py \
/opt/hap.py/bin/hap.py \
/data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
/data/SAMPLE.HG001-NA12878.vcf.gz \
-f /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
-r /data/references/GRCh37_Homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa \
--verbose \
--logfile /data/out_dir/log.txt \
-o /data/out_dir/SAMPLE

The following command was used to run hap.py the second time:

sudo docker run -it \
-v `pwd`:/data \
pkrusche/hap.py \
/opt/hap.py/bin/hap.py \
/data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
/data/SAMPLE.HG001-NA12878.vcf.gz \
-f /data/FDAPrecision/GIAB_latest/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
-r /data/references/GRCh37_Homo_sapiens/Homo_sapiens.GRCh37.dna.primary_assembly.fa \
--stratification /data/LowComplexity_GC.tsv \
--verbose \
--logfile /data/out_dir/log.txt \
-o /data/out_dir/SAMPLE

Here is a sample of the stratification file:

gc15    /data/GA4GH/benchmarking-tools/resources/stratification-bed-files/GCcontent/human_g1k_v37_l100_gc15_slop50.bed.gz
AllRepeats_51to200bp_gt95identity_merged    /data/GA4GH/benchmarking-tools/resources/stratification-bed-files/LowComplexity/AllRepeats_51to200bp_gt95identity_merged.bed.gz
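
Since the stratification file is a two-column TSV (region name, then BED file path), a quick sanity check of its layout can rule out a malformed file as a factor. This is only a sketch: the function name `parse_stratification` is hypothetical, and it checks the layout only, not whether the BED files exist inside the container.

```python
def parse_stratification(lines):
    """Split stratification lines into (name, path) entries and a list of problems."""
    entries, problems = [], []
    seen = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0].strip() or not fields[1].strip():
            problems.append((number, "expected exactly two tab-separated fields"))
            continue
        name, path = fields[0].strip(), fields[1].strip()
        if name in seen:
            problems.append((number, f"duplicate region name: {name}"))
            continue
        seen.add(name)
        entries.append((name, path))
    return entries, problems
```

For example, `entries, problems = parse_stratification(open("LowComplexity_GC.tsv"))` would list any line that is not separated by a single tab.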
hap.py GA4GH GIAB benchmarking VCF • 1.5k views