Dear developers,
I am measuring the SV calling performance of vg in rice using Nanopore data (80× coverage) and Illumina Platinum Genome data (50× coverage). I found that the insertion recall from vg is very low, even though the sample test data provided with vg gives a high score. I would be grateful if you could tell me the reason for the low recall.
I used the rice reference genome and PAV information (from sample1) to construct the graph for vg.
Is there any way to improve it?
Could you please look over the commands below?
Use [vg: variation graph tool, version v1.25.0 "Apice"]
Use [toil-vg: version 1.6.2a1]
for i in $(seq 1 12);do vg construct -r ref.fa -v Chr$i.vcf.gz -S -R Chr$i -C -p -f -a -t 48 > Chr$i.vg;done
vg ids -j $(for i in $(seq 1 12); do echo Chr$i.vg; done)
vg index -t 48 -x all.xg $(for i in $(seq 1 12); do echo Chr$i.vg ;done)
for i in $(seq 1 12);do vg prune -r Chr$i.vg -t 48 > Chr$i.pruned.vg;done
vg index -g all.gcsa $(for i in $(seq 1 12); do echo Chr$i.pruned.vg; done)
vg map -x all.xg -g all.gcsa -f sample1_1.fastq.gz -f sample1_2.fastq.gz >sample1.aln.gam
vg pack -x all.xg -g sample1.aln.gam -Q 20 -t 48 -o sample1.pack
vg call all.xg -k sample1.pack -s sample1 -t 48 > sample1.vcf
toil-vg vcfeval ./jobStore . --vcfeval_baseline truth.vcf.gz --call_vcf sample1.vcf.gz --sveval --vcfeval_sample sample1 --realTimeLogging --realTimeStderr --min_sv_len 50 --ins_max_gap 1000
Result
Recall_INS=0.5085
Recall_DEL=0.9496
Thanks for your help.
Best wishes,