Question

Germline calling of tumor-normal paired samples with VarScan2

6.0 years ago

oghzzang ▴ 50

Hello. I'm using VarScan2 for germline calling, and I have questions about its options.

My data are tumor-normal paired HCC WXS DNA BAMs.

First, I made an mpileup file using samtools:

samtools mpileup [normal bam] [tumor bam] -B -q 1 > [normal-tumor.mpileup]

Second, I'll run VarScan:

java -jar VarScan.jar somatic [normal-tumor.mpileup] [output] --mpileup 1 OPTIONS

The VarScan2 output contains germline, somatic and LOH calls, so I will use only the germline calls.

In a Nature article (Integrated analysis of germline and somatic variants in ovarian cancer, Krishna L. Kanchi et al. 2013), the authors used the following VarScan options for germline calling of tumor-normal paired samples:

"Germline SNPs and indels were identified in paired BAMs using VarScan2 with the following parameters: min-coverage=30, min-var-freq=0.08, normal-purity=1, P-value=0.10, somatic-P-value=0.001 and validation=1. Additional germline SNPs were identified using Samtools (version 0.1.7a (revision number 599) and additional germline indels were identified using GATK (version 1.0 (revision 5336)."

The default P-value in the VarScan2 manual is 0.99, but the article uses 0.10.

Which options matter most, and which values are appropriate, when using VarScan2 somatic for germline calling?

varscan2 germline tumor-normal • 3.0k views

updated 5.9 years ago by ATpoint 81k • written 6.0 years ago by oghzzang ▴ 50

score 5 · Accepted Answer · 2018-05-24

I recommend the following command lines. They build a combined normal-tumor mpileup and pipe it into VarScan2 with the strand filter enabled and everything else left at default. Be sure to use the latest VarScan version.

## 1. Get the raw variants:
samtools mpileup -q 20 -Q 25 -B -d 1000 -f ref.fa normal.bam tumor.bam | \
java -jar VarScan.v2.4.3.jar somatic /dev/stdin outputName --mpileup 1 --strand-filter 1 --output-vcf 1
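
If you would rather reproduce the thresholds quoted from the paper than use the defaults, the sketch below maps them onto `somatic` flags. It only prints the command so you can inspect it first. The file names (`ref.fa`, `normal.bam`, `tumor.bam`, `outputName`) are placeholders, and reading "P-value" and "min-coverage" in the quote as `--p-value` and `--min-coverage` is my assumption; the paper may have used the per-sample coverage options instead.

```shell
#!/usr/bin/env bash
# Sketch: thresholds quoted from Kanchi et al. written as VarScan2 "somatic" flags.
# The flag mapping is an assumption based on the parameter names in the quote.
paper_varscan_args() {
  printf '%s\n' \
    "--mpileup 1" \
    "--min-coverage 30" \
    "--min-var-freq 0.08" \
    "--normal-purity 1" \
    "--p-value 0.10" \
    "--somatic-p-value 0.001" \
    "--validation 1" \
    "--output-vcf 1"
}

# Dry run: print the pipeline instead of executing it (placeholder file names).
echo "samtools mpileup -q 20 -Q 25 -B -d 1000 -f ref.fa normal.bam tumor.bam | \\"
echo "  java -jar VarScan.v2.4.3.jar somatic /dev/stdin outputName $(paper_varscan_args | tr '\n' ' ')"
```

Note that `--validation 1` reports every compared position, so the output is much larger than with the defaults.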

## 2. Classify into Germline, LOH and Somatic (showing only code for the SNPs here):
java -jar VarScan.v2.4.3.jar processSomatic outputName.snp.vcf --max-normal-freq 0.01

Proceed only with those variants that are classified as high-confidence. Apply the false-positive filter (fpfilter) to the high-confidence germline variants.
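
As a quick sanity check on the classification, you can look at the `SS` tag that VarScan2 writes to the INFO column of its VCF output (1 = germline, 2 = somatic, 3 = LOH). The sketch below runs on a made-up toy VCF; the function names are hypothetical, and real VarScan2 records carry more INFO keys and the NORMAL/TUMOR sample columns.

```shell
#!/usr/bin/env bash
# Toy VCF (tab-separated) standing in for VarScan2 output; values are invented.
toy_vcf() {
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t.\tPASS\tDP=80;SS=1\n'
  printf 'chr1\t250\t.\tC\tT\t.\tPASS\tDP=95;SOMATIC;SS=2\n'
  printf 'chr2\t300\t.\tG\tA\t.\tPASS\tDP=60;SS=3\n'
  printf 'chr2\t410\t.\tT\tC\t.\tPASS\tDP=72;SS=1\n'
}

# Keep header lines plus records whose INFO column (field 8) carries SS=1.
germline_only() {
  awk -F'\t' '/^#/ {print; next} $8 ~ /(^|;)SS=1(;|$)/'
}

toy_vcf | germline_only
```

This does not replace `processSomatic`, which also applies the high-confidence thresholds; it only shows where the germline label lives in the file.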

## 3.1: Prepare a BED file from the high-confidence germline mutation VCF file to be used with bam-readcount:
grep -v "^#" germline_hc.vcf | awk 'BEGIN {OFS="\t"} {print $1, $2-1, $2+1}' | sort -k1,1 -k2,2n | bedtools merge -i - > germline_hc.bed
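
To see what the awk step produces before merging, here is a small sketch on three made-up positions. `vcf_to_windows` is a hypothetical helper name, and `bedtools merge` is left out so the example runs without bedtools.

```shell
#!/usr/bin/env bash
# Turn each variant position into a window of one base on either side,
# sorted by chromosome and then numerically by start.
vcf_to_windows() {
  grep -v '^#' | awk 'BEGIN {OFS="\t"} {print $1, $2-1, $2+1}' | sort -k1,1 -k2,2n
}

# Toy input: only the CHROM and POS columns matter for this step.
printf '#CHROM\tPOS\nchr1\t101\nchr1\t100\nchr2\t5\n' | vcf_to_windows
```

The two chr1 windows overlap, which is why the real command sends the sorted intervals through `bedtools merge` before handing them to bam-readcount.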

## 3.2: Run bam-readcount:
./bam-readcount -f ref.fa -q 20 -b 25 -d 1000 -l germline_hc.bed -w 1 normal.bam > germline_hc.bamRC

## 3.3: Run fpfilter:
java -jar VarScan.v2.4.3.jar fpfilter germline_hc.vcf germline_hc.bamRC --output-file germline_hc_fpfilterPassed.vcf --filtered-file germline_hc_fpfilterFailed.vcf

Do this for all SNPs, indels and LOHs, and use the resulting variants for your downstream analysis.
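
To avoid repeating steps 3.1 to 3.3 by hand, you could generate the commands in a loop. The sketch below only prints them. The `fpfilter_plan` name is hypothetical, the `<prefix>.<type>.<class>.hc.vcf` file naming is my assumption about what `processSomatic` writes, and using `normal.bam` for the LOH calls is also an assumption, so check both against your own output.

```shell
#!/usr/bin/env bash
# Print (do not run) the BED, bam-readcount and fpfilter commands for every
# variant type and class. Usage: fpfilter_plan <output prefix> <bam>
fpfilter_plan() {
  local sample=$1 bam=$2 type class prefix
  for type in snp indel; do
    for class in Germline LOH; do
      # Assumed processSomatic naming, e.g. outputName.snp.Germline.hc.vcf
      prefix="${sample}.${type}.${class}.hc"
      cat <<EOF
grep -v '^#' ${prefix}.vcf | awk 'BEGIN {OFS="\t"} {print \$1, \$2-1, \$2+1}' | sort -k1,1 -k2,2n | bedtools merge -i - > ${prefix}.bed
bam-readcount -f ref.fa -q 20 -b 25 -d 1000 -l ${prefix}.bed -w 1 ${bam} > ${prefix}.bamRC
java -jar VarScan.v2.4.3.jar fpfilter ${prefix}.vcf ${prefix}.bamRC --output-file ${prefix}.fpfilterPassed.vcf --filtered-file ${prefix}.fpfilterFailed.vcf
EOF
    done
  done
}

fpfilter_plan outputName normal.bam
```

Once the printed commands look right, redirect them to a file and run it with `bash`.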