Finding somatic and germline variants in tumor samples with matched normals (paired-end, Illumina)
5.4 years ago
Raheleh ▴ 260

Hello, I am new to NGS data analysis and am currently analyzing WES data from tumor samples with matched normal samples (paired-end, Illumina). I am working on the Linux command line. This is what I have done so far for each sample:

fastqc sample.fastq
java -jar trimmomatic-0.38.jar PE sample_1.fastq sample_2.fastq -basedout sample LEADING:30 TRAILING:30 MINLEN:50
bowtie2-build hg38.fa hg38
bowtie2 -x hg38 -1 sample_1P -2 sample_2P -S sample.sam
samtools view -bS sample.sam > sample.bam
samtools sort sample.bam -o sample.sorted.bam
samtools mpileup -uf hg38.fa sample.sorted.bam > sample.mpileup

I don't know what a reasonable next step is. I am interested in finding somatic and germline variants. I am using VarScan, but I am confused: should I use "java -jar VarScan.jar somatic normal.pileup tumor.pileup"? And what is the difference between a pileup and an mpileup file?

Any help would be much appreciated. Thanks.

5.4 years ago
ATpoint 81k

A couple of things. First, I would switch from bowtie2 to BWA mem, because most variant calling pipelines assume BWA as the aligner. Second, you can shorten your commands by using pipes, like align (options...) | samtools sort -o sorted.bam -. This saves time and disk space because no intermediate SAM file is written. Third, given that you are starting a new project, consider using a more recent variant caller than VarScan2. There is nothing wrong with VarScan2, but it is no longer maintained, which is why I personally switched to strelka2 from Illumina recently. If you still want to use VarScan2, you might have a look at my pipeline for it on GitHub. It is an admittedly ugly script, but you can use it to get an idea of how the VarScan2 subcommands are meant to be used. It starts by calling raw variants using mpileup/varscan2 somatic, extracts germline and somatic high-confidence variants with processSomatic, and then applies the recommended heuristic fpfilter to remove potential junk calls. Still, I encourage you to use a more recent caller like strelka2, which also has more complete documentation, making it easier for you to get started with variant calling.
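To make the piping and the VarScan2 steps above more concrete, here is a minimal sketch rather than the pipeline mentioned in the answer. The file names follow the question; the run() helper, the DRY_RUN and THREADS variables, the VARSCAN_JAR path and the "demo" output prefix are assumptions made for illustration. It also assumes the reference was already indexed with bwa index. By default it only prints the commands; set DRY_RUN=0 to execute them. The fpfilter step is left out because it needs additional read-count input.

```shell
#!/usr/bin/env bash
# Sketch only. File names follow the question above; run(), DRY_RUN, THREADS,
# VARSCAN_JAR and the "demo" prefix are illustrative assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                  # 1 = only print commands, 0 = execute them
THREADS="${THREADS:-4}"
VARSCAN_JAR="${VARSCAN_JAR:-VarScan.jar}"

# Print a command line and execute it unless DRY_RUN is 1.
run() {
    echo "+ $1"
    if [ "$DRY_RUN" != "1" ]; then
        bash -o pipefail -c "$1"
    fi
}

# Align paired-end reads with BWA-MEM and sort in a single pipe,
# so no intermediate SAM or unsorted BAM is written to disk.
# Assumes "bwa index <ref>" has been run once beforehand.
align_sorted() {
    local ref="$1" r1="$2" r2="$3" out="$4"
    run "bwa mem -t $THREADS $ref $r1 $r2 | samtools sort -@ $THREADS -o $out -"
    run "samtools index $out"
}

# Raw somatic calling with VarScan2 from two text pileups, then splitting
# the SNP calls into germline / LOH / somatic sets with processSomatic.
# No -u here: VarScan reads the text pileup format, not BCF.
varscan_somatic() {
    local ref="$1" normal_bam="$2" tumor_bam="$3" prefix="$4"
    run "samtools mpileup -f $ref $normal_bam > normal.pileup"
    run "samtools mpileup -f $ref $tumor_bam > tumor.pileup"
    run "java -jar $VARSCAN_JAR somatic normal.pileup tumor.pileup $prefix"
    run "java -jar $VARSCAN_JAR processSomatic $prefix.snp"
}

align_sorted hg38.fa sample_1P sample_2P sample.sorted.bam
varscan_somatic hg38.fa normal.sorted.bam tumor.sorted.bam demo
```

The align_sorted call would be repeated once for the normal and once for the tumor sample before varscan_somatic is run on the two sorted BAM files.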

Dear ATpoint, many thanks for your explanations!

Dear ATpoint,

When I run strelka2 for somatic calling, I get this error:

configureStrelkaSomaticWorkflow.py Can't find expected fasta index file: index_bwa/hg38.fa.fai

This is my command:

strelka-2.9.2.centos6_x86_64/bin/configureStrelkaSomaticWorkflow.py --normalBam BC.bam --tumorBam XL.bam --ref index_bwa/hg38.fa --runDir demo_somatic

Do you know what the problem is?
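The error message itself says that the FASTA index (.fai) next to the reference is missing; a BWA index alone does not include it. A minimal sketch of creating it with samtools faidx, assuming samtools is installed and the reference path is the one from the command above (the ensure_fai helper and the REF variable are illustrative names, not part of strelka2):

```shell
#!/usr/bin/env bash
# Sketch only: ensure_fai and REF are illustrative names; the default path
# is taken from the command in the post above.
set -euo pipefail

REF="${REF:-index_bwa/hg38.fa}"

# Create <ref>.fai with samtools faidx unless a non-empty index already exists.
ensure_fai() {
    local ref="$1"
    if [ -s "${ref}.fai" ]; then
        echo "index present: ${ref}.fai"
    else
        echo "index missing, running: samtools faidx ${ref}"
        samtools faidx "$ref"
    fi
}

# Only act when the reference file is actually there.
if [ -f "$REF" ]; then
    ensure_fai "$REF"
else
    echo "reference not found: $REF" >&2
fi
```

Once index_bwa/hg38.fa.fai exists, the configureStrelkaSomaticWorkflow.py command can be re-run unchanged.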
