Varscan variant calll error on pacbio data
0
0
Entering edit mode
7.9 years ago
mark.rose ▴ 50

Hello

I'm trying to call variants on an alignment of pacbio data to a reference thusly:

blasr sample.bas.h5 ref.fa -sam  -clipping soft -nproc 4 -minSubreadLength 9500 > sample.sam

I then sampled this output for uniquely aligning reads and converted into a bam file

Then I performed mpileup (samtools version 0.1.19-44428cd)

samtools mpileup -BAQ0 -d 1000 -f ref.fa sub_sample.sort.bam > sub_sample.sort.bam.BAQ0d1000.mpileup

And finally, varscan (v2.3.9)

 java -jar ~/BioInfo/bin/VarScan.v2.3.9.jar mpileup2snp sub_sample.sort.bam.BAQ0d1000.mpileup  --min-var-freq 0.2 --min-avg-qual 0 --output-vcf 1 >sub_sample.sort.bam.BAQ0d1000.mpileup.snp.vcf

This reports: 4 variant positions (1 SNP, 3 indel) 3 were failed by the strand-filter 1 variant positions reported (1 SNP, 0 indel)

And sub_sample.sort.bam.BAQ0d1000.mpileup.snp.vcf shows:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1
ref      12869   .       C       G       .       PASS    ADP=148;WT=0;HET=1;HOM=0;NC=0   GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/1:255:148:148:219:270:26.68%:7.7825E-105:10:10:219:0:269:1

sub_sample.sort.bam.BAQ0d1000.mpileup.snp.vcf for this position has:

ref      12869   c       148     ...............-1T......+1G.-1T.+1T,,,.,..*..,.+1T,,..,,..*.,g,.,.+1T,..,,+1g,.,*,+1c,.,,..+1T.,.A.,,,,,,,,,.,,+1c,,,,*.,.-1099TTGCTGATGGCTGCTCCTTTGGCTCTTGGGGATGTGGAGATTGAAATCATTGATAAATTAATCTCCATTCCGTACGTCGAAATGACATTGAGATTGATGGAGCGTTTTGGTGTGAAAGCAGAGCATTCTGATAGCTGGGACAGATTCTACATTAAGGGAGGTCAAAAATACAAGTCCCCTAAAAATGCCTATGTTGAAGGTGATGCCTCAAGCGCAAGCTATTTCTTGGCTGGTGCTGCAATTACTGGAGGGACTGTGACTGTGGAAGGTTGTGGCACCACCAGTTTGCAGGGTGATGTGAAGTTTGCTGAGGTACTGGAGATGATGGGAGCGAAGGTTACATGGACCGAGACTAGCGTAACTGTTACTGGCCCACCGCGGGAGCCATTTGGGAGGAAACACCTCAAGGCGATTGATGTCAACATGAACAAGATGCCTGATGTCGCCATGACTCTTGCTGTGGTTGCCCTCTTTGCCGATGGCCCGACAGCCATCAGAGACGTGGCTTCCTGGAGAGTAAAGGAGACCGAGAGGATGGTTGCGATCCGGACGGAGCTAACCAAGCTGGGAGCATCTGTTGAGGAAGGGCCGGACTACTGCATCATCACGCCGCCGGAGAAGCTGAACGTGACGGCGATCGACACGTACGACGACCACAGGATGGCGATGGCTTTCTCCCTTGCCGCCTGTGCCGAGGTCCCCGTCACCATCCGGGACCCTGGGTGCACCCGGAAGACCTTCCCCGACTACTTCGATGTGCTGAGCACTTTCGTCAAGAATTAAGCTCTAGAAGAAGCTTCGACGAATTTCCCCGATCGTTCAAACATTTGGCAATAAAGTTTCTTAAGATTGAATCCTGTTGCCGGTCTTGCGATGATTATCATATAATTTCTGTTGAATTACGTTAAGCATGTAATAATTAACATGTAATGCATGACGTTATTTATGAGATGGGTTTTTATGATTAGAGTCCCGCAATTATACATTTAATACGCGATAGAAAACAAAATATAGCGCGCAAACTAGGATAAATTATCGCGCCCGGTGTCATCTATGTTACTAGATCGGGAATTGCGGCCGCTCTAGAACTAGTGGATCC,.+1A.-1T.-1T,.,.,,,,.,,,,,+2cc,,,,,..,,,t,+1c.,.,,.,.,,,.,,,.t,t.,....,        //".(&//&/-+&.(///'*/$+,-*-/...#+%./-/.,%$/%-."'-/#'/.)-/-'.,,&(*#-/./#/.-(,-/..(--*.(-.*./$-/)"///(,//.-/..-.//.)'//./)"(/*$/.-/+-.%/-,.('./.'&.-*+

Now the questions.

  1. The read depth passsing my set quality quality filter (ADP) is 148, yet the number of reference reads (RD) and alternate reads (AD) is 219 and 270, respectively. How is that possible?
  2. The alternate allele frequency (FREQ) is 26.68%. How is this being calculated given the info in question 1?
  3. How is this SNP passing my --min-var-freq 0.2 threshold given the low incidence of "G" at this position as shown in the pileup?

Thanks for your help

varscan pacbio blasr mpileup • 2.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 1482 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6