Question

Haplotype Calling

3

11.7 years ago

Empyrean ▴ 170

Hi

I have 454 amplicon sequencing data for a tetraploid potato crop. We have sequenced a few genes of interest from several different genotypes. My aim is to separate the genotypes and the subgenomes, and I am thinking of using haplotype analysis to do both.

As I am new to this, can someone point me to some tools or software that can take the input I have and give haplotypes as output? All I have is the gene sequences, a mapped BAM file, and the raw reads. I don't have any marker data or genotype data. I know the samples come from 10 genotypes of one type and 5 of another type.

454 sequencing haplotype calling • 7.4k views

updated 9.4 years ago by Biostar 20 • written 11.7 years ago by Empyrean ▴ 170

score 4 · Answer 1 · 2012-12-21

FreeBayes is a haplotype-based variant detector capable of detecting variants in polyploids. For your use-case in 454 data, which has a high indel error rate, you would first want to detect high-confidence variants using freebayes with --max-complex-gap 0 --ploidy 4, filter the results, bgzip and tabix index the resulting VCF, and then provide it as input for a second round of haplotype calling with a much larger --max-complex-gap limited by your read length.

# reference.fa is a placeholder for the FASTA your reads were mapped to (freebayes needs it via -f)
freebayes -f reference.fa --max-complex-gap 0 --ploidy 4 input.bam | vcffilter -f "QUAL > 20" >high_confidence.vcf
bgzip high_confidence.vcf
tabix -p vcf high_confidence.vcf.gz
freebayes -f reference.fa --max-complex-gap 200 --ploidy 4 --haplotype-basis-alleles high_confidence.vcf.gz input.bam >haplotypes.vcf

The result will be a VCF with haplotypes in literal form wherever you have a non-reference haplotype segregating with no more than 200 bp between alleles in your high_confidence.vcf. If you want to convert from haplotype calls to phased genotypes across the component alleles in the haplotypes, use vcfallelicprimitives like so:

vcfallelicprimitives -t was_haplotype haplotypes.vcf >decomposed_haplotypes.vcf
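To see how many records in the decomposed VCF actually came from haplotype calls, one option is to count the records that carry the tag in their INFO column. This is only a minimal sketch: it assumes the tag is written as a bare flag among the semicolon-separated INFO entries (column 8), and the function name count_tagged is made up for illustration.

```shell
# Count VCF records whose INFO column (field 8) carries a given flag.
# Assumes the flag appears as a bare ";"-separated entry, e.g. "DP=30;was_haplotype".
# Usage: count_tagged FLAG FILE.vcf
count_tagged() {
    awk -F'\t' -v flag="$1" '
        /^#/ { next }                      # skip header lines
        {
            n = split($8, info, ";")       # INFO entries are ";"-separated
            for (i = 1; i <= n; i++) {
                if (info[i] == flag) { hits++; break }
            }
        }
        END { print hits + 0 }             # print 0 when nothing matched
    ' "$2"
}
```

For the files above this would be called as: count_tagged was_haplotype decomposed_haplotypes.vcf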

Your results will be relatively consistent across different --max-complex-gap settings if you have high depth, but if your depth is low, you may see a loss in sensitivity and haplotyping/genotyping accuracy at larger --max-complex-gap values.
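One way to check that consistency on your own data is to repeat the second pass with a few --max-complex-gap values and compare how many records each run produces. A rough sketch, reusing the file names from the commands above: the function name compare_gaps, the output file naming, and reference.fa (standing in for the FASTA your reads were mapped to) are all assumptions, and the gap values in the usage line are examples rather than recommendations.

```shell
# Re-run the second freebayes pass once per gap size given as an argument
# and print "gap<TAB>record count" for each run.
# Usage: compare_gaps 50 100 200
compare_gaps() {
    for gap in "$@"; do
        out="haplotypes.gap${gap}.vcf"
        # reference.fa is a placeholder for the FASTA the reads were mapped to
        freebayes -f reference.fa --ploidy 4 \
            --max-complex-gap "$gap" \
            --haplotype-basis-alleles high_confidence.vcf.gz \
            input.bam > "$out" || return 1
        # grep -vc '^#' counts the non-header (record) lines
        printf '%s\t%s\n' "$gap" "$(grep -vc '^#' "$out")"
    done
}
```

A raw record count is a crude comparison, since one long haplotype call can replace several shorter ones, so large differences between runs are a prompt to look at the calls themselves rather than a verdict.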

score 0 · Answer 2 · 2012-09-05

Depending on the heterozygosity of your genes of interest, it might be hard to separate the haplotypes out. That being said, as long as you are confident in your synteny, I would think you have a good chance of detecting haplotypes, since the potato genome is so heterozygous.

I don't know of any specific tools or scripts for separating out haplotypes, but I have found Julian Catchen's Stacks pipeline to be an awesome resource. I'm using it to address population genetic questions with metagenomic amplicon data, so I would think that our uses aren't too far apart. Give it a try and see what you think. You'll need to know a bit of MySQL and be confident working in the shell to use it.

UPDATE: I was just reading the Stacks documentation, and you can separate haplotypes using the genotypes command in the pipeline. The documentation will tell you which flags to use for your data.

ANOTHER UPDATE: Also, if you have access to CLC Genomics Workbench, the newest version (5.5) has an option for filtering next-gen reads by haplotype. Go to Toolbox > Resequencing Analysis > Compare Variants > Filter on Haplotype.