Generating maximum likelihood trees from multi-sample VCF files
5.1 years ago
rc16955

Hi all,

I have whole-genome sequencing data for 100 individuals of my study species and would like to construct a maximum likelihood tree from them. So far I've called SNPs with bcftools and have all samples in a single multi-sample VCF file. Are you aware of any software that can take a multi-sample VCF file as input and build a maximum likelihood tree from it?
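As a rough sketch of what reading a multi-sample VCF involves, the Python below parses an uncompressed VCF as plain text and collects one diploid genotype per sample at each biallelic SNP. It assumes a `GT` key is present in the FORMAT column and skips indels and multiallelic records. The function name `parse_multisample_vcf` and the toy records are hypothetical; a real pipeline would more likely rely on a dedicated VCF library.

```python
BASES = {"A", "C", "G", "T"}


def parse_multisample_vcf(lines):
    """Return (sample_names, sites) from the text lines of an uncompressed VCF.

    Hypothetical helper. Each site is (chrom, pos, ref, alt, genotypes), where
    genotypes holds one (allele1, allele2) tuple of bases per sample, or None
    for a missing or non-diploid call. Only biallelic SNPs are kept.
    """
    samples, sites = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or meta-information header
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        if ref not in BASES or alt not in BASES:
            continue  # indel, multiallelic ("T,G") or non-variant (".") record
        gt_index = fields[8].split(":").index("GT")
        alleles = [ref, alt]  # allele index 0 = REF, 1 = ALT
        genotypes = []
        for sample_field in fields[9:]:
            # Treat phased ("|") and unphased ("/") separators the same way.
            calls = sample_field.split(":")[gt_index].replace("|", "/").split("/")
            if len(calls) != 2 or "." in calls:
                genotypes.append(None)
            else:
                genotypes.append((alleles[int(calls[0])], alleles[int(calls[1])]))
        sites.append((chrom, int(pos), ref, alt, genotypes))
    return samples, sites


# Toy example: only the first record is a biallelic SNP.
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:10\t1/1:12\n"
    "chr1\t200\t.\tC\tT,G\t50\tPASS\t.\tGT\t0/1\t0/2\n"
    "chr1\t300\t.\tAT\tA\t50\tPASS\t.\tGT\t0/1\t./.\n"
)
samples, sites = parse_multisample_vcf(toy_vcf.splitlines())
print(samples)  # ['S1', 'S2']
print(sites)    # [('chr1', 100, 'A', 'G', [('A', 'G'), ('G', 'G')])]
```

Keeping genotypes as base pairs rather than 0/1 indices leaves the choice of how to encode heterozygous sites to a later step.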

Previously, I used single-sample VCF files to generate a separate FASTA file for each sample with vcf-consensus, then aligned these with MAFFT and built trees from the alignments with RAxML. The problem with this approach is that it loses information about heterozygosity: vcf-consensus simply always uses the ALT allele, and even if it used IUPAC ambiguity codes for heterozygous sites, I don't think RAxML could handle them.
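To keep heterozygous calls rather than always taking ALT, each diploid genotype can be collapsed into a single IUPAC ambiguity character (for example A/G becomes R), and the per-sample strings written side by side as a SNP-only alignment. Assuming all samples were called against the same reference, the columns already correspond, so no separate alignment step should be needed. The Python below is a minimal sketch of that encoding, assuming genotypes are already available as pairs of bases; `genotype_to_iupac` and `to_phylip` are hypothetical helpers, and the output follows the relaxed PHYLIP layout (name, space, sequence). How a given tree program treats ambiguity characters, as partial ambiguity or as missing data, differs between programs, so its documentation is worth checking.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def genotype_to_iupac(genotype):
    """Map a diploid genotype such as ("A", "G") to one IUPAC character.

    Homozygous calls give the plain base, heterozygous calls give the
    ambiguity code, and a missing call (None) gives "N".
    """
    if genotype is None:
        return "N"
    return IUPAC[frozenset(genotype)]


def to_phylip(samples, genotype_rows):
    """Build a relaxed PHYLIP alignment string (hypothetical helper).

    genotype_rows holds one list of per-sample genotypes for each SNP site,
    in the same sample order as `samples`.
    """
    seqs = {name: [] for name in samples}
    for row in genotype_rows:
        for name, genotype in zip(samples, row):
            seqs[name].append(genotype_to_iupac(genotype))
    lines = [f"{len(samples)} {len(genotype_rows)}"]
    for name in samples:
        lines.append(f"{name} {''.join(seqs[name])}")
    return "\n".join(lines) + "\n"


samples = ["S1", "S2", "S3"]
rows = [
    [("A", "G"), ("G", "G"), None],        # SNP 1: het, hom ALT, missing
    [("C", "C"), ("C", "T"), ("T", "T")],  # SNP 2: hom REF, het, hom ALT
]
print(to_phylip(samples, rows), end="")
# 3 2
# S1 RC
# S2 GY
# S3 NT
```

Using `frozenset` keys makes the lookup independent of allele order, so ("G", "A") and ("A", "G") both map to R.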

For reference, there are about 170,000 SNP sites in a genome of about 41 Mb. Within each sample, roughly a third of these sites are heterozygous.
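A quick back-of-envelope check on these numbers, assuming "a third of sites" means a third of the ~170,000 SNP sites and that the alignment keeps only the SNP columns:

```python
n_snps = 170_000
genome_bp = 41_000_000
n_samples = 100
het_fraction = 1 / 3  # assumption: fraction of SNP sites that are heterozygous

snps_per_kb = n_snps / genome_bp * 1_000      # ~4.1 SNPs per kilobase
het_sites_per_sample = n_snps * het_fraction  # ~56,700 heterozygous sites per sample
alignment_cells = n_samples * n_snps          # 17,000,000 characters in a SNP-only alignment

print(f"{snps_per_kb:.2f} SNPs/kb")
print(f"{het_sites_per_sample:,.0f} heterozygous sites per sample")
print(f"{alignment_cells:,} alignment characters")
```

At this size a SNP-only alignment is small compared with whole-genome sequences (100 × 41 Mb), which is one reason to build trees from the variable sites.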

Sorry in advance for any gaps in understanding this question reveals, and apologies if this has been answered before (I did find a few similar questions, but sadly they had no answers).

Thanks in advance!

vcf maximum likelihood phylogeny snp sequencing