Including reference genome in SNP-based phylogeny
5 weeks ago
maxrwjones ▴ 60

Hi all,

I have a set of ~15x coverage whole genome resequencing data for >200 accessions of a crop species. I have produced a SNP-based phylogeny of these accessions via the following operations:

  1. bowtie2 mapping against reference genome
  2. bcftools mpileup
  3. bcftools call
  4. vcftools to retrieve SNP-wise and accession-wise statistics, analysed in R
  5. filtering using vcftools
  6. vcf2phylip.py
  7. IQtree to produce final phylogeny

My question is, given that I know the reference genome's state for every included SNP, it seems there is enough information to treat it as an accession in its own right. So is there a way to include the reference genome in the final tree? Seems like IQtree should have an option for this? Or maybe even bcftools or vcftools?

Many thanks,

Max

phylogeny VCF WGS
5 weeks ago
Michael 54k

Hi,

That should be possible, but the sequence of the reference somehow needs to be included in the input given to IQ-TREE. It is sort of there already in the VCF via the REF allele. If you add a "dummy" sample to the VCF with genotype 0/0 or 0 (depending on the ploidy of your file), that should give vcf2phylip the right input to achieve it. Also, when selecting a model, make sure it has "+ASC" in it (ascertainment bias correction); otherwise branch lengths might be overestimated.
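A minimal sketch of the dummy-sample idea in Python, in case it helps. The function name `add_reference_sample` and the sample name `REFERENCE` are made up for illustration, and the code assumes an uncompressed, tab-delimited VCF that already has a FORMAT column with GT as its first key. All other FORMAT keys are set to missing (".").

```python
def add_reference_sample(vcf_lines, name="REFERENCE", ploidy=2):
    """Yield VCF lines with one extra sample column that is homozygous REF."""
    genotype = "/".join(["0"] * ploidy)  # "0/0" for diploid, "0" for haploid
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line                    # meta-information lines: unchanged
        elif line.startswith("#CHROM"):
            yield line + "\t" + name      # header line: add the new sample name
        elif line:
            fields = line.split("\t")
            n_keys = len(fields[8].split(":"))  # FORMAT column, e.g. GT:PL
            # Assumption: GT is the first FORMAT key; the rest become "."
            yield line + "\t" + ":".join([genotype] + ["."] * (n_keys - 1))


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tacc1",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:PL\t1/1:255,30,0",
]
for out in add_reference_sample(example):
    print(out)
```

For a real file you would pass an open file handle instead of the `example` list and write the yielded lines to a new VCF before running vcf2phylip.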

Greetings to Norwich...


Hi Michael,

Yes, exactly! I wondered if perhaps IQ-TREE could do this 'under the hood', but if not, then adding a dummy sample with all loci set to 0/0 is the next best thing. I'll give it a go. Do you think I should set the read depths and qualities to arbitrary numbers that pass my vcftools filters? Otherwise the whole dummy individual would get filtered out.

Yep, I have +ASC enabled already, thank you for the heads up.

Greetings to you too - in Norway I see?


Do you think I should set the read depths and qualities to arbitrary numbers that pass my vcftools filters?

Good point! Yes I think so. Once you have a sequence alignment, the VCF statistics do not matter any more.
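Something like the sketch below could build such a placeholder column. The function name `dummy_sample_column`, the DP and GQ keys, and the values 30 and 99 are only illustrative assumptions; which FORMAT keys actually matter depends on the vcftools filters you apply, so adjust them to your own thresholds.

```python
def dummy_sample_column(format_field, placeholders, genotype="0/0"):
    """Build a sample column matching a FORMAT string such as 'GT:DP:GQ'."""
    values = []
    for key in format_field.split(":"):
        if key == "GT":
            values.append(genotype)
        else:
            # Use the supplied placeholder, or "." (missing) for any other key
            values.append(str(placeholders.get(key, ".")))
    return ":".join(values)


# Hypothetical placeholder values chosen to pass depth/quality filters
print(dummy_sample_column("GT:DP:GQ:PL", {"DP": 30, "GQ": 99}))
# prints 0/0:30:99:.
```

Keys without a placeholder stay missing, which keeps the column length consistent with FORMAT without inventing values such as genotype likelihoods.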

Yes, greetings from Bergen, Norway. I just remembered visiting JIC and Norwich a while back (at least 15 years ago, I think).

Cheers Michael


Excellent, I shall try and figure out a little script that appends a dummy sample to each line (i.e. each SNP).

Ah lovely! I moved here ~3 years ago for my PhD and am really enjoying it, it's a great institute to work at.

Cheers, Max


I guess another alternative is to remap raw reads from the reference genome against itself... i.e. treat it like any other sample throughout the whole pipeline.
