Extract VCF from genome assemblies of multiple individuals
5.1 years ago
Anand Rao ▴ 630

Is it possible to extract SNP information into a VCF file from about 20 genome assemblies of individuals of the same species, each roughly 300 MB in size?

While this is routinely done by mapping NGS reads to a reference, my question is specifically how to arrive at a VCF file given only the genome assemblies, not their reads.

I am imagining the following sequence of steps, but for some of them I am not sure which tool is available / best suited:

1. Align all genomes using something like the CACTUS multiple genome aligner - but 20 genomes of 300 MB each is almost guaranteed to make the CACTUS run hang or crash...

2. Identify and remove structural variant regions across these genomes - I'm not sure exactly how to carry this out

3. Within the remaining conserved genomic blocks, align to obtain SNP variants and their coordinates - snp-sites or a tool along those lines?

4. Convert SNP info into VCF - this should not be a challenge, IMO
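
To make steps 3 and 4 concrete, here is a rough Python sketch of what I have in mind for a single aligned block. It assumes the block is already available as equal-length aligned sequences with `-` for gaps, and that one assembly is chosen as the coordinate reference. The function name `alignment_to_vcf`, the contig label `block1`, and the toy sequences are made up for illustration; this is not how snp-sites or any other existing tool is implemented, and it ignores indels entirely.

```python
def alignment_to_vcf(aligned, ref_name, chrom="block1"):
    """Turn one aligned block into minimal haploid VCF lines.

    aligned  : dict mapping sample name -> aligned sequence ('-' = gap),
               all sequences the same length (assumption)
    ref_name : key of the assembly used as coordinate reference
    chrom    : hypothetical contig label for the CHROM column
    """
    samples = [s for s in aligned if s != ref_name]
    ref_seq = aligned[ref_name].upper()
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
        + "\t".join(samples),
    ]
    ref_pos = 0  # 1-based position in the ungapped reference
    for col, ref_base in enumerate(ref_seq):
        if ref_base == "-":
            continue  # insertion relative to the reference: no coordinate
        ref_pos += 1
        if ref_base not in "ACGT":
            continue  # skip ambiguous reference bases such as N
        column = [aligned[s][col].upper() for s in samples]
        # Collect distinct alternate alleles in order of first appearance
        alts = []
        for base in column:
            if base in "ACGT" and base != ref_base and base not in alts:
                alts.append(base)
        if not alts:
            continue  # invariant site
        # Haploid genotype per sample: 0 = ref, 1.. = alt index, . = gap/ambiguous
        genotypes = []
        for base in column:
            if base == ref_base:
                genotypes.append("0")
            elif base in alts:
                genotypes.append(str(alts.index(base) + 1))
            else:
                genotypes.append(".")
        fields = [chrom, str(ref_pos), ".", ref_base, ",".join(alts),
                  ".", ".", ".", "GT"] + genotypes
        lines.append("\t".join(fields))
    return lines


# Toy block: four "assemblies", the first used as the reference
toy_block = {
    "ref": "ACG-TA",
    "s1":  "ACGATA",
    "s2":  "ATG-TC",
    "s3":  "A-G-TG",
}
print("\n".join(alignment_to_vcf(toy_block, "ref")))
```

Positions are counted along the ungapped reference sequence, so columns where the reference has a gap are skipped rather than given a coordinate. A real pipeline would still need to map block coordinates back to reference chromosome coordinates.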

If there are orthogonal solutions to my problem, I welcome any and all suggested protocols. Thanks!

SNP Structural Variation VCF Genomes alignment • 1.3k views