Hi, I have been trying to use vg to determine alleles in new samples sequenced with Illumina, and I have run into some problems. I would appreciate it if you could help.
1. What input do I have to construct the graph?
I have 10 unique alleles for one MHC gene (geneA) and 15 alleles for another MHC gene (geneB), obtained from 70 samples. Each allele is around 1.5 kb long, and there is an intergenic region of around 4 kb between the two genes. The alleles were obtained from PacBio long-read sequencing, so I would reckon they are quite reliable as a source of polymorphism information.
2. What is my aim?
I would like to construct a variation graph combining all the haplotype information I have from these alleles, then map short-read data from new samples (whole-genome Illumina sequencing) onto the graph using vg giraffe, to determine which alleles are present in the new samples.
3. What are my questions?
- What would be the best way to construct a graph for vg giraffe, given that I have a separate MSA for each gene?
- As the MHC genes are very polymorphic, I need to consider the possibility of one read mapping to multiple regions, especially once the polymorphism information is added. Could you suggest some parameters for this case?
Thanks, Monica