Hello vg gurus,
When inducing a graph from a VCF, I'd like the sample threads to show up in the graph as paths, rather than the per-variant paths named after the individual variant IDs (which seems to be the default behavior of 'vg construct -a'). It would be very useful to know which segments were induced by which sample(s). If this is not possible with 'construct', is there another way to add sample-coherent threads as paths? Mapping the sequences back to the graph one sample at a time and then assigning paths based on the mappings seems too error-prone. Getting the paths at the time of the initial graph construction/augmentation seems like a more natural approach.
The original sequences can be stored as threads (lightweight paths) in a GBWT index. See the index construction wiki page for details on building the GBWT. Storing a large number of paths in the graph itself is usually not a good idea, because the paths are not very space-efficient.
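To make the idea of a thread concrete, here is a toy Python sketch. It is not how vg or the GBWT actually work; all names (`Variant`, `build_threads`, `samples_through`, sample `NA1`) are hypothetical. It builds a tiny variation graph from a reference and phased SNPs, then records one thread per sample haplotype as the list of node IDs that haplotype visits. A real GBWT stores such node sequences in compressed form, and real graph construction also handles indels and multi-allelic sites, which this sketch assumes away.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    pos: int  # 0-based reference position (toy input; SNP-style ref/alt only)
    ref: str  # reference allele
    alt: str  # single alternate allele


def build_threads(reference, variants, genotypes):
    """Build a toy graph and one thread (list of node IDs) per haplotype.

    Assumes variants are sorted and non-overlapping. genotypes maps
    sample -> list of phased calls (hap0, hap1), one per variant, where
    0 = ref and 1 = alt, like '0|1' in a VCF.
    """
    nodes, backbone = {}, []  # backbone items: ("seg", id) or ("var", ref_id, alt_id)
    next_id, prev = 1, 0
    for v in variants:
        if v.pos < prev or reference[v.pos:v.pos + len(v.ref)] != v.ref:
            raise ValueError(f"bad variant at {v.pos}")
        if v.pos > prev:  # shared reference segment before the variant
            nodes[next_id] = reference[prev:v.pos]
            backbone.append(("seg", next_id))
            next_id += 1
        nodes[next_id], nodes[next_id + 1] = v.ref, v.alt  # one node per allele
        backbone.append(("var", next_id, next_id + 1))
        next_id += 2
        prev = v.pos + len(v.ref)
    if prev < len(reference):  # trailing reference segment
        nodes[next_id] = reference[prev:]
        backbone.append(("seg", next_id))

    threads = {}
    for sample, calls in genotypes.items():
        for hap in (0, 1):
            path, k = [], 0
            for item in backbone:
                if item[0] == "seg":
                    path.append(item[1])
                else:  # follow the allele this haplotype carries
                    path.append(item[1] if calls[k][hap] == 0 else item[2])
                    k += 1
            threads[(sample, hap)] = path
    return nodes, threads


def spell(nodes, path):
    """Concatenate node sequences along a thread to recover the haplotype."""
    return "".join(nodes[n] for n in path)


def samples_through(threads, node_id):
    """Which samples have at least one haplotype visiting node_id."""
    return sorted({s for (s, _hap), path in threads.items() if node_id in path})


if __name__ == "__main__":
    nodes, threads = build_threads(
        "ACGTACGT",
        [Variant(2, "G", "T"), Variant(5, "C", "A")],
        {"NA1": [(0, 1), (1, 1)]},
    )
    for (sample, hap), path in threads.items():
        print(sample, hap, path, spell(nodes, path))
```

Inverting the threads, as `samples_through` does, is what answers "which segments were induced by which sample(s)" without mapping reads back to the graph: the sample identity is carried from the phased VCF calls into the threads at construction time.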
If you want to extract the threads for a specific sample and add them as paths to the graph, you can do it with the 'vg paths' and 'vg augment' subcommands: