I'm wondering what the best strategy would be to obtain the list of all k-mers in the graph, with each k-mer labeled as reference (i.e., from a specific path) or non-reference. A very nice-to-have extra feature would be, for each k-mer, the index(es) of its occurrence(s) in the reference (linear) coordinate space.
I see two possible approaches:
vg kmers ...
vg find -k ...
Both strategies have pros and cons. The former produces more succinct output, but I haven't found a way to tell whether a k-mer belongs to a path or not (nor have I found an easy way to obtain its index). The latter gives more detailed information, but its output is much larger.
1. Extract all kmers (kmer occurrences) with vg kmers.
2. Extract the selected paths in FASTA format with vg paths -v graph.vg -F -p path-names.txt > output.fa.
3. Determine the kmers in the paths using an external tool.
4. Compare the kmer sets with an external tool.
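The last two steps (finding the kmers in the paths and comparing the kmer sets) need very little machinery. The Python script below is a minimal sketch of such an external tool; it is not part of vg. It assumes that output.fa is the FASTA written by vg paths -F, with path names as record names, and that the graph kmers from vg kmers have already been reduced to a text file with one kmer in the first whitespace-separated column of each line (graph_kmers.txt is a hypothetical name; the exact vg kmers output layout is not assumed here). K must match the k used with vg kmers. As a side effect it reports positions for the path kmers: 0-based offsets along each selected path, with strand.

```python
import sys
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def index_path_kmers(records, k):
    """Map each k-mer to a list of (path, 0-based offset, strand) hits."""
    index = defaultdict(list)
    for name, seq in records:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index[kmer].append((name, i, "+"))
            # Also index the reverse complement, so a graph k-mer read off the
            # opposite strand is still recognized as coming from the path.
            index[revcomp(kmer)].append((name, i, "-"))
    return index


def label_kmers(graph_kmers, index):
    """Yield (kmer, label, hits) with label 'ref' or 'nonref'."""
    for kmer in graph_kmers:
        hits = index.get(kmer.upper(), [])
        yield kmer, ("ref" if hits else "nonref"), hits


def main(argv):
    # argv: K, path FASTA, graph k-mer list (hypothetical file layout).
    k = int(argv[0])
    with open(argv[1]) as fa:
        index = index_path_kmers(read_fasta(fa), k)
    with open(argv[2]) as fh:
        graph_kmers = {line.split()[0] for line in fh if line.strip()}
    for kmer, label, hits in label_kmers(sorted(graph_kmers), index):
        positions = ",".join(f"{p}:{o}{s}" for p, o, s in hits) or "."
        print(kmer, label, positions, sep="\t")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Only run the command-line entry point when all three arguments are given.
    main(sys.argv[1:])
```

Usage, with hypothetical file names: python label_kmers.py 21 output.fa graph_kmers.txt > labeled.tsv. Each output line holds the kmer, ref or nonref, and a comma-separated list of path:offset+/- hits ("." if there are none). Keeping every path kmer in a Python dict is simple but memory-hungry for large references; a dedicated kmer counting tool would scale better for whole genomes. Non-path kmers get no position here, since (as noted next) such positions are not always well-defined.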
The path positions for non-path kmers are not always well-defined. There is some machinery for determining them (for alignments, not kmers), but even then, we often have to make arbitrary choices or give up trying.