Biostar Test Site

Best way to obtain reference and non-reference k-mers
12 months ago

Hi everyone,

I'm wondering what the best strategy would be to obtain the list of all k-mers in the graph, with each k-mer labeled as reference (i.e., from a specific path) or non-reference. A very nice extra feature would be to have, for each k-mer, the index(es) of its occurrence(s) in the reference (linear) space.

I see two possible roads:

vg kmers ...
vg find -k ...

Both strategies have pros and cons. The former produces more succinct output, but I haven't found a way to tell whether a k-mer belongs to a path (nor have I found an easy way to obtain the index). The latter reports more information, but its output is much larger.

Thanks for any suggestion, Michele S.

vg • 332 views
12 months ago
Jouni Sirén ▴ 130

The easiest way is probably doing it outside vg:

  1. Extract all kmers (kmer occurrences) with vg kmers.
  2. Extract the selected paths in FASTA format with vg paths -v graph.vg -F -p path-names.txt > output.fa.
  3. Determine the kmers in the paths using an external tool.
  4. Compare the kmer sets with an external tool.
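
Steps 3 and 4 above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function names are made up for this example, the k-mer list is assumed to have the k-mer string as the first whitespace-separated column of each line (check this against the actual vg kmers output), and only forward-strand matches are considered, so reverse-complement handling would have to be added if the k-mer list contains both orientations.

```python
from collections import defaultdict


def read_fasta(lines):
    """Parse FASTA lines into a dict of {sequence name: sequence}."""
    sequences = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Keep only the first word of the header as the path name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def read_kmers(lines):
    """Take the first column of each non-empty line as a k-mer (assumed layout)."""
    return [line.split()[0].upper() for line in lines if line.strip()]


def index_path_kmers(sequences, k):
    """Map each k-mer to the (path name, 0-based offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def label_kmers(graph_kmers, path_index):
    """Yield (kmer, is_reference, positions) for each distinct graph k-mer."""
    for kmer in sorted(set(graph_kmers)):
        positions = path_index.get(kmer, [])
        yield kmer, bool(positions), positions
```

To use it, open output.fa and the k-mer list as text files, pass the file handles to read_fasta and read_kmers, build the index with the same k that was given to vg kmers, and iterate over label_kmers. The positions are offsets in the linear path sequence, and the list is empty for k-mers that do not occur in any of the selected paths.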

The path positions for non-path kmers are not always well-defined. There is some machinery for determining them (for alignments, not kmers), but even then, we often have to make arbitrary choices or give up trying.

Thank you very much for the prompt and detailed reply. I was considering using external tools, but I wanted to be sure that there was no better way built into the toolkit.
