Finding all positions for all kmers belonging to a (not so short) list of kmers in a large genome
jyu429 (United States), 15 months ago

Hi,

Is there a tool that not only enumerates kmer counts (like jellyfish) but also lists their positions? I know this is much more memory-intensive, but I'm looking for the best way to do it, even if no tool for it exists yet.

Thanks!
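If every kmer in the list has the same length k, a plain hash-set lookup already gives positions in a single pass over the sequence. This is only an illustrative sketch: `kmer_positions` is a hypothetical helper, it scans the forward strand only, and it assumes the genome fits in memory as one string.

```python
from collections import defaultdict


def kmer_positions(genome: str, kmers: list[str]) -> dict[str, list[int]]:
    """Return the 0-based start positions of each query kmer in genome (forward strand only).

    Assumes all query kmers share the same length k.
    """
    if not kmers:
        return {}
    k = len(kmers[0])
    if any(len(q) != k for q in kmers):
        raise ValueError("all query kmers must have the same length")
    queries = {q.upper() for q in kmers}  # O(1) membership test per window
    hits: dict[str, list[int]] = defaultdict(list)
    seq = genome.upper()
    # Slide a window of length k across the sequence and record every query hit.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in queries:
            hits[window].append(i)
    # Include queries with no hits as empty lists.
    return {q: hits.get(q, []) for q in queries}
```

For example, `kmer_positions("ACGTACGTTT", ["ACG", "GTT", "AAA"])` returns `{"ACG": [0, 4], "GTT": [6], "AAA": []}`. To also catch minus-strand hits, one could add the reverse complement of each query kmer to the list.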


Take a look at Finding 16 mer not present in GRCh38, where one suggestion was to use bowtie to align the kmers against the genome. I would do the alignment and then keep only matches with 100% sequence identity. It may also help to set the gap-opening and mismatch penalties to something very large, like 10000, so that only perfect matches are retained.
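A minimal sketch of the "filter for 100% identity" step, assuming the kmers (all of length k) were aligned end-to-end, the aligner wrote standard SAM, and it emits the NM (edit distance) tag. `perfect_hits` is a hypothetical helper. To get every position rather than just one best hit, the aligner also has to be told to report all alignments (`-a` in bowtie and bowtie2).

```python
def perfect_hits(sam_lines, k):
    """Yield (kmer name, reference, 1-based position, strand) for exact full-length matches.

    Assumes standard SAM records and that the aligner writes the NM:i tag.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname, pos, _mapq, cigar = fields[:6]
        flag = int(flag)
        if flag & 4:
            continue  # unmapped
        if cigar != f"{k}M":
            continue  # indels or soft clipping, so not a full-length match
        if "NM:i:0" not in fields[11:]:
            continue  # 'M' also covers mismatches; NM:i:0 means zero edits
        strand = "-" if flag & 16 else "+"
        yield qname, rname, int(pos), strand
```

The CIGAR check alone is not enough because `M` covers both matches and mismatches; the `NM:i:0` tag is what guarantees zero edits.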


Is that really faster than, for example, implementing a search trie?
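For comparison, here is roughly what the trie approach looks like: build a trie from the query kmers, then walk it from every genome position. This is a plain trie, not Aho-Corasick (the linear-time refinement that adds failure links), so the scan costs up to O(genome length × k). `build_trie` and `trie_search` are hypothetical names, and the sketch assumes the genome never contains the `$` character used as an end marker.

```python
END = "$"  # key marking that a query kmer ends at this node


def build_trie(kmers):
    """Nested-dict trie: each node maps a base to its child node."""
    root = {}
    for kmer in kmers:
        node = root
        for base in kmer.upper():
            node = node.setdefault(base, {})
        node[END] = kmer.upper()
    return root


def trie_search(genome, kmers):
    """Return {kmer: [0-based start positions]} by walking the trie from every position."""
    root = build_trie(kmers)
    hits = {kmer.upper(): [] for kmer in kmers}
    seq = genome.upper()
    n = len(seq)
    for start in range(n):
        node = root
        i = start
        # Follow the sequence down the trie until no child matches the next base.
        while i < n and seq[i] in node:
            node = node[seq[i]]
            i += 1
            if END in node:
                hits[node[END]].append(start)
    return hits
```

Unlike a fixed-k hash lookup, the trie also handles query kmers of different lengths in the same scan, e.g. `trie_search("AACA", ["A", "AC"])` finds `"A"` at 0, 1 and 3 and `"AC"` at 1.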


How large are your kmers? Do you need all combinations? All occurrences? I used to write Perl scripts that counted kmers (k = 8 to 12) together with their positions, to look for cis-regulatory elements in some plant genomes. It is not hard to do, even on the 2 GB RAM machine I had.
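When all kmers of a small k are wanted, a common trick is to pack each kmer into an integer using 2 bits per base and update it as a rolling window. The sketch below is written in Python rather than Perl and is not the scripts mentioned above; `index_all_kmers` and `decode` are hypothetical helpers. It assumes a single sequence, skips windows that contain N or any other non-ACGT character, and ignores the reverse strand.

```python
from collections import defaultdict

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def index_all_kmers(genome, k):
    """Map every kmer (as a 2-bit-packed int) to its 0-based start positions.

    Windows containing a non-ACGT character (e.g. N) are skipped.
    """
    mask = (1 << (2 * k)) - 1  # keep only the last k bases
    index = defaultdict(list)
    code = 0
    valid = 0  # consecutive ACGT bases ending at the current position
    for i, base in enumerate(genome.upper()):
        val = ENCODE.get(base)
        if val is None:
            code, valid = 0, 0  # restart the window after an ambiguous base
            continue
        code = ((code << 2) | val) & mask  # shift in the new base
        valid += 1
        if valid >= k:
            index[code].append(i - k + 1)
    return index


def decode(code, k):
    """Turn a 2-bit-packed kmer back into a string (first base = most significant bits)."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - j))) & 3] for j in range(k))
```

For `index_all_kmers("ACGTNACG", 3)`, decoding the keys gives `ACG` at 0 and 5 and `CGT` at 1; the window spanning the N is dropped. This stores one position per valid window, so memory grows with genome size; for a multi-gigabase genome, Python lists of ints would be far too large and a compact array would be needed.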
