Finding all positions for all kmers belonging to a (not so short) list of kmers in a large genome
5.1 years ago
jyu429

Hi,

Is there a tool that not only counts k-mers (like Jellyfish) but also lists their positions? I know this is much more memory-intensive, but I'm looking for the best way to do it, even if no such tool currently exists.

Thanks!
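For the "best way" part of the question, one common approach when all query k-mers share the same length k is to put them in a hash set and slide a length-k window along the genome once, recording the start of every window that is in the set. Below is a minimal Python sketch under stated assumptions: the genome is already loaded as a single string (in practice you would read the FASTA one chromosome at a time), only the forward strand is searched, and `kmer_positions` is a hypothetical name, not an existing tool.

```python
from collections import defaultdict


def kmer_positions(genome: str, queries: list[str]) -> dict[str, list[int]]:
    """Return the 0-based start positions of each query k-mer on the forward strand.

    Hypothetical helper; assumes the whole sequence fits in one string.
    """
    if not queries:
        return {}
    k = len(queries[0])
    if any(len(q) != k for q in queries):
        raise ValueError("all query k-mers must have the same length")
    wanted = {q.upper() for q in queries}  # average O(1) membership tests
    hits = defaultdict(list)
    seq = genome.upper()
    # Single pass: slide a length-k window along the genome.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in wanted:
            hits[window].append(i)
    return dict(hits)  # k-mers with no occurrence are simply absent


if __name__ == "__main__":
    print(kmer_positions("ACGTACGTTACG", ["ACG", "GTT", "TTT"]))
    # {'ACG': [0, 4, 9], 'GTT': [6]}
```

Time is one pass over the genome; memory grows with the number of hits, since every position is stored. To cover the reverse strand you could also search for the reverse complement of each query.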

Tags: genome, sequencing, kmer

Take a look at the thread "Finding 16 mer not present in GRCh38". One suggestion there was to use Bowtie to align the k-mers against the genome. I would do the alignment and then filter for matches with 100% sequence identity. Setting the gap-opening and mismatch penalties to something very high, like 10000, might also help retain only perfect matches.
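If you go the alignment route, the "filter for 100% identity" step could look like the sketch below. It reads SAM-formatted alignment records and keeps only mapped records whose CIGAR is a single full-length match and whose `NM` (edit distance) tag is 0. This assumes the aligner writes SAM with the standard `NM` tag and uses `M` (not `=`/`X`) in the CIGAR; `perfect_hits` is a hypothetical helper name. Also note that to get every position rather than one per k-mer, the aligner has to be told to report all alignments instead of only the best one; check its manual for the relevant option.

```python
def perfect_hits(sam_lines):
    """Yield (query_name, reference, 1-based position, strand) for gap- and mismatch-free hits.

    Hypothetical helper; assumes SAM input that carries the NM tag.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment record
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar, seq = int(fields[3]), fields[5], fields[9]
        if flag & 0x4:
            continue  # flag bit 0x4: query is unmapped
        # Optional fields have the form TAG:TYPE:VALUE, e.g. NM:i:0.
        tags = {}
        for opt in fields[11:]:
            parts = opt.split(":", 2)
            if len(parts) == 3:
                tags[parts[0]] = parts[2]
        # Full-length M plus NM == 0: no clipping, no indels, no mismatches.
        if cigar == f"{len(seq)}M" and tags.get("NM") == "0":
            strand = "-" if flag & 0x10 else "+"  # flag bit 0x10: reverse strand
            yield qname, rname, pos, strand


EXAMPLE_SAM = [
    "@HD\tVN:1.0",
    "kmer1\t0\tchr1\t100\t255\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0",
    "kmer1\t16\tchr2\t50\t255\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0",
    "kmer2\t0\tchr1\t200\t255\t5M\t*\t0\t0\tACGTT\tIIIII\tNM:i:1",
    "kmer3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII",
]

if __name__ == "__main__":
    for hit in perfect_hits(EXAMPLE_SAM):
        print(hit)
```

With a real run you would pass an open file handle instead of the example list, e.g. `perfect_hits(open("hits.sam"))` (the file name is illustrative).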

Is that really faster than, for example, implementing a search trie?
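A trie-based search, as a minimal sketch: build a character trie from the query k-mers, then start a walk at every genome position and record a hit whenever the walk reaches the end of a stored k-mer. This naive version costs O(n × L) for a genome of length n and a longest k-mer of length L; the Aho-Corasick automaton adds failure links to a trie so the scan becomes linear in the genome length plus the number of hits. The function names are hypothetical, and for k-mers of a single fixed length a plain hash set gives the same result.

```python
def build_trie(kmers):
    """Nested-dict trie; the key "$" marks the end of a stored k-mer."""
    root = {}
    for kmer in kmers:
        node = root
        for base in kmer.upper():
            node = node.setdefault(base, {})
        node["$"] = kmer.upper()
    return root


def trie_positions(genome, trie):
    """Naive trie scan: start a walk at every position (O(n * L) worst case).

    Returns {k-mer: [0-based start positions]}; handles k-mers of different lengths.
    """
    hits = {}
    seq = genome.upper()
    for start in range(len(seq)):
        node = trie
        i = start
        while i < len(seq) and seq[i] in node:
            node = node[seq[i]]
            i += 1
            if "$" in node:  # a complete query k-mer ends here
                hits.setdefault(node["$"], []).append(start)
    return hits


if __name__ == "__main__":
    trie = build_trie(["AC", "ACG", "GTA"])
    print(trie_positions("ACGTACG", trie))
    # {'AC': [0, 4], 'ACG': [0, 4], 'GTA': [2]}
```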

How large are your k-mers? Do you need all combinations? All occurrences? I used to write Perl scripts that counted k-mers (8- to 12-mers) and recorded their positions, to look for cis-regulatory elements in some plant genomes, so it is not hard to do, even on the 2 GB RAM machine I had.
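For short k-mers like the 8- to 12-mers mentioned above, each k-mer fits in 2k bits, so a common trick is to encode A/C/G/T as 0-3 and update an integer code with a rolling shift as the window moves. There are only 4^12 = 16,777,216 possible 12-mers, so the key space stays small; memory is dominated by the stored positions (one per valid genome position), and a compact implementation would keep those in typed arrays rather than Python lists. This sketch is in Python rather than Perl, is not the commenter's script, and assumes windows containing N or other non-ACGT characters are simply skipped; `index_all_kmers` and `decode` are hypothetical names.

```python
from collections import defaultdict

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def index_all_kmers(genome: str, k: int) -> dict[int, list[int]]:
    """Map each 2-bit-encoded k-mer to its 0-based start positions (forward strand)."""
    mask = (1 << (2 * k)) - 1  # keep only the last 2k bits
    code = 0
    valid = 0  # consecutive ACGT bases ending at the current position
    index = defaultdict(list)
    for i, base in enumerate(genome.upper()):
        b = ENCODE.get(base)
        if b is None:  # assumption: N or other symbols break the window
            code, valid = 0, 0
            continue
        code = ((code << 2) | b) & mask  # rolling update: drop oldest base, add new one
        valid += 1
        if valid >= k:
            index[code].append(i - k + 1)
    return dict(index)


def decode(code: int, k: int) -> str:
    """Turn a 2-bit code back into its k-mer string (first base = highest bits)."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - j))) & 3] for j in range(k))


if __name__ == "__main__":
    idx = index_all_kmers("ACGNACG", 2)
    print({decode(c, 2): p for c, p in idx.items()})
    # {'AC': [0, 4], 'CG': [1, 5]}
```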
