Implementing a naive javascript-based de Bruijn graph
0
3
Entering edit mode
7.9 years ago

I'm trying to implement a naive implementation of a de Bruijn graph.

For now I wonder how the reverse-complement sequences should be handled ( related: How to handle Reverse complement in De-Bruijn Graph ) . In my code I insert the kmers of a read as well as its' reverse-complement kmers. But I don't think that duplicating the graph is a correct way to handle things.

Here is my current javascript source . It may be just wrong.

Thanks,

Pierre

debruijn graph javascript algorithm • 2.3k views
ADD COMMENT
2
Entering edit mode

It's typical to store a canonical version of a kmer, to reduce memory usage. For example, compare a kmer to its reverse-complement and store only the one that is lesser; and upon lookup, only look for the canonical version. But storing both is simpler and sometimes faster, so there's nothing wrong with that. Also, I suggest discarding all the kmers that contain Ns; they cause headaches.

ADD REPLY
0
Entering edit mode

compare a kmer to its reverse-complement and store only the one that is lesser

so in TTT/AAA I would only store the AAA : Then, when inserting a new node in the graph, i must take care of inserting on 5' or 3'.

But storing both is simpler and sometimes faster

but the graph contains the same information twice isn't it ?

ADD REPLY
2
Entering edit mode

Yes, storing twice as much information has disadvantages. But if you put all the kmers in a hashtable, and you want to test for the presence of a kmer, it's easy - just do a hash lookup. Whereas if you only store the canonical copy, you have to reverse-complement, compare, and then do a hash lookup. This can be slow depending on how efficient your reverse-complement is, particularly if you are representing kmers as Strings.

Normally, when building a DeBruijn graph from reads, you have to do 8 lookup operations per kmer - one for each possible next letter in each direction - so avoiding expensive reverse-complements during that phase can be useful. That said, I prefer to store kmers in a single canonical form.

ADD REPLY
1
Entering edit mode

The list of kmers contains the same information twice, yes, but any given graph would be end up being the same.

ADD REPLY

Login before adding your answer.

Traffic: 3926 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6