Compiling KmerGenie 1.7038: is it possible?
7.2 years ago
torsten

I downloaded kmergenie 1.7038 and tried to compile it on (1) Ubuntu 14.01, (2) a cluster that I think runs SUSE Linux, and (3) Mac OS X 10.10.5. The build instructions are very simple ("make"), but the build failed on all three platforms. The failures seem to be related to linking the bundled ntcard software. On Ubuntu, a long string of 'undefined reference' and 'access beyond end' errors ends with:

/usr/bin/ld: ntcard-ntcard.o: access beyond end of merged section (36032) 
/usr/bin/ld: ntcard-ntcard.o(.debug_info+0x69e5): reloc against `.debug_str': error 2
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
make[2]: *** [ntcard] Error 1

On the surface, the errors look different on each platform. To check ntcard itself, I downloaded and compiled it separately, and it built without errors.

I would be grateful for any suggestions of how to get this to compile. Thanks!

I don't believe that KmerGenie has a solid theoretical grounding for its claims.

KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie's choices lead to assemblies that are close to the best possible over all k-mer lengths.

Why? I don't know. It doesn't make any sense to me.

However, BBMap has a tool called TadWrapper that rapidly runs assemblies at several k-mer lengths and reports which assembly actually had the best contiguity. You can use it like this:

tadwrapper.sh in=reads.fq out=contigs_k%.fa k=31,62,93,124 bisect expand

Will that tell you the exact optimal k-mer length for the assembler you eventually plan to use? No, that's impossible; the only way to find that is to assemble at multiple k-mer lengths with the actual assembler you will use. But it will give you a very close approximation, since it actually performs an assembly at each k-mer length.
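If you want to double-check the contiguity of the per-k assemblies yourself, one common measure is N50: the contig length such that contigs of that length or longer cover at least half of the total assembled bases. Below is a minimal shell/awk sketch, assuming the outputs are plain FASTA files named like contigs_k31.fa (from the % in the out= pattern above). The n50 function name is hypothetical, and this is not necessarily the metric TadWrapper itself uses to rank assemblies.

```shell
# Hypothetical helper: print the N50 of the contigs in one FASTA file.
n50() {
  # Pass 1: print the length of each record (multi-line sequences are summed).
  awk '
    /^>/ { if (len) print len; len = 0; next }
    { len += length($0) }
    END { if (len) print len }' "$1" |
  # Sort contig lengths from longest to shortest.
  sort -rn |
  # Pass 2: walk down the sorted lengths until half of the total is covered.
  awk '{ l[NR] = $1; total += $1 }
       END {
         half = total / 2; run = 0
         for (i = 1; i <= NR; i++) {
           run += l[i]
           if (run >= half) { print l[i]; exit }
         }
       }'
}

# Example: rank the per-k assemblies, most contiguous first.
#   for f in contigs_k*.fa; do echo "$f $(n50 "$f")"; done | sort -k2,2nr
```

N50 only measures contiguity, not correctness, so a higher value does not by itself guarantee a better assembly.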

If you do want to follow KmerGenie's approach and find out which kmer length yields the maximal number of unique kmers, you can do that with BBMap's "kmercountmulti.sh" tool, which is extremely fast. But I don't recommend that.
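To show concretely what "the k-mer length that yields the maximal number of unique k-mers" means, here is a crude shell/awk sketch. It is not how KmerGenie or kmercountmulti.sh work. Assumptions (all hypothetical simplifications): reads are in an uncompressed 4-line FASTQ file, k-mers are not merged with their reverse complements, and "genomic" k-mers are approximated as those seen at least MIN_COUNT times (default 2), instead of being estimated with KmerGenie's statistical model of the abundance histogram. It keeps every k-mer in memory, so it is only suitable for toy data.

```shell
# Hypothetical helper: for each k, count distinct k-mers seen >= MIN_COUNT times
# (a crude stand-in for "genomic" k-mers) and report the k with the largest count.
best_k_by_solid_kmers() {
  local reads=$1; shift
  local min_count=${MIN_COUNT:-2}
  local best_k=0 best_n=-1 k n
  for k in "$@"; do
    n=$(awk -v k="$k" -v m="$min_count" '
      NR % 4 == 2 {            # the sequence line of each 4-line FASTQ record
        for (i = 1; i + k - 1 <= length($0); i++) cnt[substr($0, i, k)]++
      }
      END {
        solid = 0
        for (km in cnt) if (cnt[km] >= m) solid++
        print solid
      }' "$reads")
    echo "k=$k solid_kmers=$n"
    # Strict ">" means ties are resolved in favour of the k listed first.
    if [ "$n" -gt "$best_n" ]; then best_n=$n; best_k=$k; fi
  done
  echo "best k=$best_k"
}

# Example: best_k_by_solid_kmers reads.fq 21 31 41 51
```

Filtering by a fixed count threshold is the simplest possible way to discard erroneous k-mers; it ignores coverage and error-rate differences between datasets, which is exactly the part KmerGenie tries to model.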

BBMap is already compiled, so you just unzip it and it will work as long as you have Java installed.

Thank you for your suggestions, Brian. I will definitely look into it. /T

Hi Brian,

The theoretical foundations of KmerGenie can be found in Section 2 of our article. Please feel free to email us if any detail there is unclear. The article was published in 2013, but I still believe its theoretical grounds hold for past and current Illumina single-k genome assemblies ;)

Rayan

7.2 years ago
Wede

Hi, I had the same problem. Remove the ntcard directory inside the kmergenie directory, then clone ntCard with git:

git clone https://github.com/bcgsc/ntCard.git

Then, in ntCard/, run:

./autogen.sh
./configure
make

Finally, run 'make' again in kmergenie/.

7.1 years ago
Rayan Chikhi

Hi, KmerGenie has been updated to version 1.7039, which will hopefully resolve this compilation issue. Please let me know if it doesn't (kmergenie@cse.psu.edu).
