Question

What happens after coverage.hist in ABySS?

6.6 years ago

wangzz.email • 0

Hi,

I have a general question: after coverage.hist is generated, what is the next step in ABySS? After the file was produced, the program appeared to get stuck and has now been running for two days without any output. I am trying to assemble an H. sapiens genome on a single machine with 40 cores, using the MPI command recommended by the developers.

Thank you!

George

abyss assembly • 1.9k views

ADD COMMENT • link updated 6.6 years ago by Jean-Karim Heriche 27k • written 6.6 years ago by wangzz.email • 0

First, it's not unusual for a genome assembly to take a long time. On the other hand, check that you have enough RAM on this machine. According to this paper, ABySS 2.0 required 34GB of RAM for a human genome and computation took 20 hours with 64 cores. It will take longer if using fewer cores and/or slower CPUs. If you don't have enough RAM, it'll take forever.
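As a rough illustration of the "fewer cores takes longer" point, here is a back-of-the-envelope sketch. It assumes runtime scales inversely with core count, which is optimistic (real assemblies scale less than linearly, and CPU speed differs between machines), so treat the result as a lower bound. The function name `estimate_hours` is made up for this example.

```shell
# Hypothetical helper: lower-bound wall-clock estimate, assuming runtime
# is inversely proportional to the number of cores.
estimate_hours() {
    ref_hours=$1   # reported runtime in hours
    ref_cores=$2   # cores used for the reported runtime
    my_cores=$3    # cores available locally
    LC_ALL=C awk -v h="$ref_hours" -v c="$ref_cores" -v m="$my_cores" \
        'BEGIN { printf "%.0f\n", h * c / m }'
}

# 20 hours on 64 cores, rescaled to 40 cores
estimate_hours 20 64 40   # prints 32
```

So even under ideal scaling, a 40-core run would be expected to take well over a day.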

ADD REPLY • link 6.6 years ago by Jean-Karim Heriche 27k

Hi @wangzz,

Can you please provide your abyss-pe command and your complete log output? That would help in determining exactly where the program froze. You can use a GitHub gist for the log output if you don't want to post it here. If possible, please enable verbose logging by adding v=-v to your abyss-pe command. It would be very hard to troubleshoot your assembly otherwise.
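One way to capture the complete log is plain shell redirection, sketched below. The helper name `run_logged` and the file names are made up for this example; the commented-out abyss-pe call uses placeholder read files and only illustrates where v=-v would go.

```shell
# Hypothetical helper: run any command, saving stdout and stderr
# to separate files so both can be shared later.
run_logged() {
    prefix=$1
    shift
    "$@" >"$prefix.stdout.log" 2>"$prefix.stderr.log"
}

# Example call (placeholder inputs, not run here):
# run_logged assembly abyss-pe name=sample k=96 v=-v in='reads_1.fa.gz reads_2.fa.gz'
```

Keeping the two streams in separate files makes it easier to see where each one stopped.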

Also: Thank you for your reply, Jean-Karim.

ADD REPLY • link 6.6 years ago by benv ▴ 730

Thanks for the quick reply.

The command looks like this:

    abyss-pe -C abyss.96 np=40 k=96 v=-v name=sample q=15 \
        lib='sample_lib_0 sample_lib_1' \
        sample_lib_0='L003_R1.00.fa.gz L003_R2.00.fa.gz' \
        sample_lib_1='L003_R1.01.fa.gz L003_R2.01.fa.gz' \

There are 100+ paired files.

The standard error stops at this line:

    'L008_R1.15.fa.gz': discarded 124675 reads shorter than 96 bases

The standard output stops at this line:

    36: Found 78803591 k-mer in 378755 contigs before removing low-coverage contigs. Removed 1456823 k-mer in 57048 low-coverage contigs.

We sequenced the same sample twice, using different libraries (all paired-end). The two data sets have about the same coverage (40X). The first data set was assembled successfully by ABySS; the second appears to have gotten stuck. We ran both data sets on a 500 GB cluster node with 40 cores.

Thanks.

ADD REPLY • link 6.6 years ago by wangzz.email • 0

500 GB might not be enough, unfortunately.

Your abyss-pe command looks good.

The ABySS log contains messages indicating how much memory is being used per MPI process. If you are able to post the full log output (e-mail benv at bcgsc dot ca, or post to a GitHub gist), I could have a look and see if that is the problem.
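For comparison with the per-process figures in the log, a quick sketch of the memory budget per MPI process, using the 500 GB node and np=40 from this thread. It assumes memory is split evenly across processes, which is only an approximation; the function name `mem_per_proc_gb` is made up for this example.

```shell
# Hypothetical helper: even split of total RAM (in GB) across MPI processes.
mem_per_proc_gb() {
    LC_ALL=C awk -v total="$1" -v np="$2" 'BEGIN { printf "%.1f\n", total / np }'
}

mem_per_proc_gb 500 40   # prints 12.5
```

If the log shows processes approaching that figure, memory is a likely suspect.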

If you don't have a larger memory machine, you could try assembling with ABySS's new Bloom filter mode: https://github.com/bcgsc/abyss#assembling-using-a-bloom-filter-de-bruijn-graph
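A minimal sketch of what a Bloom filter mode invocation could look like. As described in the README linked above, this mode takes three extra parameters: B (Bloom filter size), H (number of hash functions), and kc (minimum k-mer count threshold). The values and file names below are placeholders, not tuned recommendations for a human genome, and the helper `bloom_cmd` only prints the command for review rather than running it.

```shell
# Hypothetical helper: print (do not run) an abyss-pe command for Bloom
# filter mode. All values and file names are placeholders.
bloom_cmd() {
    name=$1    # assembly name
    k=$2       # k-mer size
    bloom=$3   # Bloom filter size, e.g. 100M
    reads=$4   # space-separated read files
    printf "abyss-pe name=%s k=%s B=%s H=3 kc=3 v=-v in='%s'\n" \
        "$name" "$k" "$bloom" "$reads"
}

bloom_cmd sample 96 100M 'reads_1.fa.gz reads_2.fa.gz'
```

The README section linked above is the place to check how B should be sized for a given genome and read set.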

ADD REPLY • link 6.6 years ago by benv ▴ 730

OK. It might be due to a network problem. The machine I was running on has more than two network configurations (at least two network cards). Could this confuse MPI? I switched to another machine, and it worked. Thanks for your reply.
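If interface selection really is the cause, and assuming the MPI implementation is Open MPI (the thread does not say which one is installed), one thing to try is restricting its TCP transport to a single interface through the `btl_tcp_if_include` MCA parameter, set here via the environment. The sketch below is Linux-specific (it reads /sys/class/net), the helper names are made up, and `eth0` is a placeholder for whatever interface the machine actually has.

```shell
# List non-loopback network interfaces (Linux-specific).
list_ifaces() {
    for path in /sys/class/net/*; do
        [ -e "$path" ] || continue      # directory missing or empty
        name=${path##*/}
        [ "$name" = "lo" ] || echo "$name"
    done
}

# Hypothetical helper: tell Open MPI's TCP transport to use one interface only.
pin_mpi_iface() {
    export OMPI_MCA_btl_tcp_if_include="$1"
}

list_ifaces
pin_mpi_iface eth0   # eth0 is a placeholder; pick a name printed above
```

Because the variable is exported, an mpirun started from the same shell afterwards would inherit it.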

ADD REPLY • link 6.6 years ago by wangzz.email • 0

score 0 · Answer 1 · 2017-09-08

6.6 years ago

Jean-Karim Heriche 27k

This may be related to this MPI issue described in the FAQ.

ADD COMMENT • link 6.6 years ago by Jean-Karim Heriche 27k