ABySS run problems
6.7 years ago
Anand Rao ▴ 630

I am using my university's HPC cluster to de novo assemble paired-end HiSeq 400 reads. I've used Trimmomatic for adapter trimming and quality-based trimming, and BBMap's kmercountexact.sh to determine my preferred k-mer value for assembly. Now I want to use ABySS for my de novo assembly; the HPC cluster uses the SLURM scheduler.

module load openmpi
Module openmpi/2.0.1 loaded 
module load abyss
Module abyss/1.9.0 loaded 

srun --partition=high --mem=24000 --time=12:00:00 --nodes=1 abyss-pe np=8 name=EthFoc-11_trimmomatic7_corr k=41 in='EthFoc-11_S285_L007_trimmomatic7_1P.00.0_0.cor.fastq EthFoc-11_S285_L007_trimmomatic7_2P.00.0_0.cor.fastq'

There is a lot of text in STDOUT (available at http://txt.do/d61vl). Briefly, the last lines in STDOUT indicating that the run was still working read as follows:

Assembling...
0: Assembled 51879929 k-mer in 81891 contigs.
Assembled 51879929 k-mer in 81891 contigs.
Concatenating fasta files to EthFoc-11_trimmomatic7_corr-1.fa
Concatenating fasta files to EthFoc-11_trimmomatic7_corr-bubbles.fa
Done.

But soon after this, the run terminates with the following error message in STDOUT:

Concatenating fasta files to EthFoc-11_trimmomatic7_corr-1.fa
error: `contigs-0.fa': No such file or directory
make: *** [EthFoc-11_trimmomatic7_corr-1.fa] Error 1
srun: error: c11-96: task 0: Exited with exit code 2

I went through several Biostars posts on ABySS run errors, but I don't think they offer a direct solution to my problem: abyss mpirun non zero code, abyss-pe without openmpi, Error running Abyss with openMPI, and Abyss-pe de-novo assembler error.

Could it be a shared-access misconfiguration on the HPC cluster? (See: ABySS fails to write out coverage.hist file and stops.)

The files generated from this run are listed below:

-rw-rw-r-- 1 aksrao aksrao   42 Aug 18 00:01 EthFoc-11_trimmomatic7_corr-1.dot
-rw-rw-r-- 1 aksrao aksrao    0 Aug 17 23:16 EthFoc-11_trimmomatic7_corr-1.fa
-rw-rw-r-- 1 aksrao aksrao 1.3M Aug 17 23:15 EthFoc-11_trimmomatic7_corr-bubbles.fa

To test whether removing MPI from the equation would allow the run to complete, I tried:

srun --partition=high --mem=24000 --time=12:00:00 --nodes=1 abyss-pe name=EthFoc-11_trimmomatic7_corr k=41 in='EthFoc-11_S285_L007_trimmomatic7_1P.00.0_0.cor.fastq EthFoc-11_S285_L007_trimmomatic7_2P.00.0_0.cor.fastq'

srun: job 13800791 queued and waiting for resources
srun: job 13800791 has been allocated resources

This time, the error happens almost immediately, with the STDOUT looking as follows:

abyss-filtergraph  --dot   -k41 -g EthFoc-11_trimmomatic7_corr-2.dot1 EthFoc-11_trimmomatic7_corr-1.dot EthFoc-11_trimmomatic7_corr-1.fa >EthFoc-11_trimmomatic7_corr-1.path
abyss-filtergraph: ../Graph/DotIO.h:302: std::istream& read_dot(std::istream&, Graph&, BetterEP) [with Graph = DirectedGraph<ContigProperties, Distance>; BetterEP = DisallowParallelEdges; std::istream = std::basic_istream<char>]: Assertion `num_vertices(g) > 0' failed.
/bin/bash: line 1: 27510 Aborted                 abyss-filtergraph --dot -k41 -g EthFoc-11_trimmomatic7_corr-2.dot1 EthFoc-11_trimmomatic7_corr-1.dot EthFoc-11_trimmomatic7_corr-1.fa > EthFoc-11_trimmomatic7_corr-1.path
make: *** [EthFoc-11_trimmomatic7_corr-1.path] Error 134
make: *** Deleting file `EthFoc-11_trimmomatic7_corr-1.path'
srun: error: c11-91: task 0: Exited with exit code 2

What am I doing wrong, and how can I fix it? Since I am only a couple of days into genome assembly and an hour into using ABySS, the more detailed your reply, the more useful it will be to me. Thanks!

Tags: ABySS • genome assembly • de novo • MPI

Is the following requirement satisfied in your files?

A pair of reads must be named with the suffixes /1 and /2 to identify the first and second read, or the reads may be named identically. The paired reads may be in separate files or interleaved in a single file.
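For example, one quick way to check (just a sketch; adjust for your own file names) is to compare the first read header in each file:

head -n 1 EthFoc-11_S285_L007_trimmomatic7_1P.00.0_0.cor.fastq
head -n 1 EthFoc-11_S285_L007_trimmomatic7_2P.00.0_0.cor.fastq

The two headers should either end in /1 and /2, respectively, or be otherwise identical.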


You can also post your question to the ABySS user group.

6.7 years ago
benv ▴ 730

Hi Anand,

I'm not sure what the problem is, but I can hopefully provide some hints.

It looks like your abyss-pe command is correct. I suspect your problems are related to your cluster job submission parameters. Learning to run MPI jobs on a cluster usually requires a bit of experimenting with job submission flags. If you have an IT department, you should ask them whether they have any example scripts showing how to run MPI jobs on your cluster. Also, I would recommend first testing that you can successfully run a simple MPI program before trying ABySS. For example, this page provides an MPI "Hello, World!" program: https://hpcc.usc.edu/support/documentation/examples-of-mpi-programs/. You would have to paste the code into a file and compile it yourself, but it is probably worth the effort.
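For example, assuming you save that code as mpi_hello.c (the name is just an example) and your openmpi module provides the mpicc wrapper, a test might look like this (exact SLURM flags vary by site, and depending on how OpenMPI was built you may need mpirun instead of srun):

module load openmpi
mpicc mpi_hello.c -o mpi_hello
srun --partition=high --nodes=1 --ntasks=8 ./mpi_hello

If MPI is working correctly, you should see output from eight distinct ranks (0 through 7), not eight copies of rank 0.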

In the log of a successful ABySS run, you should see multiple ABYSS-P processes ("ranks" in MPI terminology) running in parallel and communicating with each other. The processes can be running on different cluster nodes. Each MPI process (rank) writes its own temporary contigs-<rank>.fa file, so if you are running a job with 8 processes (np=8), you would expect to see the following in your assembly directory:

contigs-0.fa
contigs-1.fa
contigs-2.fa
contigs-3.fa
contigs-4.fa
contigs-5.fa
contigs-6.fa
contigs-7.fa

When the parallel runs of ABYSS-P finish, these files are concatenated together into a single FASTA file and then removed.
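Conceptually, that step is equivalent to something like the following (an illustration only, not the exact abyss-pe Makefile rule):

cat contigs-{0..7}.fa > EthFoc-11_trimmomatic7_corr-1.fa
rm contigs-{0..7}.fa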

At the beginning of an ABYSS-P log for np=8, you should see something like:

0: Running on host c11-96
1: Running on host c11-96
2: Running on host c11-96
3: Running on host c11-96
4: Running on host c11-96
5: Running on host c11-96
6: Running on host c11-96
7: Running on host c11-96

whereas in http://textuploader.com/d61vl, you are just seeing:

0: Running on host c11-96

in multiple runs of ABYSS-P.

It appears that multiple independent ABySS jobs are being started rather than a single job with 8 processes ("ranks"). If they are all running in the same directory, they will overwrite each other's contigs-0.fa files. Also, each independent job will delete its contigs-0.fa file once the concatenation step is finished, which is likely why you are seeing:

error: `contigs-0.fa': No such file or directory
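One thing worth trying (a sketch only; the exact flags depend on your site's SLURM/OpenMPI configuration) is to submit a batch script that requests 8 tasks from SLURM and runs abyss-pe once, so that the mpirun call inside abyss-pe inherits all 8 slots:

#!/bin/bash
#SBATCH --partition=high
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --mem=24000
#SBATCH --time=12:00:00
# Load the same modules as in your interactive session
module load openmpi
module load abyss
# abyss-pe invokes mpirun itself; np should match --ntasks above
abyss-pe np=8 name=EthFoc-11_trimmomatic7_corr k=41 \
    in='EthFoc-11_S285_L007_trimmomatic7_1P.00.0_0.cor.fastq EthFoc-11_S285_L007_trimmomatic7_2P.00.0_0.cor.fastq'

Submit it with sbatch rather than wrapping abyss-pe in srun, so the script itself runs exactly once.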

Thank you benv, I will look into the syntax for MPI jobs in general on our HPCC, and specifically for ABySS.

BTW, what is the location of this assembly directory? Is it the pwd with the input files (which I have permission to write to), or the abyss executable directory common to all HPCC users (which I do not have permission for)? I figure it does not hurt to ask.

If it is the latter, then I need to redirect or rename the assembly directory so that permissions are not a problem. But it looks like the more likely problem is the difference you've outlined above between the observed vs. expected ABYSS-P logs. I'll post an update after this weekend.


The assembly directory is the working directory for your cluster job, which is usually just the directory where you ran your job submission command (i.e. sbatch/srun). If you're not sure what directory your job is running in, you can just put a pwd command at the top of your job script.
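For example (a minimal skeleton; the real script would also load modules and run abyss-pe):

#!/bin/bash
# Print the job's working directory at the top of the log
pwd
# ... rest of the job script ...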

I agree, it doesn't look like a file permissions problem.

Good luck!


I am deleting my comment here and posting it as a new thread at Running ABySS at k-mer > 97.
