Question

Unicycler - hybrid assembly failure

0

4.9 years ago

piotr.majewski • 0

Dear All,

I've recently run into a problem with Unicycler. I tried to perform a hybrid assembly using:

1) trimmed Illumina reads (R1+R2); format: fastqsanger.gz

2) Nanopore reads; format: fastqsanger

Unicycler readily assembles the Illumina reads or the Nanopore reads on their own. However, it fails to generate a hybrid assembly. Any suggestions?

Thanks in advance,

Piotr

PS: here is the error report:

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified

/pylon5/mc48nsp/xcgalaxy/main/staging/23588931/command.sh: line 95: 38467 Segmentation fault      (core dumped) unicycler -t "${GALAXY_SLOTS:-4}" -o ./ --verbosity 3 --pilon_path $pilon -1'fq1.fastq.gz' -2 'fq2.fastq.gz' -l lr.fastq --mode 'conservative' --min_fasta_length '100' --linear_seqs '0' --min_kmer_frac '0.2' --max_kmer_frac '0.95' --kmer_count '10' --depth_filter '0.25' --start_gene_id '90.0' --start_gene_cov '95.0' --min_polish_size '1000' --min_component_size '1000' --min_dead_end_size '1000' --scores '3,-6,-5,-2'

genome next-gen sequencing assembly software error • 2.5k views

ADD COMMENT • link updated 4.9 years ago by Joe 21k • written 4.9 years ago by piotr.majewski • 0

0

How much memory have you got available?

ADD REPLY • link 4.9 years ago by Joe 21k

0

I am currently using 46.5 GB of a total of 250.0 GB of space.

ADD REPLY • link 4.9 years ago by piotr.majewski • 0

1

By memory, I mean RAM, not disk storage.

ADD REPLY • link 4.9 years ago by Joe 21k

0

I forgot to mention that I am running the analyses on a Galaxy server.

Will 16 GB of RAM be enough to run it offline?

ADD REPLY • link 4.9 years ago by piotr.majewski • 0

1

How big are the files, and what size genome are you expecting?

A segfault suggests you may not have enough memory for the hybrid assembly, whereas each of the 2 datasets assembles on its own because that requires less memory. I would be surprised if 16 GB is sufficient, but it’s entirely genome- and data-dependent.

ADD REPLY • link 4.9 years ago by Joe 21k

0

I am expecting a genome of around 5 Mb.

As for the input files, the nanopore data is quite extensive:

1) long reads - 2.3 Gb

2) short reads R1 - 0.17 Gb

3) short reads R2 - 0.16 Gb
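To put these sizes in context, a rough read-depth estimate can be sketched as below. This is only an illustration under stated assumptions: it treats "2.3 Gb" as 2.3 gigabases of sequence, which the post does not confirm. If the figure is instead the size of the uncompressed FASTQ file, roughly half of it would be bases and the depth would be about half of what this prints. The function name `estimated_depth` is hypothetical.

```python
def estimated_depth(total_bases: int, genome_size: int) -> float:
    """Mean read depth: total sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Assumption: "2.3 Gb" of long reads means 2.3 gigabases (not file size),
# and the genome is about 5 Mb as stated in the post.
long_read_depth = estimated_depth(2_300_000_000, 5_000_000)
print(f"approximate long-read depth: {long_read_depth:.0f}x")
```

Under that assumption the long-read depth would be in the hundreds, which is why the amount of nanopore data stands out for a genome of this size.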

ADD REPLY • link 4.9 years ago by piotr.majewski • 0

0

I suspect that may be too much data for your local machine. I don’t know what a typical Galaxy RAM allowance is. Presumably it’s dependent on the hosting server.

It might be interesting to try and randomly downsample the reads to see if you can reach a point where it runs, assuming it’s not some other issue.
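A minimal sketch of random downsampling for the long-read file, using reservoir sampling so that every read has the same chance of being kept. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and holds the kept reads in memory. The helper names (`read_fastq`, `reservoir_sample`, `downsample_fastq`) are hypothetical and not part of Unicycler or Galaxy.

```python
import gzip
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+', quality


def open_text(path: str):
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_fastq(path: str) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes no wrapped lines)."""
    with open_text(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break
            yield header, handle.readline(), handle.readline(), handle.readline()


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Keep k records uniformly at random in a single pass."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def downsample_fastq(in_path: str, out_path: str, k: int, seed: int = 0) -> int:
    """Write up to k randomly chosen reads to out_path; return how many were kept."""
    kept = reservoir_sample(read_fastq(in_path), k, seed)
    with open(out_path, "wt") as out:
        for rec in kept:
            out.writelines(rec)
    return len(kept)
```

For example, `downsample_fastq("lr.fastq", "lr.sub.fastq", 50000)` would keep 50,000 reads; the number is arbitrary and would need halving or doubling until the run completes. Paired Illumina files must be subsampled so that R1 and R2 stay in sync, which this sketch does not attempt.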

Alternatively, there are assembly + polishing workflows you could try, where you assemble the nanopore data first and then error-correct with the Illumina reads. This might reduce the burden of processing too much data at once.

ADD REPLY • link 4.9 years ago by Joe 21k