Hi everyone, I'm actually using MaSuRCA-3.2.6 to assemble my genome and a ran the fallowing script:
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=8:bigmem,mem=100gb
#PBS -e /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.error
#PBS -o /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.out
#PBS -N ACG-006
#PBS -q q1week
PE= pe 150 22 /pandata/LEPIWASP/ACG-0006_0027/frag_1.fastq /pandata/LEPIWASP/ACG-0006_0027/frag_2.fastq
#set this to 1 if your Illumina jumping library reads are shorter than 100bp
#this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE = auto
#set this to 1 for all Illumina-only assemblies
#set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
#otherwise keep at 0
USE_LINKING_MATES = 0
#specifies whether to run mega-reads correction on the grid
#specifies queue to use when running on the grid MANDATORY
#batch size in the amount of long read sequence for each batch on the grid
#coverage by the longest Long reads to use
#this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms
LIMIT_JUMP_COVERAGE = 300
#these are the additional parameters to Celera Assembler. do not worry about performance, number or processors or batch sizes -- these are computed automatically.
#set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS = cgwErrorRate=0.15
#minimum count k-mers used in error correction 1 means all k-mers are used. one can increase to 2 if Illumina coverage >100
KMER_COUNT_THRESHOLD = 1
#whether to attempt to close gaps in scaffolds with Illumina data
#auto-detected number of cpus to use
NUM_THREADS = 16
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 200000000
#set this to 1 to use SOAPdenovo contigging/scaffolding module. Assembly will be worse but will run faster. Useful for very large (>5Gbp) genomes from Illumina-only data
Then, I got the asemble.sh file and I ran it as well and got the following .out:
[Sat Jun 16 22:32:45 CEST 2018] Processing pe library reads
[Sat Jun 16 22:49:04 CEST 2018] Average PE read length 150
[Sat Jun 16 22:49:05 CEST 2018] Using kmer size of 49 for the graph
[Sat Jun 16 22:49:06 CEST 2018] MIN_Q_CHAR: 33
WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 1115876884, this automatic increase may be not enough!
[Sat Jun 16 22:49:06 CEST 2018] Creating mer database for Quorum
[Sat Jun 16 23:09:23 CEST 2018] Error correct PE.
[Sat Jun 16 23:11:49 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.
/panhome/TOOLS/MaSuRCA-3.2.6/assemble.sh: line 102: 46750 Aborted quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)
) --contaminant=/panhome/TOOLS/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 16 -w 10 -e 3 -M quorum_mer_db.jf pe.re
named.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
Does someone have an idea of what is going on here? Thanks for your help.
The 2 fasta files are comming from an illumina Hiseq 3000 150bp and the genome size of my specie is around 1.5 GB.