Hello,
I'm trying to assemble contigs from a set of paired-end reads, SRR960028_1.fastq and SRR960028_2.fastq. I'm running ABySS on my university's HPC facility, and the run terminates early. The command itself (within the PBS file) is:
abyss-pe name=abyss_test1 k=63 in='SRR960028_1.fastq SRR960028_2.fastq' v=-v
The tail of the error file looks like this:
Mapped 272979576 of 273907308 reads (99.7%)
Mapped 247491841 of 273907308 reads uniquely (90.4%)
Read 273907308 alignments
Mateless 273907308 100%
Unaligned 0
Singleton 0
FR 0
RF 0
FF 0
Different 0
Total 273907308
abyss-fixmate: error: All reads are mateless. This can happen when first and second read IDs do not match.
error: 'abyss_test1-3.hist': No such file or directory
make: *** [abyss_test1-3.dist] Error 1
make: *** Deleting file `abyss_test1-3.dist'
I've seen the abyss-fixmate error come up in threads here before, but most of those threads seem to have 0 reads in the "Mateless" or "Total" rows, whereas I have a non-zero number in both. I've also opened the .fastq files and they definitely contain reads. A few threads recommend marking the read IDs within the files with a /1 or a /2, but I was under the impression that naming the files themselves reads1.fa and reads2.fa would suffice for ABySS (at least according to the ABySS manual, unless I'm mistaken).
The only thing in the output file is this:
abyss-map -v -j40 -l40 SRR960028_1.fastq SRR960028_2.fastq abyss_test1-3.fa \
|abyss-fixmate -v -l40 -h abyss_test1-3.hist \
|sort -snk3 -k4 \
|DistanceEst -v -j40 -k63 -l40 -s1000 -n10 -o abyss_test1-3.dist abyss_test1-3.hist
The output files generated by ABySS from the run included abyss_test1-1.fa, abyss_test1-2.fa, abyss_test1-3.fa and an abyss_test1-unitigs.fa file. I've checked the head and tail of the files, and they appear to contain contigs.
I'm reluctant to use these files for any analysis because I'm not sure how ABySS assembled them. Does anybody know how ABySS assembled them?
Does anybody have any clue as to why ABySS is terminating early, and how I can fix it?
Thanks in advance!
Thanks for clarifying, benv!
The first 10 lines of read 1 file:
And the first 10 reads of the second file:
I see. Those read IDs do not follow the rules I described above. For example, the read IDs for the first pair of reads need to be either:
@SRR960028.1 (read 1 file) and @SRR960028.1 (read 2 file)
OR
@SRR960028.1/1 (read 1 file) and @SRR960028.1/2 (read 2 file)
You will have to fix the IDs yourself with a Unix script (e.g. sed, awk, perl, python).
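For anyone who prefers Python over sed, here is a minimal sketch of such a fix. The actual read IDs are not shown in this thread, so the input format is an assumption: the sketch assumes plain four-line FASTQ records and that the real read ID is the first whitespace-delimited token of the header. The function names `normalize_header` and `rewrite_fastq` and the output file names are made up for illustration.

```python
import re


def normalize_header(header: str, mate: int) -> str:
    """Return a FASTQ header of the form '@<id>/<mate>'.

    Assumption: the read ID is the first whitespace-delimited token,
    and anything after it is a description that can be dropped.
    """
    tokens = header[1:].split()
    if not header.startswith("@") or not tokens:
        raise ValueError(f"not a FASTQ header: {header!r}")
    # Remove an existing /1 or /2 suffix so it is not duplicated.
    read_id = re.sub(r"/[12]$", "", tokens[0])
    return f"@{read_id}/{mate}"


def rewrite_fastq(src: str, dst: str, mate: int) -> None:
    """Copy src to dst, rewriting only the header line of each record.

    Assumption: every record spans exactly four lines (no wrapping).
    """
    with open(src) as fin, open(dst, "w") as fout:
        for i, line in enumerate(fin):
            if i % 4 == 0:    # header line
                fout.write(normalize_header(line.rstrip("\n"), mate) + "\n")
            elif i % 4 == 2:  # separator line; the repeated ID is optional
                fout.write("+\n")
            else:             # sequence or quality line, copied unchanged
                fout.write(line)


# Hypothetical usage (output names are placeholders):
# rewrite_fastq("SRR960028_1.fastq", "SRR960028_1.fixed.fastq", 1)
# rewrite_fastq("SRR960028_2.fastq", "SRR960028_2.fixed.fastq", 2)
```

If your headers differ from the assumed format (for example, if the mate number is embedded in the ID itself), the regular expression will need adjusting.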
Understood! I wrote this sed script:
... and it seems to have worked (checking the head and tail again). Fingers crossed that assembly goes better this time, and thanks for your help!
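Checking the head and tail only covers a handful of records, so it may be worth checking every pair before resubmitting the job. Below is a minimal sketch that applies the ID rule described above (identical IDs, or identical IDs with /1 and /2 suffixes) to two files read in lockstep. It assumes four-line FASTQ records and that the ID ends at the first whitespace; `split_id`, `is_valid_pair` and `count_bad_pairs` are illustrative names, not part of ABySS.

```python
import re
from itertools import zip_longest


def split_id(header: str) -> tuple:
    """Split a FASTQ header into (base ID, mate suffix).

    The suffix is '1', '2', or '' when the ID has no /1 or /2 ending.
    """
    tokens = header[1:].split()
    token = tokens[0] if tokens else ""
    match = re.fullmatch(r"(.*)/([12])", token)
    return (match.group(1), match.group(2)) if match else (token, "")


def is_valid_pair(header1: str, header2: str) -> bool:
    """True if both IDs are identical, or identical apart from /1 and /2."""
    id1, suffix1 = split_id(header1)
    id2, suffix2 = split_id(header2)
    return id1 == id2 and (suffix1, suffix2) in {("", ""), ("1", "2")}


def count_bad_pairs(path1: str, path2: str) -> int:
    """Count records whose headers do not pair up across the two files."""
    bad = 0
    with open(path1) as f1, open(path2) as f2:
        for i, (a, b) in enumerate(zip_longest(f1, f2)):
            if i % 4 != 0:  # only every fourth line is a header
                continue
            # A missing header means one file has more records than the other.
            if a is None or b is None or not is_valid_pair(a, b):
                bad += 1
    return bad


# Hypothetical usage:
# print(count_bad_pairs("SRR960028_1.fastq", "SRR960028_2.fastq"))
```

A result of 0 means every record pairs up under these assumptions; it does not guarantee that ABySS will accept the files.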