Segmentation faults are caused by a program asking for memory at an address that is outside the region allocated to it by the operating system. The two most common reasons for this to happen are:
Running out of memory. I don't know how much memory Subjunc (the aligner that align calls) uses, but aligners can use anything from 4 GB (Hisat) to 32 GB (STAR). However, memory usage is usually determined by the size of the index, not the size of the input, and so should be the same for every sample. Still, it's worth checking.
An unexpected input being processed to produce an invalid memory address. This seems more likely. I would start with a quick manual inspection of the fastq file, to check that everything looks okay. I'd then check that the number of lines in the fastq file is a multiple of 4. If neither of these reveals the problem, you'll need to do a binary search for the record causing the problem.
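The line-count check can be scripted. Below is a minimal sketch, assuming a gzipped fastq with the usual four lines per record (no wrapped sequence lines). The function name check_fastq_lines is made up for illustration.

```shell
# Hypothetical helper: check that a gzipped FASTQ has a line count divisible by 4.
# Assumes plain four-line records (header, sequence, "+", qualities).
check_fastq_lines() {
    local file="$1"
    local lines
    # tr strips the padding that some wc implementations add
    lines=$(zcat "$file" | wc -l | tr -d ' ')
    if [ $((lines % 4)) -eq 0 ]; then
        echo "OK: $lines lines ($((lines / 4)) records)"
    else
        echo "BAD: $lines lines is not a multiple of 4"
        return 1
    fi
}
```

Run it as check_fastq_lines myreads.fastq.gz. A BAD result usually points to a truncated or partially written file, but an OK result doesn't prove that every record is well formed.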
Binary search for problem reads
Start by dividing your input into two equally sized files. For example, if my fastq contains 1,000,000 reads (4,000,000 lines, at four lines per read), I'd divide it in two with:
zcat myreads.fastq.gz | head -n2000000 | gzip > half1.fastq.gz
zcat myreads.fastq.gz | tail -n2000000 | gzip > half2.fastq.gz
Now try mapping each half. If neither causes an error, then the problem isn't your input file. If half1 causes the problem, divide that into two:
zcat half1.fastq.gz | head -n1000000 | gzip > quarter1.fastq.gz
zcat half1.fastq.gz | tail -n1000000 | gzip > quarter2.fastq.gz
Otherwise, if half2 causes the problem, divide that instead (if they both do, just pick one).
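To save working out the line counts by hand at each step, the halving can be wrapped in a small function. This is only a sketch: the name split_fastq_in_half and the _a/_b output suffixes are my own invention, and it assumes four-line records so that the split always lands on a record boundary.

```shell
# Hypothetical helper: split a gzipped FASTQ into two halves on record boundaries.
# Usage: split_fastq_in_half input.fastq.gz output_prefix
split_fastq_in_half() {
    local input="$1" prefix="$2"
    local records first_lines
    # Number of four-line records in the input
    records=$(( $(zcat "$input" | wc -l) / 4 ))
    # Lines in the first half; it takes the extra record when the count is odd
    first_lines=$(( (records + 1) / 2 * 4 ))
    zcat "$input" | head -n "$first_lines" | gzip > "${prefix}_a.fastq.gz"
    # tail -n +K starts printing at line K, i.e. the first line after the first half
    zcat "$input" | tail -n "+$((first_lines + 1))" | gzip > "${prefix}_b.fastq.gz"
}
```

For example, split_fastq_in_half myreads.fastq.gz half writes half_a.fastq.gz and half_b.fastq.gz, and split_fastq_in_half half_a.fastq.gz quarter splits the first of those again.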
Repeat this procedure until you've got a small enough number of reads left to inspect manually. If you can't see anything wrong, you might try piping them through cat -A to see if there are hidden characters. If every read is causing a problem, then something other than invalid reads is causing the problem.
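If the remaining subset is still too long to eyeball, the hidden-character check can be narrowed to just the suspicious lines. This is a rough sketch with a made-up function name, find_hidden_chars. It assumes GNU cat (for -A) and treats anything outside printable ASCII, including tabs and carriage returns, as suspicious, which may be too strict for some header formats.

```shell
# Hypothetical helper: show only the lines of a gzipped FASTQ that contain
# characters outside printable ASCII, with those characters made visible.
find_hidden_chars() {
    # LC_ALL=C makes [:print:] mean ASCII 0x20-0x7E; -a forces text mode;
    # -n prefixes each hit with its line number; cat -A renders e.g. \r as ^M
    zcat "$1" | LC_ALL=C grep -a -n '[^[:print:]]' | cat -A
}
```

No output means no such characters were found. A line like 2:ACGT^M$ would indicate a Windows-style line ending on line 2.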
Check if this BAM file is corrupt.
Input to align is fastq, not BAM.