"Segmentation fault" error when running the align function in the Rsubread package
5.7 years ago
wangdp123

Hi there,

When I run the align function from the Rsubread package on one particular RNA-Seq sample, the program stops at "50% completed" with the error message "Segmentation fault". It works for the other samples; only this one fails.

Does anybody know how to tackle this problem?

Thank you very much,

Regards,

Tom

RNA-Seq Rsubread

Check if this BAM file is corrupt.

Input to align is fastq, not BAM.

5.7 years ago

Segmentation faults are caused by a program accessing memory at an address that is outside the region allocated to it by the operating system. The two most common reasons for this are:

  • Running out of memory. I don't know how much memory Subread (the aligner that align calls; Subjunc is the one behind subjunc) uses, but aligners can need anything from roughly 4 GB (HISAT) to 32 GB (STAR). However, memory usage is usually determined by the size of the index, not the size of the input, so it should be about the same for every sample. Still, it's worth checking.

  • An unexpected input that gets processed into an invalid memory address. This seems more likely. I would start with a quick manual inspection of the fastq file to check that everything looks okay. I'd then check that the number of lines in the fastq file is a multiple of 4. If neither of these reveals the problem, you'll need to do a binary search for the record causing it.
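A minimal sketch of the multiple-of-4 check, assuming bash and the usual four-line FASTQ layout (sequences not wrapped over several lines). The function names are made up for illustration; gzip -dc is used as the equivalent of zcat.

```shell
#!/usr/bin/env bash
# Structural check: a 4-line-per-record FASTQ must have a line count divisible by 4.
# fastq_cat and check_line_count are hypothetical helper names.

fastq_cat() {
  # Decompress if the name ends in .gz, otherwise print the file as-is.
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac
}

check_line_count() {
  local n
  n=$(fastq_cat "$1" | wc -l)
  n=$((n))  # strip any padding whitespace that some wc implementations add
  if (( n % 4 != 0 )); then
    echo "$1: $n lines, not a multiple of 4" >&2
    return 1
  fi
  echo "$1: $n lines, $((n / 4)) reads"
}
```

Usage: check_line_count myreads.fastq.gz. A non-zero exit status means the file is truncated or has stray or missing lines somewhere.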

Binary search for problem reads

Start by dividing your input into two equally sized files. For example, if my fastq contains 1,000,000 reads (4,000,000 lines), I'd divide it in two with:

zcat myreads.fastq.gz | head -n2000000 | gzip > half1.fastq.gz
zcat myreads.fastq.gz | tail -n2000000 | gzip > half2.fastq.gz
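The head/tail commands for the halves rely on knowing the read count in advance. A minimal sketch, assuming bash and four-line records, that counts the lines first and splits on a record boundary so neither half starts in the middle of a read. halve_fastq is a hypothetical helper name; with an odd number of reads the second half gets the extra one.

```shell
#!/usr/bin/env bash
# Split a gzipped FASTQ into two gzipped halves on a record boundary.
# halve_fastq is a hypothetical name; assumes exactly 4 lines per record.

halve_fastq() {
  local src=$1 out1=$2 out2=$3
  local lines reads first
  lines=$(gzip -dc -- "$src" | wc -l)
  reads=$((lines / 4))
  first=$(( (reads / 2) * 4 ))   # number of lines that go into the first half
  # Recompress the output so the .gz names stay truthful.
  gzip -dc -- "$src" | head -n "$first" | gzip > "$out1"
  gzip -dc -- "$src" | tail -n +"$((first + 1))" | gzip > "$out2"
}
```

Usage: halve_fastq myreads.fastq.gz half1.fastq.gz half2.fastq.gz, then run it again on whichever half fails.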

Now try mapping each half. If neither causes an error, then the problem isn't your input file. If half1 causes the problem, divide that into two:

zcat half1.fastq.gz | head -n1000000 | gzip > quarter1.fastq.gz
zcat half1.fastq.gz | tail -n1000000 | gzip > quarter2.fastq.gz

Otherwise, if half2 causes the problem, divide that instead (if both do, just pick one).

Repeat this procedure until you have few enough reads left to inspect manually. If you can't see anything, try piping them through cat -A to reveal hidden characters. If every read causes a problem, then something other than an invalid read (or reads) is causing it.
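The repeat-until-small loop can be scripted. This is a sketch, not a tested tool: bisect_fastq is a hypothetical name, it works on uncompressed FASTQ, and it expects a command you supply that takes a FASTQ path as its last argument and exits non-zero when that file reproduces the crash (for example a small wrapper script around your align() call; its name and contents are up to you). Like the manual procedure, it keeps the first half when both halves fail.

```shell
#!/usr/bin/env bash
# Automated halving search for a read that makes a command fail.
# Usage: bisect_fastq <reads.fastq> <max_reads> <command...>
# The command is run as: command <part.fastq> and must exit non-zero
# when that part reproduces the problem. Assumes 4-line records.

bisect_fastq() {
  local current=$1 max_reads=$2
  shift 2
  (( max_reads >= 1 )) || { echo "max_reads must be >= 1" >&2; return 2; }
  local workdir lines reads first step=0
  workdir=$(mktemp -d)   # intermediate halves are kept here for inspection
  while :; do
    lines=$(wc -l < "$current")
    reads=$((lines / 4))
    if (( reads <= max_reads )); then
      echo "$current"    # small enough to look at by eye (or with cat -A)
      return 0
    fi
    step=$((step + 1))
    first=$(( (reads / 2) * 4 ))
    head -n "$first" "$current" > "$workdir/a$step.fastq"
    tail -n +"$((first + 1))" "$current" > "$workdir/b$step.fastq"
    if ! "$@" "$workdir/a$step.fastq"; then
      current=$workdir/a$step.fastq
    elif ! "$@" "$workdir/b$step.fastq"; then
      current=$workdir/b$step.fastq
    else
      echo "neither half fails on its own; stopping at $current" >&2
      return 1
    fi
  done
}
```

Usage: bisect_fastq myreads.fastq 100 ./run_align.sh, where run_align.sh is your own (hypothetical) wrapper. It prints the path of the final small file. If it reports that neither half fails on its own, that fits the case where something other than a single bad read is responsible.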

Hi, thanks for this.

I have checked the standard error file, which includes the following message:

 *** caught segfault ***
address 0x17a34e074, cause 'memory not mapped'

In addition, the same set of fastq files can be mapped against the genome without problems using TopHat2.

Does this mean anything?

So 'memory not mapped' just means that the application tried to access an address that is not mapped into its address space. The error doesn't tell you why: it could be poor memory management in the program, or a corrupt input leading to a bad memory access.

That your file maps well with TopHat2 suggests that there isn't anything grossly wrong with your fastq file. Things that might trip up one mapper but not another include: an unusual format for the quality scores, empty reads or reads shorter than the anchor length, reads whose quality and sequence lines differ in length, missing mates for some reads, etc.
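Most of these can be checked in one pass over the file. A minimal sketch in bash/awk, assuming four-line records; validate_fastq and the default minimum length of 16 are hypothetical choices, so set the threshold to whatever anchor or seed length matters for your aligner. It also flags carriage returns (Windows line endings, a common hidden character) and reports the range of quality characters: a minimum below 59 (';') points to Phred+33, while a range starting at 64 ('@') or higher suggests the older Phred+64 encoding, in which case the phredOffset argument of align() is worth checking.

```shell
#!/usr/bin/env bash
# Per-record FASTQ sanity checks. Prints one line per problem found,
# then the ASCII range of quality characters. Exit status 1 if anything
# was flagged. validate_fastq and the default min length are hypothetical.

validate_fastq() {
  local file=$1 min_len=${2:-16}
  case "$file" in
    *.gz) gzip -dc -- "$file" ;;
    *)    cat -- "$file" ;;
  esac | LC_ALL=C awk -v min_len="$min_len" '
    BEGIN {
      # Lookup table from printable character to ASCII code.
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
      qmin = 999; qmax = 0; bad = 0
    }
    /\r/ { print "line " NR ": carriage return (Windows line ending)"; bad++ }
    NR % 4 == 1 {
      hdr = $0
      if (substr($0, 1, 1) != "@") { print "line " NR ": header does not start with @"; bad++ }
    }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 {
      if (substr($0, 1, 1) != "+") { print "line " NR ": separator does not start with +"; bad++ }
    }
    NR % 4 == 0 {
      if (length(seq) == 0) { print hdr ": empty read"; bad++ }
      else if (length(seq) < min_len) { print hdr ": read shorter than " min_len; bad++ }
      if (length($0) != length(seq)) { print hdr ": sequence/quality length mismatch"; bad++ }
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (!(c in ord)) { print hdr ": unexpected character in quality line"; bad++; break }
        q = ord[c]
        if (q < qmin) qmin = q
        if (q > qmax) qmax = q
      }
    }
    END {
      if (NR % 4 != 0) { print "truncated final record"; bad++ }
      if (qmax > 0) printf "quality chars: ASCII %d to %d\n", qmin, qmax
      exit (bad > 0)
    }'
}
```

Usage: validate_fastq myreads.fastq.gz 20 | head -n 50. Piping through head keeps the output manageable if a problem (such as Windows line endings) affects every record.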

Or it could just be a memory issue. Is this sample the one with the largest number of reads?
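One way to answer that across all samples: a sketch that counts reads per file (count_reads is a hypothetical helper; it assumes gzipped, four-line FASTQ). For paired-end data the two files of a pair should give identical counts, so a mismatch here would point to missing mates rather than memory.

```shell
#!/usr/bin/env bash
# Print "<file><TAB><read count>" for each gzipped FASTQ given.
# count_reads is a hypothetical name; assumes 4 lines per record.

count_reads() {
  local f lines
  for f in "$@"; do
    lines=$(gzip -dc -- "$f" | wc -l)
    printf '%s\t%d\n' "$f" "$((lines / 4))"
  done
}

# Example, largest samples first (the glob is an assumption about file names):
# count_reads *.fastq.gz | sort -t $'\t' -k2,2nr
```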
