bwa mem seq and Segmentation fault (core dumped)
8.0 years ago
Whoknows ▴ 960

Hi

I ran bwa mem on an SRA exome file (SRR2968047), but I got the following error:

Segmentation fault (core dumped)

Indexing finished successfully, and my command is:

bwa mem -t 3 -R '@RG\tID:SRR2968047\tSM:ExomeSample1\tPL:Illumina\tPU:Hiseq2500' -M Genome/hg38 Raw/Clean/1.fq.gz Raw/Clean/2.fq.gz | samtools view -Sbh | samtools sort - aln-sorted

Other details about my system:

  • OS: Fedora
  • RAM: 7 GB
  • CPU: 4 cores

How can I fix this problem?

bwa genome exome

I ran your command and it works fine for me. It seems you have a memory (RAM) or CPU problem. Here is my suggestion for getting to a solution:

1- Extract around 20000 reads (FASTQ uses 4 lines per read, so that is 80000 lines): head -n 80000 SRR.fastq > SRR_new.fastq

2- Run bwa again with "-t 2" instead of "-t 3", and leave out the "samtools view -Sbh | samtools sort - aln-sorted" part of the pipeline for now.
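Since the original command uses paired, gzipped files, step 1 has to be applied to both mates with the same number of lines so the pairs stay in sync. A minimal sketch, assuming POSIX sh with gzip and head available; the function name `subset_pair` and the output prefix are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: take the first N read pairs from two gzipped FASTQ files.
# FASTQ uses 4 lines per read, so N reads = 4*N lines per file.
# Taking the same number of lines from R1 and R2 keeps the mates paired.
subset_pair() {
  r1=$1; r2=$2; n=$3; out_prefix=$4
  lines=$((4 * n))
  # gzip -dc decompresses to stdout (same as zcat on Linux)
  gzip -dc "$r1" | head -n "$lines" > "${out_prefix}_1.fq"
  gzip -dc "$r2" | head -n "$lines" > "${out_prefix}_2.fq"
}

# Example with the paths from the question (writes test_1.fq and test_2.fq):
#   subset_pair Raw/Clean/1.fq.gz Raw/Clean/2.fq.gz 20000 test
```

The two subset files can then be passed to bwa mem in place of the full .fq.gz files.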

Then let us know how it goes.

Hope it helps.


Or even -t 1. I believe bwa uses a little over 5 GB of memory when aligning against the human genome.

8.0 years ago

A segmentation fault (segfault) is a very generic error, and there can be several reasons for it. It's best to check the log, if you generate one, or to switch on verbose output. Also check for problems such as whether you have enough disk space and whether your tmpdir is clean. It could also be related to file permissions: is the location you are writing the output to accessible?
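The disk-space and permission checks suggested here can be scripted before starting a long run. A minimal sketch, assuming POSIX `df -Pk` output (column 4 = available space in 1K blocks); the function name `check_output_dir` and the thresholds in the example are hypothetical, not values from bwa or samtools:

```shell
#!/bin/sh
# Hypothetical pre-flight check for an output directory before a long bwa/samtools run:
# it must exist, be writable, and have at least MIN_KB kilobytes free.
check_output_dir() {
  dir=$1; min_kb=$2
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2; return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2; return 1
  fi
  # df -Pk prints one unwrapped line per filesystem; field 4 is available 1K blocks
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kb" -lt "$min_kb" ]; then
    echo "only ${avail_kb} KB free in $dir" >&2; return 1
  fi
  return 0
}

# Example: check the current directory and the temp directory (/tmp if TMPDIR is unset)
#   check_output_dir . 20000000 && check_output_dir "${TMPDIR:-/tmp}" 5000000
```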

Also, I just looked at this post, "Bwa mem is exiting with segmentation fault for large contigs (> 0.4Mbp)"; as suggested there by Heng, use the updated version of bwa.


Please see the error log:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 412708 sequences (40000092 bp)...
[M::process] read 412590 sequences (40000195 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 158858, 7, 5)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (87, 116, 161)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 309)
[M::mem_pestat] mean and std.dev: (128.09, 53.87)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 383)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 412708 reads in 208.661 CPU sec, 685.940 real sec
Segmentation fault (core dumped)


As Sukhdeep said, there are several possible reasons. From my experience, these are things you can try in order to diagnose and solve it:

Is your machine 32-bit or 64-bit? We've tried to run bwa mem on a 32-bit machine and also got a segmentation fault, so it's best to run on 64-bit.
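One way to answer the 32-bit vs 64-bit question on Linux is sketched below, assuming `getconf LONG_BIT` is available (it is on glibc-based systems such as Fedora). `uname -m` shows the kernel's architecture, while `getconf LONG_BIT` reflects the userland the programs actually run in. The function name `is_64bit` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the userland is 64-bit (getconf LONG_BIT prints 32 or 64).
is_64bit() {
  [ "$(getconf LONG_BIT)" = "64" ]
}

echo "kernel architecture: $(uname -m)"
if is_64bit; then
  echo "64-bit userland"
else
  echo "not a 64-bit userland: consider running bwa mem on a 64-bit system" >&2
fi
```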

Another likely reason is that there isn't enough memory to run bwa mem. It has fairly high memory requirements, and if you are running other heavy processes at the same time, that could lead to this error.

I see that you are piping the bwa mem output to samtools; have you tried separating these commands? This also has to do with memory.
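Running the steps separately also makes it clear which program crashed. Shells report a process killed by signal N as exit status 128+N, so a segfault (SIGSEGV, signal 11) appears as 139. A small sketch for decoding that, assuming POSIX sh; the function name `describe_status` and the file names in the comments are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: turn a shell exit status into a readable message.
# A process killed by signal N gives status 128+N (SIGSEGV = 11, so 139).
describe_status() {
  name=$1; status=$2
  if [ "$status" -gt 128 ]; then
    echo "$name was killed by signal $((status - 128))"
  elif [ "$status" -ne 0 ]; then
    echo "$name exited with status $status"
  else
    echo "$name finished OK"
  fi
}

# Example with separate steps (1.fq/2.fq are placeholder names):
#   bwa mem -t 2 -M Genome/hg38 1.fq 2.fq > out.sam; describe_status "bwa mem" $?
#   samtools view -Sbh out.sam > out.bam;            describe_status "samtools view" $?
# In bash, the per-stage statuses of a pipeline are also in "${PIPESTATUS[@]}".
```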

8.0 years ago
GenoMax 141k

Heng Li has said that bwa requires 5.5G RAM with one thread. Splitting the alignment and samtools into separate operations (as suggested by @alons) may allow this to work, considering the 7GB RAM (odd number, but hopefully correct) available per the original post.
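Before starting, you can check whether enough memory is actually free, not just installed. A minimal sketch, assuming Linux with a kernel that reports `MemAvailable` in /proc/meminfo (3.14 and later) and treating the 5.5G figure as 5.5 GiB; the function name and the optional file argument (there only so it can be tested with a fake file) are hypothetical:

```shell
#!/bin/sh
# Hypothetical check: is MemAvailable (kB) at least the ~5.5 GB bwa needs for the
# human genome with one thread? Arguments: meminfo file, required kB (both optional).
enough_memory_for_bwa() {
  meminfo=${1:-/proc/meminfo}
  need_kb=${2:-5767168}   # 5.5 GiB = 5.5 * 1024 * 1024 kB
  avail_kb=$(awk '$1 == "MemAvailable:" { print $2 }' "$meminfo")
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$need_kb" ]
}

# Example:
#   enough_memory_for_bwa || echo "less than 5.5 GB available: close other programs or use -t 1"
```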

8.0 years ago
Whoknows ▴ 960

Thanks to all of you, and to fernardo, for helping me.

After extracting 20K reads from the FASTQ files and updating zlib for bwa, the piped command still showed the same error. Take a look:

[M::worker2@2] performed mate-SW for 43682 reads
[M::worker2@3] performed mate-SW for 44128 reads
Segmentation fault (core dumped)

But when I ran the commands separately (without the pipe), it finally worked for the 20K reads! The sorted BAM file is ~12 MB.

I'm also curious about the following messages from running samtools on the SAM file:

  • [samopen] SAM header is present: 24 sequences.
  • [bam_header_read] EOF marker is absent. The input is probably truncated.

Why does it have only 24 sequences?
And what does "EOF marker is absent" mean?


How many scaffolds are in your reference?


I don't know exactly; I used the UCSC hg38 reference for my data.


That's probably the 22 autosomes plus X and Y, which makes 24 sequences in the header.
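Rather than guessing, you can count the sequences directly: one `>` line per record in the reference FASTA, and one `@SQ` line per reference sequence in the SAM header. If bwa was indexed on the same FASTA, the two counts should match. A sketch assuming POSIX grep/awk; the function names and `hg38.fa` are placeholders:

```shell
#!/bin/sh
# Hypothetical helpers for counting reference sequences.
# FASTA: one '>' header line per sequence.
count_fasta_seqs() { grep -c '^>' "$1"; }
# SAM: count @SQ lines, stopping at the first alignment line so large files are not scanned.
count_sam_sq() { awk '!/^@/ { exit } $1 == "@SQ" { n++ } END { print n + 0 }' "$1"; }

# Examples:
#   count_fasta_seqs hg38.fa
#   count_sam_sq out.sam
#   samtools view -H aln-sorted.bam | grep -c '^@SQ'    # same count from a BAM header
```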

"EOF marker is absent" means that your BAM file does not end with the proper end-of-file block, so it is probably truncated or malformed in some way. How did you go from SAM to sorted BAM?
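The EOF marker is a fixed 28-byte empty BGZF block that the SAM/BAM specification places at the end of every BAM file, so its presence can be checked by comparing the file's last 28 bytes. A minimal sketch using POSIX `tail`, `od` and `tr`; the function and variable names are hypothetical. Newer samtools (1.x) also offers `samtools quickcheck`, which includes this check.

```shell
#!/bin/sh
# The 28-byte BGZF end-of-file block from the SAM/BAM specification, as hex.
BGZF_EOF_HEX='1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000'

# Hypothetical helper: succeed if the file ends with the BGZF EOF block.
has_bgzf_eof() {
  # Last 28 bytes as one lowercase hex string (-v: never collapse repeated lines)
  tail_hex=$(tail -c 28 "$1" | od -A n -v -t x1 | tr -d ' \n')
  [ "$tail_hex" = "$(printf '%s' "$BGZF_EOF_HEX" | tr -d ' ')" ]
}

# Example:
#   has_bgzf_eof aln-sorted.bam || echo "aln-sorted.bam is probably truncated"
```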


I used this code:

bwa mem -t 4 -R '@RG\tID:SRR2968047\tSM:ExomeSample1\tPL:Illumina\tPU:Hiseq2500' -M Genome/hg38 Raw/Clean/1.fq.gz Raw/Clean/2.fq.gz  > out.sam

samtools view -Sbh out.sam | samtools sort - aln-sorted

And what if you try it like this:

samtools view -Sb out.sam -o out.bam
samtools sort out.bam aln-sorted

That should give you an aln-sorted.bam that is OK.
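The `samtools sort out.bam aln-sorted` form above is the older prefix syntax; newer samtools releases (1.x) write the sorted output to the file named with `-o`. A sketch of the same two separate steps in that style, assuming samtools 1.x; the function name, the `.unsorted.bam` intermediate name and the `DRY_RUN` switch (which prints the commands instead of running them) are hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: SAM -> unsorted BAM -> coordinate-sorted BAM, as two separate steps.
# With DRY_RUN set, ${DRY_RUN:+echo} expands to "echo" and the commands are only printed.
sam_to_sorted_bam() {
  sam=$1; prefix=$2
  ${DRY_RUN:+echo} samtools view -b -o "$prefix.unsorted.bam" "$sam" || return 1
  ${DRY_RUN:+echo} samtools sort -o "$prefix.bam" "$prefix.unsorted.bam"
}

# Example: produces aln-sorted.bam from out.sam
#   sam_to_sorted_bam out.sam aln-sorted
```

Keeping the steps separate follows the advice in this thread; with samtools 1.x, `samtools sort` can also read SAM directly, which avoids the intermediate BAM once memory is no longer the problem.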
