Problem In Indexing The Sorted Bam File For Exome Sequencing Downstream Analysis
10.5 years ago
ivivek_ngs ★ 5.2k

Hi, I have a SAM file whose header looks like this:

head SRR062634.bwaSampe.sam
@SQ    SN:chr1    LN:249250621
@SQ    SN:chr2    LN:243199373
@SQ    SN:chr3    LN:198022430
@SQ    SN:chr4    LN:191154276
@SQ    SN:chr5    LN:180915260
@SQ    SN:chr6    LN:171115067
@SQ    SN:chr7    LN:159138663
@SQ    SN:chr8    LN:146364022
@SQ    SN:chr9    LN:141213431
@SQ    SN:chr10    LN:135534747

I also have the sorted BAM file made from it. When I try to index the BAM file, I get the error below. I need the index so that I can do variant calling on the BAM file for my exome sequencing analysis.

samtools index SRR062634.sorted.bam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Segmentation fault

Has anyone run into this before? If so, how do I fix it, or do I have to skip the indexing step? I don't think skipping it is advisable. Any suggestions?

samtools exome-sequencing • 5.4k views
10.5 years ago

Firstly, can you format the question so that the SAM header is displayed properly? (Edit: Nevermind, Pierre Lindenbaum already did that!)

This has most likely happened because the sorting step failed for some reason. I would presume that if you run "samtools view -h SRR062634.sorted.bam" you will either eventually find a line that is cut off or get only a fraction of the reads that you should. Just re-sort the BAM file. I think you can also do this in one step with Picard tools.
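
The "EOF marker is absent" message refers to the empty 28-byte BGZF block that ends every complete BAM file. As a rough sketch, the presence of that block can be checked with standard tools alone; `has_bgzf_eof` is a hypothetical helper name, not a samtools command, and a file that passes can still be damaged elsewhere. Newer samtools releases also ship a `samtools quickcheck` subcommand for this kind of sanity check.

```shell
# Sketch: test whether a file ends with the 28-byte BGZF end-of-file block.
# has_bgzf_eof is a hypothetical helper name used only for this example.

# Hex form of the empty BGZF block, written in 4-byte groups for readability.
BGZF_EOF_HEX="1f8b0804""00000000""00ff0600""42430200""1b000300""00000000""00000000"

has_bgzf_eof() {
    local file="$1" tail_hex
    # Read the last 28 bytes and render them as one continuous hex string.
    tail_hex=$(tail -c 28 -- "$file" | od -An -v -tx1 | tr -d ' \n')
    # Succeeds (exit status 0) only when the tail matches the EOF block.
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

For example, `has_bgzf_eof SRR062634.sorted.bam || echo "truncated, sort again"` would flag the file from the question before any indexing is attempted.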

When I try to view it I get the error below. So I guess I have to do the sorting again, since it seems to have gone wrong.

samtools view -h SRR062634.sorted.bam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "SRR062634.sorted.bam".

Yeah, the file got corrupted, so you'll need to re-sort.

Try converting the SAM to BAM again with a proper header. Check which header tags are mandatory (http://samtools.sourceforge.net/SAMv1.pdf).
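
The convert, sort, and index steps discussed here could be chained roughly as follows. This is only a sketch: `rebuild_sorted_bam`, `run`, and `DRY_RUN` are hypothetical names, and the `sort -o` form assumes samtools 1.3 or later (the 0.1.x releases in use when this thread was written took an output prefix instead, as in `samtools sort in.bam out_prefix`).

```shell
# Sketch: rebuild a sorted, indexed BAM from the original SAM file.
# rebuild_sorted_bam, run and DRY_RUN are hypothetical names for this example.
# Assumes samtools >= 1.3 command-line syntax.

run() {
    # Print the command when DRY_RUN is set; otherwise execute it.
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

rebuild_sorted_bam() {
    local sam="$1" prefix="$2"
    # Stop at the first failing step so a broken BAM is never indexed.
    run samtools view -b -o "$prefix.bam" "$sam" || return 1
    run samtools sort -o "$prefix.sorted.bam" "$prefix.bam" || return 1
    run samtools index "$prefix.sorted.bam" || return 1
}
```

Calling `DRY_RUN=1 rebuild_sorted_bam SRR062634.bwaSampe.sam SRR062634` prints the three commands without running them, which makes it easy to review them first. Returning early on failure is the point of the design: the original problem was an index attempt on a BAM whose sort had not finished cleanly.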

I re-sorted the BAM file and then indexed it, but the new .bai file seems very small. Is it OK for the file to be this small? I don't think so.

-rw-r--r-- 1 vdas DPT 4054208957 Oct 4 18:07 SRR062634.sorted.bam
-rw-r--r-- 1 vdas DPT    7919104 Oct 4 18:26 SRR062634.sorted.bam.bai

Please comment

Yes, sorting the BAM file reduces its size.

But I mean the .bai file produced by indexing the sorted BAM file. It is the index that seems very small; that is what I am talking about.

-rw-r--r-- 1 vdas DPT 4054208957 Oct 4 18:07 SRR062634.sorted.bam
-rw-r--r-- 1 vdas DPT    7919104 Oct 4 18:26 SRR062634.sorted.bam.bai

As you can see, the index file is very small.

When you index a BAM file, an index file is generated. It should be really small, as it contains only the index.

Yes, this is the .bai file, I guess. But you sometimes use the .bai index for viewing in IGV, so isn't that affected by the small size? I thought the .bai file was an indexed copy of the data that is also used at times for downstream analysis, but as you say, if it only contains the indices then it has to be small. Correct me if I am wrong.

I have no idea about the IGV thing you are talking about, but the .bai file (the index) is not important for the downstream analysis itself. It should, however, always accompany the BAM file for which it was generated if you want to use that BAM file later on.

The bai file is the index for the sorted bam. That's why it's small. You "open" that in IGV only in the sense that IGV uses it to quickly visualize regions of the sorted bam file. IGV needs both files to work efficiently.
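
In the listing above the index is 7919104 bytes against 4054208957 bytes for the BAM, roughly 0.2% of its size, which fits an index that stores only file offsets. Because the two files have to travel together, a small check along these lines can confirm that a BAM has an index beside it that is not older than the BAM itself. It is a sketch only: `index_is_current` is a hypothetical helper name, and it assumes the index is named by appending `.bai` to the BAM file name, as in the listing.

```shell
# Sketch: check that a BAM has a .bai index next to it that is not stale.
# index_is_current is a hypothetical helper name used only for this example.
index_is_current() {
    local bam="$1" bai="$1.bai"
    # No index file at all: region lookups in viewers such as IGV need one.
    [ -f "$bai" ] || return 1
    # An index older than its BAM was probably built from an earlier version.
    if [ "$bai" -ot "$bam" ]; then
        return 1
    fi
    return 0
}
```

A call such as `index_is_current SRR062634.sorted.bam || samtools index SRR062634.sorted.bam` would rebuild the index only when it is missing or out of date.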

Thanks a lot, it is much clearer to me now. Thanks for the heads up.
