Hi, I have a SAM file that looks like this:
head SRR062634.bwaSampe.sam
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
I also have the sorted BAM file made from it. When I try to index the BAM file I get the error below. I need the index so that I can do variant calling on the BAM file for my exome sequencing analysis.
samtools index SRR062634.sorted.bam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Segmentation fault
Has anyone run into this before? If so, how do I fix it, or do I have to skip the indexing step? I don't think skipping it is advisable. Any suggestions?
When I try to view the file I get the error below, so I guess I have to redo the sorting in case it went wrong.
samtools view -h SRR062634.sorted.bam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "SRR062634.sorted.bam".
Yeah, the file got corrupted, so you'll need to re-sort.
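The "EOF marker is absent" message refers to the fixed 28-byte empty BGZF block that a complete BAM file ends with. Below is a minimal sketch of a check for that block before re-running anything. It assumes standard tail, od and tr are available; the function name has_bgzf_eof is made up for illustration and is not a samtools command.

```shell
# has_bgzf_eof is a hypothetical helper name, not part of samtools.
# A complete BAM file ends with this fixed 28-byte empty BGZF block (hex).
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

has_bgzf_eof() {
    # Dump the last 28 bytes of the file as one lowercase hex string.
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    # Succeed only if those bytes are exactly the BGZF EOF block.
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}
```

It could be used as: has_bgzf_eof SRR062634.sorted.bam && echo "EOF marker present" || echo "file looks truncated". A present marker only shows that the writer finished cleanly; it does not prove the rest of the file is intact.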
Try converting the SAM to BAM again with a proper header. Check which header tags are compulsory (http://samtools.sourceforge.net/SAMv1.pdf).
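For reference, redoing the whole conversion could look roughly like the sketch below. The wrapper name sam_to_sorted_bam and its two arguments are made up for illustration. The option syntax assumes samtools 1.x in a release that provides quickcheck; older 0.1.x releases use a different form (samtools view -bS in.sam > out.bam, and samtools sort in.bam out_prefix), so check which version you have first.

```shell
# sam_to_sorted_bam is a hypothetical wrapper. The samtools subcommands are
# real, but the option syntax below assumes samtools 1.x.
sam_to_sorted_bam() {
    sam=$1      # input SAM, e.g. SRR062634.bwaSampe.sam
    prefix=$2   # output prefix, e.g. SRR062634

    # Each step runs only if the previous one succeeded.
    samtools view -b -o "$prefix.bam" "$sam" &&                  # SAM -> BAM
    samtools sort -o "$prefix.sorted.bam" "$prefix.bam" &&       # coordinate sort
    samtools quickcheck "$prefix.sorted.bam" &&                  # header and EOF sanity check
    samtools index "$prefix.sorted.bam"                          # writes .sorted.bam.bai
}
```

It would be called as sam_to_sorted_bam SRR062634.bwaSampe.sam SRR062634. Chaining the steps with && means a failed sort is not silently indexed.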
I re-sorted the BAM file and then indexed it, but the new .bai file seems very small. Is it OK for the file to be this small? I don't think so.
-rw-r--r-- 1 vdas DPT 4054208957 Oct 4 18:07 SRR062634.sorted.bam
-rw-r--r-- 1 vdas DPT 7919104 Oct 4 18:26 SRR062634.sorted.bam.bai
Please comment
Yes, sorting the BAM file reduces its size.
I am not talking about the sorted BAM file but about the .bai file produced by indexing it. That is the one that seems very small:
-rw-r--r-- 1 vdas DPT 4054208957 Oct 4 18:07 SRR062634.sorted.bam
-rw-r--r-- 1 vdas DPT 7919104 Oct 4 18:26 SRR062634.sorted.bam.bai
As you can see, the index file is tiny compared with the BAM.
When you index a BAM file, an index file is generated. It should be really small because it contains only the index, not the reads.
Yes, this is the .bai file, I guess. But you sometimes load this .bai index when viewing in IGV, so isn't that affected by its small size? I thought the .bai file was the indexed file that is also used at times for downstream analysis, but as you say, if it only contains the index then it has to be small. Correct me if I am wrong.
I have no idea about the IGV thing you are talking about, but this .bai file (the index) holds no data for the downstream analysis itself. It should, however, always accompany the BAM file for which it was generated if you want to use that BAM file later on.
The bai file is the index for the sorted bam. That's why it's small. You "open" that in IGV only in the sense that IGV uses it to quickly visualize regions of the sorted bam file. IGV needs both files to work efficiently.
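The same index-backed lookup that IGV relies on can be tried on the command line. The sketch below counts the reads overlapping one region. The helper name count_reads_in_region is made up for illustration, the region in the comment is an arbitrary example, and the check assumes the index is stored next to the BAM as <bam>.bai, as in this thread.

```shell
# count_reads_in_region is a hypothetical helper; "samtools view -c" and the
# chr:start-end region syntax are standard samtools usage.
count_reads_in_region() {
    bam=$1      # coordinate-sorted BAM, e.g. SRR062634.sorted.bam
    region=$2   # e.g. chr1:1000000-1100000 (arbitrary example coordinates)

    # Assumes the index sits next to the BAM as <bam>.bai.
    if [ ! -f "$bam.bai" ]; then
        echo "no index found for $bam; run: samtools index $bam" >&2
        return 1
    fi

    # With the index, samtools can seek to the region instead of scanning the whole file.
    samtools view -c "$bam" "$region"
}
```

For example: count_reads_in_region SRR062634.sorted.bam chr1:1000000-1100000. Region queries like this are the reason many downstream tools expect the .bai to sit beside the BAM.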
Thanks a lot, it is much clearer to me now. Thanks for the heads up.