Bam And Indexed Bam Files
12.3 years ago
Sahel ▴ 260

Hi There,

I recently started grad school and have no background working with sequencing data :( As a first task, my supervisor asked me to look at two files, "XXX.bam" and "XXX.bam.bai", and figure out whether they are the same file (one of them just indexed) or different files. Since the files have exactly the same name, except for the additional ".bai" at the end, it looks like XXX.bam.bai is the indexed form of XXX.bam, but I am not completely sure. Can someone please give me a hint on how to check whether the files are the same or not? What program can I use to generate an indexed BAM? (SAM?! I have only just heard about it and have never had a chance to work with it.) And what program can I use to visualize them?

Thank you so much....

Sahel

bam index • 118k views
12.3 years ago

A bai file isn't an indexed form of a bam - it's a companion to your bam that contains the index.

A bam file is a binary blob that stores all of your aligned sequence data. You can view what's in the bam file using "samtools view bamfile.bam | less".

Bam files can also have a companion file, called an index file. This file has the same name, suffixed with .bai. This file acts like an external table of contents, and allows programs to jump directly to specific parts of the bam file without reading through all of the sequences. Without the corresponding bam file, your bai file is useless, since it doesn't actually contain any sequence data.
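To tell which of two files is the alignment file and which is the index without relying on the file names, you can look at the first few bytes. The sketch below assumes the usual format signatures: a BAI file starts with the bytes `BAI\1`, while a BAM file is BGZF-compressed (gzip-compatible) and its decompressed content starts with `BAM\1`. The function name `sniff_alignment_file` is made up for this illustration and is not part of samtools.

```python
import gzip
import zlib

BAI_MAGIC = b"BAI\x01"  # first four bytes of a BAM index file
BAM_MAGIC = b"BAM\x01"  # first four bytes of a BAM file after decompression


def sniff_alignment_file(path):
    """Return "bai", "bam", or "unknown" based on the file's magic bytes.

    Hypothetical helper for illustration only; it inspects signatures
    and does not validate the rest of the file.
    """
    with open(path, "rb") as handle:
        head = handle.read(4)

    if head == BAI_MAGIC:
        return "bai"

    # BGZF is gzip-compatible, so a BAM file starts with the gzip
    # signature 1f 8b before decompression.
    if head[:2] == b"\x1f\x8b":
        try:
            with gzip.open(path, "rb") as handle:
                if handle.read(4) == BAM_MAGIC:
                    return "bam"
        except (OSError, EOFError, zlib.error):
            pass  # not a readable gzip stream

    return "unknown"
```

Calling `sniff_alignment_file("XXX.bam")` and `sniff_alignment_file("XXX.bam.bai")` should report "bam" and "bai" respectively if the files are what their names suggest. This only confirms the file types, not that the index was built from that particular bam.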

If you have a bam file without a corresponding index, you can generate one using "samtools index bamfile.bam". The bam has to be sorted by coordinate first, otherwise indexing will fail.

If your index file is named identically, with just the additional ".bai" suffix, you can be reasonably sure that it was generated from the same file. If you have any doubt, though, it's easy enough to delete your bai file, then generate a new index using the previous command. Keep in mind that this may take a half hour or more depending on the size of your bam and the speed of your computer.
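As a quick sanity check before regenerating anything, the naming convention can be combined with file modification times. This is a rough heuristic sketch, not a proof that the index matches: it assumes the index sits next to the bam with ".bai" appended, and it treats an index that is older than its bam as suspect. Modification times can change when files are copied, so take the result as a hint only. The function name `companion_index` is hypothetical.

```python
import os


def companion_index(bam_path):
    """Return the path of a plausible index for bam_path, or None.

    Hypothetical helper: checks only the naming convention and the
    modification times, never the index contents.
    """
    index_path = bam_path + ".bai"  # same name with the ".bai" suffix
    if not os.path.isfile(index_path):
        return None

    # Heuristic: an index older than its bam may have been built from
    # an earlier version of the file and should be regenerated.
    if os.path.getmtime(index_path) < os.path.getmtime(bam_path):
        return None

    return index_path
```

If `companion_index("XXX.bam")` returns None, regenerating the index with samtools is the safe option.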

12.3 years ago

I like UCSC's succinct description of a BAM file: a compact and index-able representation of nucleotide sequence alignments. Although a standalone BAM file can be useful, a particular advantage of this format is that the data is binary, compressed, and easily indexed, so programs can navigate through it without loading the whole file into memory. A BAM file is just the binary translation of a SAM file, which is human readable, so aside from their nature (binary or not) both files are equivalent.

You will obviously find the most appropriate reading on SAM at the SAMtools webpage, although I would also recommend looking at UCSC's BAM format webpage for a nice description of both formats and their relationship.

12.3 years ago
Oligo ▴ 60

The .bai suffix does indeed mark the index file of the bam file. One way to create an index for a bam file is with the Samtools index command.


Moreover, there's no need to compare anything. If you aren't sure whether a bai file corresponds to a particular bam file, just delete it and generate a new one as suggested.


The answer is yes. The simplest way is to move the index file into another directory and create a new index (with: samtools index BAM_FILE_NAME). Then you can compare the md5 checksums of the two index files:

md5sum ORIGINAL_INDEX_FILE
md5sum NEW_INDEX_FILE

If both commands print the same checksum, the index files are identical.
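If md5sum is not available, the same checksum comparison can be sketched in Python with the standard hashlib module. The function names below are made up for this example. One caveat, stated as an assumption rather than a guarantee: an index regenerated with a different tool or samtools version may not be byte-identical to the original even when both are valid, so a mismatch does not necessarily mean the old index was built from a different bam.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have the same md5 checksum."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

For example, `same_content("backup/XXX.bam.bai", "XXX.bam.bai")` would compare the original index (moved to a hypothetical backup directory) with the freshly generated one.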


Hi Oligo,

Thanks for the quick reply. So you think if I generate a new indexed bam and compare it to the original one, I can figure out if they both have been made from the same .bam file? May I ask what software do you suggest for comparing two indexed bam files?

Thanks again, really appreciate your help.

12.3 years ago
Sahel ▴ 260

Thanks all. Your answers helped me a lot... I made a new index and everything looks fine :) I really appreciate all your help... :)
