Biostar Beta. Not for public use.
Question: Is My Bam File Sorted ?
39
Entering edit mode

Hi all,

I wonder if there is any tool (in samtools ?) to quickly know whether a BAM file has been sorted or not.

(longer story: I've merged a set of sorted bams using ' samtools merge ' and I've then used MarkDuplicates with the option: 'assume_sorted=true'. Am I right ?)

Thanks,

Pierre

UPDATE : I wrote a post about it : http://plindenbaum.blogspot.com/2011/02/testing-if-bam-file-is-sorted-using.html

ADD COMMENTlink 9.1 years ago Pierre Lindenbaum 120k • updated 2.9 years ago martin.triska • 110
Entering edit mode
1

For those just looking for a quick way to do this using modern samtools, see martin.triska's answer further down.

ADD REPLYlink 2.1 years ago
wtwhite
• 10
Entering edit mode
0

Hi Pierre, do you have similar solution to check if sam file is sorted or not? Thank you

ADD REPLYlink 6.3 years ago
Prakki Rama
♦ 2.3k
Entering edit mode
0

just pipe your SAM into 'samtools view' before using my program.

ADD REPLYlink 6.3 years ago
Pierre Lindenbaum
120k
Entering edit mode
0

got it. Thank you :)

ADD REPLYlink 6.3 years ago
Prakki Rama
♦ 2.3k
Entering edit mode
0

Hi Pierre, You didn't give us the explicit answer yet? Do we still need sort the bam after we 'samtools merge' sorted small bams? I made a small test and find the merged sorted bam are already sorted.

ADD REPLYlink 3.4 years ago
Shicheng Guo
♦ 7.5k
29
Entering edit mode

You can use the sort order (SO) flag in the header to check if the file has been sorted:

% samtools view -H 5_110118_FC62VT6AAXX-hg18-unsort.bam
@HD    VN:1.0    SO:unsorted
% samtools view -H 5_110118_FC62VT6AAXX-hg18-sort.bam
@HD    VN:1.0    SO:coordinate

Unfortunately samtools index will work on both types of files without raising a commandline error, but will produce a smaller index file which is indicative of a problem:

% samtools index 5_110118_FC62VT6AAXX-hg18-sort.bam 
% echo $? 
0
% samtools index 5_110118_FC62VT6AAXX-hg18-unsort.bam 
% echo $?
0
% ls -lh *.bai 
chapman users 6.2M 2011-02-01 06:46 5_110118_FC62VT6AAXX-hg18-sort.bam.bai
chapman users  408 2011-02-01 06:47 5_110118_FC62VT6AAXX-hg18-unsort.bam.bai

Edit: From the comments it appears as if samtools will sometimes raise an error in this case. Here is what an unsorted BAM file that finishes with a truncated index and no error messages looks like:

% samtools view 5_110118_FC62VT6AAXX-hg18-unsort.bam | head
HWI-EAS264:5:1:1353:936#0       77      *       0       0       *       *       0       0       NTTTCCATTCCATTATGGATGATTCCATTCCATTGTATTC        ########################################        XM:i:0
HWI-EAS264:5:1:1353:936#0       141     *       0       0       *       *       0       0       TGGAACGGAATGGAATGGAATGGAATGGAATCAATCCGAC        DDGGDEGGGGDG@BGGG=@G:EFFEEB4FF3BBB@B<EBA        XM:i:0
HWI-EAS264:5:1:1463:944#0       99      chr4    79057890        255     40M     =       79058357       507      NTAGATATGTAAATAAAGTACAGGAAAATGTGGTGAGTAT        ########################################       XA:i:1   MD:Z:0A39       NM:i:1
HWI-EAS264:5:1:1463:944#0       147     chr4    79058357        255     40M     =       79057890       -507     AAAATATATACCAGGTAGATAAAGGGTCGTTCGTATTTCA        @B2EEDEB8@ED@DE?,A:?:DGEGDD:GGA@=8@182@A       XA:i:0   MD:Z:40 NM:i:0
HWI-EAS264:5:1:1534:933#0       77      *       0       0       *       *       0       0       NACTCCATTCAATTCCATTCCTTTTAGCTCGGGTTGTTTC        ########################################        XM:i:0
HWI-EAS264:5:1:1534:933#0       141     *       0       0       *       *       0       0       AATGGAATGGAATGGAATAAATCCCGGGGGAGTGGAATGG        BD4CC=G@D@BB-A?BB-:B==DB,==*==)4&::A=>CB        XM:i:0
HWI-EAS264:5:1:1691:931#0       99      chr18   28072372        255     40M     =       28072771       439      NCTGGGTTTTTATGTCTTCCAGCTAGTAAGTGCTGGAGCT        #322244442<:22<@@C@C@@;@;646666446664536       XA:i:1   MD:Z:0G39       NM:i:1
HWI-EAS264:5:1:1691:931#0       147     chr18   28072771        255     40M     =       28072372       -439     GTTCTCCCGCCTGCACCTCTCTCTGTGTGTTGGCTTTGTT        >B@D>ADDBDD>BD-DDDEDGGFHIIFI@HIIIHHIIIIH       XA:i:0   MD:Z:40 NM:i:0
ADD COMMENTlink 9.1 years ago Brad Chapman 9.4k
Entering edit mode
3

STrange... we have sorted BAM files with @HD VN:1.0 SO:unsorted in the header... I need to investigate further it seems...

ADD REPLYlink 9.1 years ago
Tim_Yates
• 110
Entering edit mode
1

Thanks very much Heng, that did it. I now get the error '[bam_index_core] the alignment is not sorted: reads without coordinates prior to reads with coordinates.' I appreciate you looking into this.

ADD REPLYlink 9.1 years ago
Brad Chapman
9.4k
Entering edit mode
0

The only problem is that changing that flag is program dependent and not required by the BAM spec. Picard does set the flag correctly, although samtools sort does not. I remember reading on SeqAnswers that Heng was planning on implementing this to avoid confusion; he can probably chime in if that's still a plan. Another tool to check for "bad" indexes on unsorted files is 'samtools idxstats' which might be nicer than looking at the size.

ADD REPLYlink 9.1 years ago
Brad Chapman
9.4k
Entering edit mode
0

Which version of samtools is indexing an unsorted file? I happen to have version 0.1.8 (r613) here and it stops with a message that the alignments are not sorted if I try to index a file sorted by queryname.

ADD REPLYlink 9.1 years ago
iw9oel_ad
6.0k
Entering edit mode
0

That's with the most recent version: 0.1.12a (r862). I would prefer to get an error.

ADD REPLYlink 9.1 years ago
Brad Chapman
9.4k
Entering edit mode
0

Same observation as keith I do get an error: [mdo041@astrakan ~]$ samtools-0.1.12a/samtools index test.fas.blat.sam.bam [bam_index_core] the alignment is not sorted (NG-1755): 2158255 > 2132773 in 1-th chr using a newly imported sam file. Brad: I think the reason for not seeing an error is that the file you are trying _is actually_ sorted just by chance. So I don't know about the flags.

ADD REPLYlink 9.1 years ago
Michael Dondrup
46k
Entering edit mode
0

Try the version here: https://github.com/lh3/samtools

ADD REPLYlink 9.1 years ago
lh3
31k
Entering edit mode
0

Same result for me: [bam_index_core] the alignment is not sorted

ADD REPLYlink 9.1 years ago
Michael Dondrup
46k
Entering edit mode
0

Thanks. I mainly ask Brad to try it on the unsorted file for which samtools does not give any error messages.

ADD REPLYlink 9.1 years ago
lh3
31k
Entering edit mode
0

Michael and Keith, the file is definitely unsorted. Not sure what I'm doing different. Heng, I tried the GitHub version and got the same behavior. Edited the answer above with the first few lines of the file if that helps; it's from a paired end Bowtie alignment.

ADD REPLYlink 9.1 years ago
Brad Chapman
9.4k
Entering edit mode
0

Hmm... Please try again. Updated a minute ago.

ADD REPLYlink 9.1 years ago
lh3
31k
Entering edit mode
0

I have realized that such input will cause problems, but it seems that the fix is not correct. Please try again. Updated a minute ago.

ADD REPLYlink 9.1 years ago
lh3
31k
11
Entering edit mode

If anyone arriving to this thread years later as me, you can run:

samtools stats <file.bam> | grep "is sorted:"

(I'm using the samtools 1.4, don't know when was this added to stats) Enjoy :)

ADD COMMENTlink 2.9 years ago martin.triska • 110
Entering edit mode
0

If it gives back "is sorted: 1" means it is, indeed, sorted? I want to corroborate because the Samtools manual does not say anything about the meaning of this output code (in the manual entry for the stats command; I mean, something like explaining that 1 is sorted and 0 is not).

ADD REPLYlink 10 months ago
msimmer92
• 180
Entering edit mode
0

Samtools did check the order. In the source code:

if ( stats->tid==bam_line->core.tid && bam_line->core.pos<stats->pos )
stats->is_sorted = 0;

and

fprintf(to, "SN\tis sorted:\t%d\n", stats->is_sorted ? 1 : 0);

Link to source on github

ADD REPLYlink 10 months ago
ColorfulBlack
• 0
6
Entering edit mode

Hi,

I think if you run samtools index, it will throw an error if the file was not sorted before, can use return code e.g. in a script. Otherwise it would do no harm to run sort again, would it?

Edit: So there seems to be some confusion here about flags and samtools index, but the safest method to make sure the file is really sorted is to always run samtools sorton the bam file. This changes nothing if the file is already sorted. Also, I have never seen samtools index work with a non-sorted file, I am refering to 0.1.12a, exept in the case that the imported sam file was already sorted from the beginning, thus this method is safe, imho.

ADD COMMENTlink 9.1 years ago Michael Dondrup 46k
2
Entering edit mode

Unless you have extra information about the BAM file, it is not possible to tell without scanning the file. Extra information might be that you know that the optional sort order field in the header is present and correct, you know the complete history of the file or you have the expected checksum of the sorted data.

Otherwise, you can only be sure of the sort order by tracking the reference sequence and coordinates of the previous and current alignments while processing the file (in the case of coordinate sorting), or by pre-scanning with a program that performs this check. This is what 'samtools index' does while indexing.

It may be too expensive to pre-scan a huge BAM file to check for sortedness, so the cheapest way is to do it while you work on the data.

ADD COMMENTlink 9.1 years ago iw9oel_ad 6.0k
1
Entering edit mode

I ran into this problem recently, and no it looks like there is no way of telling.

The best I could come up with, was requiring sorted files to then be indexed to generate a FILE.bai file. Any BAM files that were missing this paired file were sorted and indexed as part of the pipeline...

ADD COMMENTlink 9.1 years ago Tim_Yates • 110

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0