samtools merge warning
7.0 years ago
lshepard ▴ 470

Hi,

I have a question regarding a warning that I am observing while merging bam files with samtools.

For simplicity here is the command I used for a single pair of files:

samtools merge merged.bam File1-sorted.bam File2-sorted.bam

Note that I added 'sorted' to the input file names only to make clear that these files were previously sorted with samtools.

After running 'merge', the following warning is issued:

'Order of targets in file File2.bam caused coordinate sort to be lost'

Unfortunately, I am having a hard time finding more details about what this means and how to avoid it. I would appreciate any information. Thanks!

next-gen • 6.0k views

What is the output of:

samtools view -H File2-sorted.bam | head

Hi Pierre,

The output from your command is:

@HD     VN:1.0  SO:coordinate
@SQ     SN:chr1 LN:290094216
@SQ     SN:chr10        LN:112200500
@SQ     SN:chr10_AABR06110104_random    LN:1013
@SQ     SN:chr10_JH620367_random        LN:1765
@SQ     SN:chr10_AABR06110107_random    LN:780
@SQ     SN:chr10_AABR06110108_random    LN:4563
@SQ     SN:chr10_AABR06110109_random    LN:2250
@SQ     SN:chr10_AABR06110110_random    LN:2082
@SQ     SN:chr10_AABR06110111_random    LN:2352

Please, let me know if you need anything else. Thanks!


SO:coordinate in first line of your BAM says that your file is coordinate-sorted (SO = Sort Order). The merging of two files will destroy this sorting, that's why samtools generates a warning. Don't worry, just re-sort the merged BAM and it should be fine.
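The re-sort step suggested above can be sketched as a small shell helper. This is only a sketch: the function name `resort_bam` and the output file name are hypothetical, and it assumes samtools 1.3 or later, where `samtools sort` takes the output path via `-o`.

```shell
# Hypothetical helper: re-sort a merged BAM by coordinate, then index it.
# $1 = merged input BAM, $2 = path for the re-sorted output BAM.
# Assumes samtools >= 1.3 (the `sort -o OUT IN` form).
resort_bam() {
    samtools sort -o "$2" "$1" && samtools index "$2"
}

# Example call (file names are placeholders):
#   resort_bam merged.bam merged.sorted.bam
```

Indexing is included only because a coordinate-sorted BAM is usually indexed before further use; drop that step if it is not needed.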


Hi Santosh, thanks for the clarification, I will re-sort the file. But does that mean that sorting before merging is not necessary? I always thought sorting before merging was for the better, but this suggests that there should be two merging steps for my files?

Thanks again!


Actually, sorting is required for merging. The point of merging is not just to concatenate the two files, but also to preserve the sort order and create a well-formatted header. From the samtools manual: http://www.htslib.org/doc/samtools.html

Merge multiple sorted alignment files, producing a single sorted output file that contains all the input records and maintains the existing sort order.

If -h is specified the @SQ headers of input files will be merged into the specified header, otherwise they will be merged into a composite header created from the input headers. If in the process of merging @SQ lines for coordinate sorted input files, a conflict arises as to the order (for example input1.bam has @SQ for a,b,c and input2.bam has b,a,c) then the resulting output file will need to be re-sorted back into coordinate order.


That last sentence is the important one here.


Absolutely! That explains it all.


Perfect, that is what I thought, but I wanted to confirm and make sure I didn't misunderstand.

7.0 years ago

The issue is that your headers are in different orders or one file has chromosomes/contigs that the other doesn't. Consequently, while the input files might be nicely sorted, it's not immediately clear that the output will be properly sorted. As Santosh mentioned, you can just re-sort the merged file to fix this.

The bigger question is really how this happened to begin with. I presume you downloaded one of the files or that they in some way came from different sources. If you made both of these yourself, then either you used two different indices, or aligners that spit things out in different orders (that's not good) or something along those lines. If this is the case then it should be fixed because it'll cause you untold problems that you don't even know about yet.
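One way to check whether the two inputs really list their references in a different order is to compare the @SQ lines of their headers. The sketch below works on header text saved to files. The helper names `sq_names` and `compare_sq_order` and the header file names are hypothetical, and it assumes the header is tab-delimited, as `samtools view -H` prints it.

```shell
# Print the reference names (SN: fields) of a SAM header, in header order.
# Reads tab-delimited header text on stdin.
sq_names() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Hypothetical helper: compare the @SQ order of two header text files.
# Prints "same" if both list identical references in identical order,
# and "different" otherwise (including when one file has extra contigs).
compare_sq_order() {
    if [ "$(sq_names < "$1")" = "$(sq_names < "$2")" ]; then
        echo "same"
    else
        echo "different"
    fi
}

# Example (placeholder file names):
#   samtools view -H File1-sorted.bam > header1.txt
#   samtools view -H File2-sorted.bam > header2.txt
#   compare_sq_order header1.txt header2.txt
```

If this prints "different", the situation matches the one described in the manual excerpt (a,b,c in one file versus b,a,c in the other), and the merged output should be re-sorted.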


Thanks for this clarification! Also I was wondering why the sort order is destroyed, though I guessed it based on the error message :)


Hi Devon, thanks for the input. I will re-sort the files as suggested. As to your question about how this happened in the first place: these files are the output from an Ion Torrent sequencing run. Unlike for other platforms, the suggested approach is a 'two-step alignment', where you first align with TopHat2 (STAR may also be used) and then align the remaining unaligned reads with Bowtie2 alone (in local mode, with soft clipping).

The index used was the same, I just noticed that the Bowtie2 output was unsorted (TopHat2 already sorted by coordinate), so I sorted all the files before merging. So I am assuming using the two step alignment might be the reason, but I am not sure how it may be avoided for this particular NGS platform. If you know anything, I would appreciate any info! :) Thanks!
