Hi all, I am trying to take a BAM file, convert it to CRAM, and then convert it back to BAM, so that the BAM files before and after the round trip are identical. I am using samtools for the conversions:
samtools view -C -T ref.fa seqs.bam > seqs.cram
samtools view -b -T ref.fa seqs.cram > seqs.bam
When I compare the BAM files before and after, there are differences. The headers differ (md5 etc.), which does not concern me, but the records themselves also change. Here I have isolated the tags that differ for one record in the before and after BAM files:
MD:Z:19 NM:i:0 (before)
MD:Z:18N0 NM:i:1 (after)
Is anybody familiar with this? Is it possible for the BAM file produced by converting to CRAM and back to be identical to the original BAM file?
Cheers,
Can you post the whole read for the before and after cases? I did the same conversion on my data, and although the order of the tags is different, the information looks to be the same.
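Since tag order can change in the round trip, a plain diff of the two files will flag records that carry the same information. A minimal sketch for comparing records while ignoring tag order is below. The helper name `normalize_sam` and the file names `before.bam` and `after.bam` are made up for illustration; it assumes tab-separated SAM text as printed by `samtools view`.

```shell
# Hypothetical helper: read SAM records on stdin and print them with the
# optional tags (columns 12 and up) sorted, so tag order no longer affects a diff.
normalize_sam() {
  awk 'BEGIN { FS = OFS = "\t" }
  /^@/ { next }                       # skip header lines
  {
    n = NF - 11                       # number of optional tags on this record
    for (i = 1; i <= n; i++) tag[i] = $(i + 11)
    # insertion sort of the tags as strings (portable, no gawk extensions)
    for (i = 2; i <= n; i++) {
      t = tag[i]
      for (j = i - 1; j >= 1 && (tag[j] "") > (t ""); j--) tag[j + 1] = tag[j]
      tag[j + 1] = t
    }
    line = $1
    for (i = 2; i <= 11 && i <= NF; i++) line = line OFS $i   # mandatory fields
    for (i = 1; i <= n; i++) line = line OFS tag[i]           # sorted tags
    print line
  }'
}

# Example usage (assumed file names):
#   samtools view before.bam | normalize_sam > before.txt
#   samtools view after.bam  | normalize_sam > after.txt
#   diff before.txt after.txt
```

Any lines that diff still reports after this normalization differ in content, not just in tag order.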
Most records are the same. Here is the whole record from where I drew the example in the original question. before:
after:
The reads you posted have the same MD and NM (and other tags) before and after! So which reads have problems?
I pasted the wrong record earlier. This one has different NM and MD values.
after:
Interesting. If you make a BAM file with just that read and do the BAM -> CRAM -> BAM conversion again, can you see whether this still happens? If so, please post this as an issue on the samtools (or htslib) GitHub repository. You can then attach the BAM file.
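One way to build that single-read test case is sketched below. The helper name `keep_read`, the placeholder `READ_NAME`, and the output file names are assumptions for illustration; the samtools flags used (`-h`, `-b`, `-C`, `-T`, `-o`) are the standard `samtools view` options.

```shell
# Hypothetical helper: from SAM text on stdin, keep the header lines plus
# every record whose QNAME (column 1) equals the read name given as $1.
keep_read() {
  awk -v name="$1" 'BEGIN { FS = "\t" } /^@/ || ($1 "") == (name "")'
}

# Example usage (READ_NAME and the one_read.* file names are placeholders):
#   samtools view -h seqs.bam | keep_read READ_NAME | samtools view -b -o one_read.bam -
#   samtools view -C -T ref.fa -o one_read.cram one_read.bam
#   samtools view -b -T ref.fa -o one_read.roundtrip.bam one_read.cram
#   samtools view one_read.bam
#   samtools view one_read.roundtrip.bam
```

If the MD and NM tags still differ between the last two outputs, `one_read.bam` is a small reproducible case to attach to the issue.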
Are you sure you are not doing some other processing of the "earlier" bam before converting to cram?
I did indeed use samtools 1.4.