Convert an altered sam file into a bam file
6.4 years ago
Jautis ▴ 530

Hi, I have a SAM file from BSMap which I have modified into the format of a SAM file from Bismark.

However, after doing so, I can no longer convert the SAM file into a BAM file using samtools. What am I missing? The more general version of this question: when can samtools convert SAM to BAM, and which SAM formats are acceptable?

Thanks in advance!

Code

#reorder columns
awk '{print $1 "\t" $13 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 "\t" $8 "\t" $9 "\t" $10 "\t" $11 "\t" $12}' SAM > SAM2
#reattach header (not-modified)
cat sam_head SAM2 > temp; mv temp SAM2
#attempt to convert file
samtools view -bS SAM2 > BAM

Error Message: (line 25 is the first read after the header)

[E::sam_parse1] unrecognized type
[W::sam_read1] parse error at line 25
[main_samview] truncated file.

SAM file, first 25 lines:

@HD     VN:1.0
@SQ     SN:chr4 LN:165299245
@SQ     SN:chrX LN:143131424
@SQ     SN:chr2 LN:187378091
@SQ     SN:chr6 LN:174439528
@SQ     SN:chr8 LN:139646187
@SQ     SN:chr12        LN:104110932
@SQ     SN:chr10        LN:90941950
@SQ     SN:chr14        LN:123829720
@SQ     SN:chr16        LN:74645514
@SQ     SN:chr18        LN:72186199
@SQ     SN:chr20        LN:71807805
@SQ     SN:chr1 LN:220367699
@SQ     SN:chr3 LN:180432695
@SQ     SN:chr5 LN:178775436
@SQ     SN:chr7 LN:162156779
@SQ     SN:chr9 LN:125196307
@SQ     SN:chr11        LN:132286798
@SQ     SN:chr13        LN:128036923
@SQ     SN:chr15        LN:107442819
@SQ     SN:chr17        LN:90913898
@SQ     SN:chr19        LN:51301725
@SQ     SN:mtDNA        LN:16566
@PG     ID:BSMAP        VN:2.90 CL:"bsmap -3 -n 1 -v 0.1 -r 0 -a ./tomap.fq.gz -d /file.sam"
1_7163:15-114   16      chr18   70657804        255     100M    *       0       0       ATAAATTATTATATTAATGTAAAAGTAGTAAATATTTTTGTGGTGTAGTTTGCGTGTTTGGTTTTTTTTATTATTTATTTGTGAGACGTTGATTTTCGTT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    NM:i:0  MD:Z:0G1G1G4A9G13G3G0G1A2G2G2G39G2      XM:Z:h.h.h..............x....H........x...xh....x.Zx..x........Z...............................x..      XR:Z:CT  XG:Z:GA

Speaking of this problem in particular: we'd need a printout of line 25 to understand what's wrong with it.
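One way to produce such a printout, with tabs and spaces made visible so that delimiter problems stand out, is sketched below. The helper name `show_line` and the `<TAB>` / `<SP>` markers are made up for illustration; `SAM2` is the file name used in the question.

```shell
# Print one line of a file with tabs and spaces made visible.
# show_line is a hypothetical helper, not part of samtools.
show_line() {
    # $1 = line number, $2 = file
    awk -v n="$1" 'NR == n {
        gsub(/\t/, "<TAB>")   # mark real tab characters
        gsub(/ /, "<SP>")     # mark plain spaces
        print
        exit
    }' "$2"
}

# Example: inspect the first read after the header
# show_line 25 SAM2
```

If the columns turn out to be separated by `<SP>` rather than `<TAB>`, that alone could be enough to make a SAM parser reject the record.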

More generally speaking:

A SAM file (to be called as such) requires a properly formatted header and a series of records with the columns defined in the file format definition (link). If you want to be able to convert a SAM to a BAM, your file needs both of these elements. It doesn't matter if the header contains more scaffolds than the ones represented in the records; what matters is that the opposite doesn't happen: records must not point at chromosomes / scaffolds that are not in the header. You'll find the details in chapter 1.3 of the linked PDF.
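The two requirements above can be checked roughly with awk before handing the file to samtools. This is only a sketch: `check_sam` is a hypothetical helper name, it assumes reference names are declared as `SN:` fields on `@SQ` lines and that a record needs at least the 11 mandatory tab-separated columns, and it does not replace a real validator.

```shell
# check_sam is a hypothetical helper: it reports records with fewer than
# 11 tab-separated fields, or whose RNAME (column 3) has no @SQ line.
check_sam() {
    awk -F '\t' '
        /^@/ {
            # collect reference names from @SQ header lines
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) known[substr($i, 4)] = 1
            next
        }
        NF < 11 {
            print "line " NR ": only " NF " tab-separated fields"
            bad = 1
            next
        }
        $3 != "*" && !($3 in known) {
            print "line " NR ": RNAME " $3 " not in header"
            bad = 1
        }
        END { exit bad + 0 }   # non-zero exit status if anything was reported
    ' "$1"
}

# Example:
# check_sam SAM2 && samtools view -bS SAM2 > BAM
```

The function prints nothing and returns 0 when both checks pass, so it can gate the conversion step as in the commented example.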

In your case I see you're attaching the header, so: are you keeping the whole header when you generate it? Do all your record lines contain the same number of fields?

Thanks! I went ahead and added the first 25 lines to the initial question. Yes, I am keeping the same header that I initially generated. I am adding additional fields (the NM, MD, XM, XR, and XG tags, with dummy values).

Does it print the same error if you exclude the XM tag field at the end? And if you change the read name? Maybe there are some meta-characters... Also, check for stray whitespace!
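A quick way to look for both kinds of problem is sketched below. `check_tags` is a hypothetical helper name. It assumes optional fields start at column 12 and follow the `TAG:TYPE:VALUE` layout, with `TYPE` being one of the letters the SAM specification defines (`A`, `i`, `f`, `Z`, `H`, `B`). A space is only flagged for inspection, since spaces can legitimately occur inside `Z`-type values.

```shell
# check_tags is a hypothetical helper: for every record it flags space
# characters and optional fields (column 12 onward) that do not look like
# TAG:TYPE:VALUE with one of the defined TYPE letters.
check_tags() {
    awk -F '\t' '
        /^@/ { next }                      # skip header lines
        / /  { print "line " NR ": contains a space character" }
        {
            for (i = 12; i <= NF; i++)
                if ($i !~ /^[A-Za-z][A-Za-z0-9]:[AifZHB]:/)
                    print "line " NR ": suspicious optional field: " $i
        }
    ' "$1"
}

# Example:
# check_tags SAM2 | head
```

An empty result means every optional field has a recognisable type and no record contains a space; it does not prove the file is valid in every other respect.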

Please post the first few and the last few lines of your SAM file so we can get a clear idea of what has gone wrong.
