Sam file header and truncated file
1
0
Entering edit mode
7.5 years ago
TriS ★ 4.7k

I posted this on the samtools groups with no luck, so hopefully here i get better luck

I'm changing part of the header in a number of bam files. however, after I do that, the resulting bam file is truncated....

I used the following code borrowed/adapted from here

samtools view -H file.bam | grep  'SN:X\|^@RG\|^@PG\|^@HD' | samtools reheader - file.bam > bam_final/new_name.bam

and

samtools view -h bam_final/new_name.bam

gives me

[main_samview] truncated file.

right after the new header.... any suggestions?

sam samtools header • 3.5k views
ADD COMMENT
0
Entering edit mode

You are only doing a grep for the pattern but are not changing anything? Is that what you meant to do? The thread you linked uses sed to make changes and then writing them out using samtools reheader.

ADD REPLY
0
Entering edit mode

yes, I wanted to grep out only the X chr and the three headers indicated. I did use samtools reheader indeed

ADD REPLY
0
Entering edit mode

Not sure I understand what you want to do. Leave only the headers for X and remove others? If you only run the command up to the grep part, do you get what you expected to see?

ADD REPLY
0
Entering edit mode

that's correct and I do get the header I expect to see.

ADD REPLY
0
Entering edit mode

I just tried your command on a file (selecting for just chrX) with samtools v. 1.3.1 and was able to re-header the file without an error. What version of samtools are you using?

ADD REPLY
0
Entering edit mode

Version: 1.3-27-g6994f30 (using htslib 1.3-51-g55ff37c)

hmmm...I need to see if we have an earlier version on the server... so your bam file was not truncated, correct?

ADD REPLY
0
Entering edit mode

I do not get any error with samtools view -H and the size looks about the same after re-header.

ADD REPLY
0
Entering edit mode

have you tried to look at the first few lines? below the header like

samtools view -h mybam.bam | head

i still get the truncated file error

ADD REPLY
0
Entering edit mode
5.9 years ago

Bam format references to read groups by their respective number in the header. Therefore when you remove some read groups from the header, samtools gets confused. You should convert your bam to sam first, update the header, then convert back to sam. Just take all lines that contain your chromosome name (header+reads) along with the @HD and @PG lines in the header, then convert to sam:

samtools view $bamfile $chromosome | \
     grep -P "@HD|@PG|$chromosome" | \
     samtools view -Shb > $chromosome.bam
ADD COMMENT

Login before adding your answer.

Traffic: 2510 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6