6.6 years ago
chrisclarkson100
I have a series of SAM files that I would like to convert to BAM files and merge into one BAM file. The downstream analysis to be performed on the files requires them to have header lines.
example:
head -n 30 input1/input1.sam
@HD VN:1.0 SO:unsorted
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@PG ID:bowtie2 PN:bowtie2 VN:2.2.6 CL:"/usr/local/bowtie2/bowtie2-align-s --wrapper basic-0 -x hg19 --very-fast -p 8 -S ChIPH1_dot_2_11-11-14_TTAGGC_L.sam -1 ChIPH1_dot_2_11-11-14_TTAGGC_L003_R1_complete_filtered.fastq -2 ChIPH1_dot_2_11-11-14_TTAGGC_L003_R2_complete_filtered.fastq"
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072 73 chr1 85460210 42 101M = 85460210 0 NGTTATAGAGATGGCTTCCTTTCTTAAAGCTCATGAACCAACCTCTGCTAGCTTGAAACTTTTCTTCTGCAGCTTCATTACCTCTCTCAGCCTTCACAGAA #1=DFDFFHGGGHIIJJIIIIJJJJHHCHHIEHGIFHIJJIJJJJJFIIFHIJJJIIGIIGIJHIJIJJJJIIIJHHHHHHHFFFFFBCCDEEDDCDCC?A AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0T100 YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072 133 chr1 85460210 0 * = 85460210 0 TGAATGATGATAAAACCAAAGAGGCCTATAGATGATATGGAGAAAGTTTTTGTGGTCTGGATAGAAGATCAAACCAGCACCAACATTCACTAAAGCTAAAG @@@DDEFFHGHHFIJJJJIJJIJJJIJIIHIHIJJJJIIHIJIJIJJJIJFHHIJIJJJJGHIIIGIGGIJJHHHHFFFFCE9ABDDDEDDDDCCCCDDDD YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1140:2073 99 chr20 43940927 40 101M = 43941039 215 NTGCGGCTGGTGCTGCGCGGGGGCCGGGAGCTGGGTACCTTCCACAGCCGCCTTATCAAGGTCATCTCGAAGCCCTCGCAGAAGAAGCAGTCGCTGAAAAN #1:BD7A@<?A;DFF<ECEEF>>?B6='65:<38?338A8ABBBBB<@?0<<55A(:(>+++3>:A>@2><89?BBBB@B5@?>?>33994>5>B<9AAA# AS:i:-2 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0C99A0 YS:i:-29 YT:Z:CP
However, when I convert it to a BAM file using the '-h' argument, the header does not appear to be retained:
samtools view -h -S -b input1/input1.sam > test.bam
samtools view test.bam | head
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072 73 chr1 85460210 42 101M = 85460210 0 NGTTATAGAGATGGCTTCCTTTCTTAAAGCTCATGAACCAACCTCTGCTAGCTTGAAACTTTTCTTCTGCAGCTTCATTACCTCTCTCAGCCTTCACAGAA #1=DFDFFHGGGHIIJJIIIIJJJJHHCHHIEHGIFHIJJIJJJJJFIIFHIJJJIIGIIGIJHIJIJJJJIIIJHHHHHHHFFFFFBCCDEEDDCDCC?A AS:i:-1 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:0T100 YT:Z:UP
HWI-ST1437:123:C69MTACXX:3:1101:1405:2072 133 chr1 85460210 0 * = 85460210 0 TGAATGATGATAAAACCAAAGAGGCCTATAGATGATATGGAGAAAGTTTTTGTGGTCTGGATAGAAGATCAAACCAGCACCAACATTCACTAAAGCTAAAG @@@DDEFFHGHHFIJJJJIJJIJJJIJIIHIHIJJJJIIHIJIJIJJJIJFHHIJIJJJJGHIIIGIGGIJJHHHHFFFFCE9ABDDDEDDDDCCCCDDDD YT:Z:UP
.....
Is there something that I'm missing?
NOTE: after converting each one to a BAM file, I will merge them all into one:
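for i in input*
do
    # Convert to BAM (the header is carried over) and coordinate-sort, since merge expects sorted inputs
    samtools view -b "${i}"/*.sam | samtools sort -@ 6 -o "${i}/${i}.sorted.bam" -
done
# Merging sorted inputs yields a sorted BAM, so no separate final sort is needed
samtools merge -@ 6 input_formatted_sorted.bam input*/*.sorted.bam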
Use samtools view -h test.bam to print the header along with the reads, or -H for header only.
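The header is stored in the BAM file; plain samtools view simply does not print it. A rough way to see this for yourself is sketched below. The file tiny.sam and the helper count_header_lines are made up for illustration, and the samtools part assumes a 1.x build on the PATH (it is skipped otherwise).

```shell
#!/bin/sh
# Hypothetical three-line SAM file: two header lines and one unmapped read.
tmp=$(mktemp -d)
printf '@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:249250621\n' > "$tmp/tiny.sam"
printf 'r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> "$tmp/tiny.sam"

# Illustrative helper: count lines on stdin that start with '@' (SAM header lines).
# grep -c exits non-zero when the count is 0, hence the '|| true'.
count_header_lines() { grep -c '^@' || true; }

sam_headers=$(count_header_lines < "$tmp/tiny.sam")
echo "header lines in SAM:        $sam_headers"

if command -v samtools >/dev/null 2>&1; then
    # Convert to BAM; the header is written into the BAM regardless of -h.
    samtools view -b "$tmp/tiny.sam" > "$tmp/tiny.bam"
    # Without -h only alignment records are printed, so the count should be 0.
    echo "samtools view (no flag):    $(samtools view "$tmp/tiny.bam" | count_header_lines)"
    # -h prints header plus records; -H prints the header only.
    echo "samtools view -h:           $(samtools view -h "$tmp/tiny.bam" | count_header_lines)"
    echo "samtools view -H:           $(samtools view -H "$tmp/tiny.bam" | count_header_lines)"
else
    echo "samtools not found; skipping the BAM round trip"
fi

rm -rf "$tmp"
```

Depending on the samtools version, the -h and -H counts may be one higher than in the input, because newer releases can append their own @PG line to the header.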