Question

Redundant @SQ Lines In BAM File

11.7 years ago

deepthithomaskannan

Hi all,

Does anybody have an idea why redundant @SQ lines are present in the BAM file header?

I created the BAM file with the following procedure:

Run bowtie-build on the genome.
Create a SAM file using samtools by aligning the FASTQ files to the bowtie-build output.
Convert SAM to BAM using samtools.

Those redundant lines cause the error "Cannot add sequence that already exists in SAMSequenceDictionary" when I try to add read groups using Picard AddOrReplaceReadGroups.

Deeps
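To see exactly which @SQ names are repeated (the sample of redundant lines asked for in the reply below), a small check over the header text can help. This is a minimal sketch that assumes the header has first been dumped as plain text, for example with `samtools view -H input.bam > header.sam`; `find_duplicate_sq` is a hypothetical helper name, not part of samtools or Picard.

```python
from collections import defaultdict


def find_duplicate_sq(header_lines):
    """Return {sequence_name: [LN values]} for @SQ names that occur more than once.

    `header_lines` is any iterable of SAM header lines (hypothetical input,
    e.g. an open file containing the output of `samtools view -H`).
    """
    lengths = defaultdict(list)
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        fields = dict(
            f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:] if ":" in f
        )
        name = fields.get("SN")
        if name is not None:
            lengths[name].append(int(fields.get("LN", "0")))
    # Keep only names that were seen more than once.
    return {name: lns for name, lns in lengths.items() if len(lns) > 1}
```

Usage: `with open("header.sam") as fh: print(find_duplicate_sq(fh))`. An empty dict means every @SQ name is unique. Deleting the extra @SQ lines by hand is risky, because BAM records refer to their reference by its position in this list.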

written 11.7 years ago by deepthithomaskannan • updated 10.3 years ago by Giovanni M Dall'Olio

What you wrote does not quite make sense: SAM files are not created by samtools, and bowtie-build does not align data. Edit your post and add the commands that you used, and perhaps a sample of what you call redundant @SQ lines.

11.7 years ago by Istvan Albert

Answer 1 · 2012-08-29

11.7 years ago

deepthithomaskannan

Hi Albert,

The editor removed the line breaks between the steps; I had not noticed that.

The steps are:

Build the bowtie index for the genome using bowtie-build.
Create the SAM file using bowtie (not samtools) by aligning the FASTQ files to the bowtie-build index.
Convert SAM to BAM using samtools.

The generated BAM files contain duplicate @SQ lines in the header. I think I found the reason: one file used to build the bowtie index is a subset of another file. ChrY.fa is part of ChrU.fa.

COMMAND: bowtie-build Chr1.fa,Chr2.fa.....ChrY.fa genomeindexbasename
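An overlap like this can be caught before running bowtie-build by listing FASTA record names that appear more than once across the input files. This is a minimal sketch, assuming that the reference name written to each @SQ line is the FASTA header text up to the first whitespace (bowtie's default SAM output) and that Picard complains about repeated names; `fasta_names`, `duplicate_names` and `check_fasta_files` are hypothetical helpers.

```python
from collections import defaultdict


def fasta_names(lines):
    """Yield the name of each FASTA record: text after '>' up to the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            yield parts[0] if parts else ""


def duplicate_names(sources):
    """Given (label, lines) pairs in bowtie-build input order, return
    {name: [labels...]} for every record name that appears more than once."""
    seen = defaultdict(list)
    for label, lines in sources:
        for name in fasta_names(lines):
            seen[name].append(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}


def check_fasta_files(paths):
    """File-based wrapper: open each FASTA path in turn and run duplicate_names."""
    def sources():
        for path in paths:
            # Each file is read completely before the next one is opened.
            with open(path) as handle:
                yield path, handle
    return duplicate_names(sources())
```

For example, `check_fasta_files(["ChrU.fa", "ChrY.fa"])` would report `chrY` if both files contain a record with that name. This compares names only, not sequence content. If anything is reported, dropping the overlapping file from the bowtie-build list and realigning should give one @SQ line per name.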

OK, good thing that you tracked that down; I think that would have been difficult for us to troubleshoot.

11.7 years ago by Istvan Albert