Gatk Realignertargetcreator Sam/Bam Header Error
2
0
Entering edit mode
11.2 years ago
Houkto ▴ 220

Hi all, I get an GATK error

SAM/BAM file SAMFileReader{/merged_bam_markdup.bam} is malformed: Read HWI-ST303_0093:5:5:13416:34802#0
is either missing the read group or its read group is not defined in the BAM header, both of which are 
required by the GATK.

when I ran

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fa -I 
merged_bam_markdup.bam -o read.intervals

My Bam file has a header

samtools view -h merged_bam_files_indexed_markduplicate.bam | grep ^@RG
@RG     ID:test1      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test   CN:japan
@RG     ID:test2      PL:Illumina     PU:HWI-ST303    LB:test     PI:75   SM:test    CN:japan

A grep of the read within the error

HWI-ST303_0093:5:5:13416:34802#0        99      1       1090    29      23S60M17S       =       
1150    160 
TGTTTGGGTTGAAGATTGATACTGGAAGAAGATTAGAATTGTAGAAAGGGGAAAACGATGTTAGAAAGTTAATACGGCTTACTCCAGATCCTTGGATCTC       
GGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGFGGGGGGGGGDGFGFGGGGGFEDFGEGGGDGEG?FGGDDGFFDGGEDDFFFFEDG?E
MD:Z:60 PG:Z:MarkDuplicates     RG:Z:test1      XG:i:0  AM:i:29 NM:i:0  SM:i:29 XM:i:0  XO:i:0  XT:A:M

Following GATK recommended solution using Picard

java -XX:MaxDirectMemorySize=4G -jar picard-tools-1.85/AddOrReplaceReadGroups.jar I= test.bam O=
test.header.bam SORT_ORDER=coordinate RGID=test RGLB=test  RGPL=Illumina RGSM=test/ RGPU=HWI-ST303
RGCN=japan CREATE_INDEX=True

Got this error

Exception in thread "main" net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Record
 12247781, Read name HWI-ST303_0093:5:26:10129:50409#0, MAPQ should be 0 for unmapped read.

I also tried

java -jar GenomeAnalysisTK-2.3-9-ge5ebf34/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fa
 -I merged_bam_files_indexed_markduplicate.bam -o reads.intervals --validation_strictness LENIENT

but I got the first error message

Any suggestions would be great ?

N.B: I used to ran the same command (first one) with GATK version 1.2 and did not result in error

I also tried

picard/MarkDuplicates.jar I=test.bam O=test_markduplicate.bam M=test.matrix AS=true
VALIDATION_STRINGENCY=LENIANT

then indexed with samtools,then I got the following error

Ignoring SAM validation error: ERROR: Record (number), Read name HWI-ST303_0093:5:5:13416:34802#0,
RG ID on SAMRecord not found in header: test1

My pipeline

bwa aln -q 20 ref.fa read > files.sai
bwa sampe ref.fa file1.sai file2.sai read1 read2 > test1.sam
samtools view -bS test1.sam | samtools sort - test
samtools  index test1.bam
samtools merge -rh RG.txt test test1.bam test2.bam
gatk bam sam • 6.9k views
ADD COMMENT
1
Entering edit mode
11.2 years ago

I dont use samtools for merging. I use Picard for that. In your pipeline can you try using Picard for the last step (merging).

ADD COMMENT
0
Entering edit mode

thanks ashutoshmits, will that fix the problem ?

ADD REPLY
0
Entering edit mode

Use Picard for merging. Try to use Picard for all BAM operations, it does a better job than samtools with the headers / metadata and produces BAM files that are more compliant with GATK. Picard and GATK are both developed maintained at the Broad Institute.

ADD REPLY
0
Entering edit mode

I am not sure about it but you can give it a try. As William mentioned that Picard does a better job then samtools for merging the bam files.

ADD REPLY
0
Entering edit mode
11.2 years ago
samsara ▴ 630

When you merge BAM files you should group the reads (for GATK pipeline). The error is saying RG ID is not found. You can group reads using picard's AddReplaceReadGroups. First add read groups (RG) for both BAM/SAM files and then merge them.

ADD COMMENT
1
Entering edit mode

I did that already, and I got an error about MAPQ should be 0. I think the problem is not with the header but with bwa. I am still waiting for a solution.

ADD REPLY
0
Entering edit mode

It seems that the problem is with the aligner. Have you tried Bowtie instead ? Following links could be helpful http://seqanswers.com/forums/showthread.php?t=4246 Final Solution For "Mapq Should Be 0 For Unmapped Read."

ADD REPLY
0
Entering edit mode

World, I am thinking about it, however, it is kinda hard now. Since I worked on the same data using earlier version of GATK 1.2 and no error was reported. But these errors with the new version of GATK are known because its something that BWA does but they are not suppressed with VALIDATION_STRINGENCY=LENIANT.

ADD REPLY
0
Entering edit mode

You can use the --rg-id <text> --rg <text> in Bowtie2 if you are willing to realign the data. </text></text>

ADD REPLY
0
Entering edit mode

Thanks Zev, I looked at bowtie2 manual and I think I will go with the default settings (--sensitive) but if you can direct me to a pipeline of bowtie2 it would be great.

ADD REPLY

Login before adding your answer.

Traffic: 2950 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6