Biostar Beta. Not for public use.
Malformed BAM file produced by BWA
0
Entering edit mode
3.0 years ago
joreamayarom • 110
USA/Cambridge

I'm aligning reads to a genome to later do some variant calling. I have mapped some reads to a genome using bwa and created the corresponding dictionary

bwa index -p ref.fa ref.fa
bwa mem ref.fa forward.fastq.gz reverse.fastq.gz -t 16 > mapped_file.sam
samtools view -bS mapped_file.sam -o mapped_file.bam
java -jar picard.jar CreateSequenceDictionary \
R=ref.fa \
O=ref.dict

samtools faidx ref.fa

And then proceeded to sort and remove duplicates

java -jar picard.jar SortSam \
I=mapped_file.bam \
O=sorted_file.bam \
SORT_ORDER=coordinate

java -jar picard.jar MarkDuplicates \
REMOVE_DUPLICATES=true \
ASSUME_SORT_ORDER=coordinate \
I=sorted_file.bam \
O=clean_file.bam \
M=metric_file.txt

And finally tried to realign the sequences using GATK, but the programmed complained that my input bam file is malformed.

java -jar GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R ref.fa \
    -I clean_file.bam \
    -o clean_file.intervals

I ran ValidateSamFile, as GATK recommends, and got this error

## HISTOGRAM    java.lang.String 
Error Type  Count 
ERROR:MISSING_READ_GROUP    1
WARNING:RECORD_MISSING_READ_GROUP   324404

To solve this problem I tried running AddOrReplaceReadGroups and FixMateInformation on my clean_file.bam, as the GATK webpage recommends (http://gatkforums.broadinstitute.org/gatk/discussion/7571/errors-in-sam-bam-files-can-be-diagnosed-with-validatesamfile), to no avail. I also ran ValidateSamFile directly in my Bra output and got a similar error message. Is there an explanation for this problem? Has my BWA output been corrupted in any stage?

ADD COMMENTlink
0
Entering edit mode

The RG tag has to be added manually, so it's not really a "corrupt" BAM... you just haven't added RG metadata and GATK needs that. AddOrReplaceReadGroups is the answer. Why didn't it, er, "avail"? :)

ADD REPLYlink
2
Entering edit mode
13 months ago
France/Nantes/Institut du Thorax - INSE…

1) one step (don't save as SAM but bam, add your read groups and sort)

bwa mem -R '@RG\tID:S1\tSM:SAMPLE1'  ref.fa forward.fastq.gz reverse.fastq.gz -t 16 |  samtools sort -O BAM -o out.bam -T tmp -

2) when using the picard tools use the option

VALIDATION_STRINGENCY=LENIENT
ADD COMMENTlink
1
Entering edit mode

I'm checking GATK documentation right now. It is mentioned that the -M flag is essential for Picard compatibility. Is that missing from your answer or you have an specific reason to leave it out?

ADD REPLYlink
0
Entering edit mode

GATK/-M option ? what are you talking about ? VALIDATION_STRINGENCY is a general option for picard when handling bam.

ADD REPLYlink
0
Entering edit mode

I'm talking about recommendation 1. In GATK webpage (http://gatkforums.broadinstitute.org/gatk/discussion/2799) they recommend the following command with an -M before -R.

bwa mem -M -R ’<read group info>’ -p reference.fa raw_reads.fq > aligned_reads.sam

ADD REPLYlink
0
Entering edit mode

1) the option -M has nothing to do with your problem. Check the documentation

2) the order of the options doesn't matter.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1