GATK, SAM file doesn't have any read groups defined in the header
3
6
Entering edit mode
9.5 years ago
Chirag Nepal ★ 2.4k

Hi all,

I have been trying to use Mutect to compare results from Varscan and other tools. To run MuTect, pre-processing from GATK and Picard tools is necessary.

1. Mapped reads using BWA.

2. Convert to sorted BAM using PICARD

java -Xmx4g \
  -Djava.io.tmpdir=/tmp \
  -jar SortSam.jar \
  SO=coordinate \
  INPUT=Trimmed_ERR361938_trimmed_bwa.sam \
  OUTPUT=Test.bam \
  VALIDATION_STRINGENCY=LENIENT \
  CREATE_INDEX=true

3. Mark Duplicates using PICARD

java -Xmx4g \
  -Djava.io.tmpdir=/tmp \
  -jarpicard-tools-1.119/SortSam.jar \
  SO=coordinate \
  INPUT=Trimmed_ERR361938_trimmed_bwa.sam \
  OUTPUT=Test.bam \
  VALIDATION_STRINGENCY=LENIENT \
  CREATE_INDEX=true

4. Realign along INDEL using GATK

java -Xmx4g \
  -jar GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R /steno-internal/chirag/data/indexGenome/hg19/bwa/hg19.fa \
  -o input.bam.list \
  -I input.marked.bam

NOW I GET ERROR

##### ERROR
##### ERROR MESSAGE: SAM/BAM file input.marked.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
##### ERROR

There is this script which should fix this, but I am not sure of some of the parameter used here,

java -jar ~/unixTools/picard-tools-1.119/AddOrReplaceReadGroups.jar

These parameters need to be used

  • RGLB=String
  • LB=String Read Group Library Required.
  • RGPU=String
  • PU=String Read Group platform unit (eg. run barcode) Required.
  • RGSM=String
  • SM=String Read Group sample name Required.

How do I get information on these parameters, as I am analyzing many published reads.

Are there some other ways to fix this step.

Thanks in advance!

SAM GATK • 27k views
ADD COMMENT
2
Entering edit mode

From my experience, you'll face three obstacles in succession as you embark on this journey (between steps 3 and 4 from your question). Possible tools you might need as you go:

  1. Picard's AddOrReplaceReadGroups to add read group info to the BAM file
  2. Picard's NormalizeFasta to normalize the fasta ref file
  3. Picard's CreateSequenceDictionary to create a .dict for the fasta ref
ADD REPLY
1
Entering edit mode

Thanks !

I am following these steps. Number-3 is fine, have not yet encountered step-2. Step-1 is where i am having problem.

ADD REPLY
1
Entering edit mode

FYI (In case you have not seen it before),

http://gatkforums.broadinstitute.org/discussion/3059/lane-library-sample-and-cohort-what-do-they-mean-and-why-are-they-important

PS: Huh , just realized that its an old post :-)

ADD REPLY
8
Entering edit mode
9.5 years ago

you have to specify the read group from the beginning using the option -R of bwa

-R STR Complete read group header line. '\t' can be used in STR and will be converted to a TAB in the output SAM. The read group ID will be attached to every read in the output. An example is '@RG\tID:foo\tSM:bar'. [null]

you can add a group to your current bams using picard AddOrReplaceReadGroups http://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups

ADD COMMENT
0
Entering edit mode

For first option: So, one option is re-mapping again by adding the command: bwa sampe -r STR

For second option: using AddOrReplaceReadGroups, there are 4 parameters that need to be filled.

  1. LB=String Read Group Library Required.
  2. PU=String Read Group platform unit (eg. run barcode) Required.
  3. SM=String Read Group sample name Required.

Can you suggest what parameter should I use, simply to bypass the error

ADD REPLY
3
Entering edit mode

I'd go with

RGLB=LaneX RGPU=NONE RGSM=AnySampleName
ADD REPLY
0
Entering edit mode

Thanks RamRS !

Will try this.

ADD REPLY
0
Entering edit mode

HI @RamRS

Now I started picard/GATK right from sam file.

#! /bin/bash

hg19Seq=/steno-internal/chirag/data/indexGenome/hg19/bwa/hg19.fa

echo "Step-1"
#java -Xmx4g -Djava.io.tmpdir=/tmp -jar /usr/local/lib/picard-tools-1.85/SortSam.jar SO=coordinate INPUT=Trimmed_ERR361938_trimmed_bwa.sam OUTPUT=Test.bam VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

echo "Step-2"
#java -jar /usr/local/lib/picard-tools-1.85/AddOrReplaceReadGroups.jar I=Test.bam O=Test_ADDOrReplace.bam RGPL=illumina RGLB=LaneX RGPU=NONE RGSM=AnySampleName

echo "Step-3"
java -Xmx4g -Djava.io.tmpdir=/tmp -jar /usr/local/lib/picard-tools-1.85/MarkDuplicates.jar INPUT=Test_ADDOrReplace.bam OUTPUT=Test_ADDOrReplace_marked.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT

echo "Step-4"
echo "make list for realignment"
#java -Xmx4g -jar ~/unixTools/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -T RealignerTargetCreator -R $hg19Seq -I Test_ADDOrReplace_marked.bam -o input.bam.list

echo "Step-5"
echo "realign along INDEL"
#java -Xmx4g -Djava.io.tmpdir=/tmp -jar ~/unixTools/GenomeAnalysisTK-3.2-2/GenomeAnalysisTK.jar -R $hg19Seq -T IndelRealigner -targetIntervals input.bam.list -I Test_ADDOrReplace_marked.bam -o Test_ADDOrReplace_marked_realigned.bam

echo "Step-5"
#java -Djava.io.tmpdir=/tmp/flx-auswerter -jar ~/unixTools/picard-tools-1.119/FixMateInformation.jar INPUT=input.marked.realigned.bam OUTPUT=input_bam.marked.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

Get error in Step-3:-

[Fri Oct 17 13:50:53 CEST 2014] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 5.31 minutes.
Runtime.totalMemory()=2648702976
FAQ:  http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
Exception in thread "main" net.sf.samtools.util.RuntimeEOFException: Premature EOF; BinaryCodec in readmode; file: /NextGenSeqData/project-data/chirag/rawReadsFastQ/chanOn_et_al_2013_liverFluke/Test_ADDOrReplace.bam
    at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:373)
    at net.sf.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:357)
    at net.sf.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:200)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:558)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:532)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:522)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:481)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:386)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:150)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:134)

Any suggestion on how i could solve this.

ADD REPLY
1
Entering edit mode

Hi,

Looks like your BAM file might be truncated. You might wanna check the Test_ADDOrReplace.bam file using Picard's ValidateSamFile

Manual: http://broadinstitute.github.io/picard/command-line-overview.html#ValidateSamFile

EDIT: I'd suggest checking the Test_ADDOrReplace.bam and if you find it truncated, start validating each file in the pipeline, starting from the raw BAM file you got from bwa.

ADD REPLY
0
Entering edit mode

@Pierre:

I tried this:

bwa sampe -r STR $hg19Index ${name%1.fastq}1.sai ${name%1.fastq}2.sai $name ${name%1.fastq}2.fastq > ${name%1.fastq}trimmed_bwa.sam

and get error

[bwa_sai2sam_pe] malformated @RG line
ADD REPLY
7
Entering edit mode
9.5 years ago
Ram 43k

Hi,

SAM/BAM files need a bit of preprocessing before Picard and GATK can work on them. I'm unable to recall all the steps off the top of my head, but this should help you solve the first problem by adding read groups:

Adding Read Groups To Bam Files

Remember, dummy read group names will suffice to bypass this error.

ADD COMMENT
2
Entering edit mode
8.8 years ago
ido.sloma ▴ 20

replace STR with @RG\tID:foo\tSM:bar

ADD COMMENT
0
Entering edit mode

This is a 3 year old solved question :)

ADD REPLY
1
Entering edit mode

But, still, thanks to @ido.sloma for sharing the answer.

ADD REPLY
0
Entering edit mode

Agreed. Just letting them know. Looking at the context now, it probably should be a reply to your comment.

ADD REPLY

Login before adding your answer.

Traffic: 2118 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6