Question

Order of GATK commands

2

11.6 years ago

Ashutosh Pandey 12k

I have a mouse genome that was sequenced using 5 different mate-pair libraries, and each library was run on 3 lanes of an Illumina machine. I first aligned the reads at the lane level, resulting in 15 BAM files. Then I merged all the BAM files (lanes) from the same library into a single BAM file, giving 5 "single library BAM" files in total (one per mate-pair library). I want to use GATK to perform indel realignment, dedup and base quality score recalibration.

Assuming I have enough computational resources to run the GATK tools even on big BAM files, what is the correct order for these steps? I personally think I should:

1) Run "IndelRealigner" at the library level, i.e. on each "single library BAM" file separately.

2) Run the "Dedup" step at the library level to remove or mark redundant reads.

3) Use the "TableRecalibration" tool to perform quality score recalibration at the single-lane (read group ID) level. The GATK manual mentions that although a "single library BAM" file may contain reads from different read groups or lanes, GATK will perform the recalibration at the lane level if an RGID is provided in the BAM file for the different lanes.
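For illustration, here is a minimal sketch of the `@RG` header lines that the 15 lane-level BAMs could carry so that a merged library BAM still distinguishes its lanes. The sample name `mouse1`, the `libN_laneM` ID scheme and the `lib1`..`lib5` labels are assumptions made up for this example; only the tag names `ID`, `SM`, `LB` and `PL` come from the SAM format.

```shell
#!/bin/sh
# Hypothetical names: sample "mouse1", libraries lib1..lib5, lanes lane1..lane3.
SAMPLE=mouse1

# Print one tab-separated @RG header line per lane:
# ID is unique per lane, LB is shared by the three lanes of a library.
rg_lines() {
    for lib in 1 2 3 4 5; do
        for lane in 1 2 3; do
            printf '@RG\tID:lib%s_lane%s\tSM:%s\tLB:lib%s\tPL:ILLUMINA\n' \
                "$lib" "$lane" "$SAMPLE" "$lib"
        done
    done
}

rg_lines
```

Each lane gets its own `ID` while the three lanes of a library share one `LB`, so a per-read-group step can still tell lanes apart after merging, and a per-library step can still group them.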

But I read a few recent papers describing exactly the same situation as mine (1 sample -> multiple libraries -> each library run across more than one lane, no barcoding), where indel realignment was performed at the lane level, i.e. on each single file, then the recalibration step was performed for each BAM file separately, and finally the lanes coming from the same library were merged to form five "single library BAM" files.

I just want to make sure I am doing things the correct way.

Thanks.

bam gatk library • 4.7k views

ADD COMMENT • link updated 18 months ago by wahag65987 ▴ 10 • written 11.6 years ago by Ashutosh Pandey 12k

1

Re your point on computational resources for big BAM files: if you happen to have access to GPUs, Parabricks is worth a try for running GATK on GPU:

 $ docker run \
      --gpus all \
      --rm \
      --volume $(pwd):/workdir \
      --volume $(pwd):/outputdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
    pbrun haplotypecaller \
      --ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
      --in-bam /workdir/fq2bam_output.bam \
      --out-variants /outputdir/variants.vcf

ADD REPLY • link 18 months ago by wahag65987 ▴ 10

score 5 · Answer 1 · 2012-09-24

11.6 years ago

Jorge Amigo 14k

I guess that the best practice would be to follow GATK's advice for best practices, wouldn't it?

I personally use the "better" suggestion, since the merging step of the "best" suggestion has always given me problems due to internal sample labeling on SOLiD platforms. We would use it only on small targeted resequencing projects, but we've found that all the steps suggested as "better" lead to fairly believable results.


0

Yeah, I tend to go with GATK's best practices as well; they are pretty straightforward and seem to work. I would use the better option, but I often only have 1-3 exome samples per project, and I've never been sure whether doing VQSR with samples from different projects (different diseases and families) is a good idea or not.

ADD REPLY • link 11.6 years ago by DG 7.3k

0

That's exactly the point I was trying to make. If you have mixed things, it doesn't seem reasonable to treat them as a mixture. Sure, if you work constantly with the same kits, reagents, sample types, ... using the merging step of the best practices would be wise, but that is very rarely the case in our lab... to date ;)

ADD REPLY • link 11.6 years ago by Jorge Amigo 14k

score 1 · Answer 2 · 2012-09-24

11.6 years ago

Zev.Kronenberg 12k

I don't know if I do it the "correct way", but here is my approach:

Align and de-dup separately.

Sort and merge together with read groups.

Generate indel target intervals.

Run indel realignment.
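The order above could be sketched as a dry run that only prints the commands rather than executing them. This assumes Picard (`MarkDuplicates`, `MergeSamFiles`) and a GATK 3-style jar (`-T RealignerTargetCreator`, `-T IndelRealigner`), which may not match the GATK version in use. The file names, jar paths, reference name and the `lib1` label are hypothetical, and the lane BAMs are assumed to be coordinate-sorted and to carry read groups already.

```shell
#!/bin/sh
# Dry run: print the commands for one library instead of executing them.
# All paths and names below are hypothetical placeholders.
REF=mm_ref.fasta
GATK=GenomeAnalysisTK.jar
PICARD=picard.jar

plan_library() {
    lib=$1
    merge_inputs=""
    # 1) De-duplicate each lane-level BAM separately.
    for lane in 1 2 3; do
        src="${lib}_lane${lane}.bam"
        dst="${lib}_lane${lane}.dedup.bam"
        echo "java -jar $PICARD MarkDuplicates I=$src O=$dst M=${dst%.bam}.metrics.txt"
        merge_inputs="$merge_inputs I=$dst"
    done
    # 2) Merge the lanes of the library; read groups stay in the header.
    echo "java -jar $PICARD MergeSamFiles$merge_inputs O=${lib}.merged.bam"
    # 3) Generate indel target intervals.
    echo "java -jar $GATK -T RealignerTargetCreator -R $REF -I ${lib}.merged.bam -o ${lib}.intervals"
    # 4) Run indel realignment on those intervals.
    echo "java -jar $GATK -T IndelRealigner -R $REF -I ${lib}.merged.bam -targetIntervals ${lib}.intervals -o ${lib}.realigned.bam"
}

plan_library lib1
```

Printing the plan first makes it easy to check the step order and file naming before committing compute time to large BAM files.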
