GATK recalibration, duplicates & realignment
17 months ago
alons

Hi all, I'm working on a variant calling pipeline based on the following link:

Now, in the "Improvement" section, which mainly uses GATK, it says that I should realign the BAM file, then recalibrate it, and then mark duplicates. What is unclear to me is which BAM file is used as input for each step. For example, is the realigned BAM file the input to the recalibration step?
I should note that I'm using a single BAM file from one library.

Thank you, Alon

15 months ago
National Institutes of Health, Bethesda…

You are correct in your reading. The steps described are run serially with the output BAM being the input BAM to the next step.
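
To make the chaining concrete, here is a minimal Python sketch of how the file names flow from one step to the next. It only plans the inputs and outputs and does not run GATK or any other tool. The function name `plan_chain`, the step labels and the file-naming scheme are all hypothetical, chosen purely for illustration.

```python
def plan_chain(input_bam, step_labels):
    """Return (label, input_bam, output_bam) triples for steps run serially.

    Hypothetical helper: it does not call any tool, it only shows that
    the output of each step becomes the input of the next one.
    """
    plan = []
    current = input_bam
    for label in step_labels:
        # Drop a trailing ".bam" so the label can be inserted before it.
        stem = current[:-4] if current.endswith(".bam") else current
        output = f"{stem}.{label}.bam"
        plan.append((label, current, output))
        # The BAM written by this step is what the next step reads.
        current = output
    return plan


if __name__ == "__main__":
    # Labels are placeholders for realignment, recalibration and duplicate marking.
    for label, bam_in, bam_out in plan_chain("sample.bam", ["realigned", "recal", "dedup"]):
        print(f"{label}: {bam_in} -> {bam_out}")
```

With a single input BAM, the final file in the plan is the one you would carry forward to variant calling.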

Thank you!
A follow-up question, though: in the same section they recommend another realignment after the improvement steps (initial realignment, recalibration and marking of duplicates).
Is it really necessary if I have only one BAM file (no merging of several BAM files)?


