Variant calling step order question: base recalibration & mark duplicates, which is first?
6.7 years ago
alons ▴ 270

Hi all,

We're going through & revising our variant calling pipeline on NGS data from cancer patients and a question came up:

Which step should be done first (and why), base recalibration or mark duplicates?

Currently we recalibrate bases first and then mark duplicates.

The reason I'm asking this is that we originally based part of our pipeline on the following article, which said that you recalibrate bases and then mark duplicates: http://www.htslib.org/workflow/#mapping_to_variant

However, the following Broad Institute best practices page says the opposite: you mark duplicates first and then recalibrate bases. I saw the same order in another paper as well: https://software.broadinstitute.org/gatk/best-practices/bp_3step.php?case=GermShortWGS

Thanks in advance!

Alon

NGS variant calling cancer pipeline • 2.1k views

As per the GATK best practices workflow (https://software.broadinstitute.org/gatk/img/BP_workflow_3.6.png), mark duplicates first, then run base recalibration.

6.7 years ago
mforde84 ★ 1.4k

I'd probably remove duplicates first, since BQSR (base quality score recalibration) builds some sort of covariate model from all of the supplied reads. I'm assuming that having a bunch of clonal artifacts in your dataset might throw this off a little. But honestly, you should ask the GATK people, as they have a better understanding of the underlying model.
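
To make that intuition concrete, here is a toy sketch in Python. It is not GATK's actual model: the numbers are invented, and the `+1`/`+2` smoothing in `empirical_q` is just an assumption for the illustration. It only shows how copies of one error-carrying fragment can drag down the empirical quality estimated for a bin of bases.

```python
import math

def empirical_q(mismatches, bases):
    """Phred-scaled empirical error rate for one bin of bases.

    The +1/+2 smoothing is an assumption made for this toy example.
    """
    return -10 * math.log10((mismatches + 1) / (bases + 2))

# Hypothetical bin: 1000 unique fragments x 100 bases, 100 true sequencing errors.
unique_bases = 1000 * 100
unique_mismatches = 100

# Hypothetical clonal artifact: one fragment with 5 PCR errors, present as 200 copies.
dup_copies = 200
dup_bases = dup_copies * 100
dup_mismatches = dup_copies * 5

# Estimate with duplicates left in versus with duplicates excluded.
q_with_dups = empirical_q(unique_mismatches + dup_mismatches,
                          unique_bases + dup_bases)
q_dedup = empirical_q(unique_mismatches, unique_bases)

print(f"empirical Q with duplicates:    {q_with_dups:.1f}")
print(f"empirical Q without duplicates: {q_dedup:.1f}")
```

In this made-up case the duplicated PCR errors get counted as if they were independent sequencing errors, so the bin looks roughly ten Phred points worse than it is. How much this matters on real data depends on your duplication rate.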

6.7 years ago

Recalibrating bases should not really improve (or affect) duplicate detection, since duplicates are identified from alignment positions rather than from quality scores. But duplicate removal can improve recalibration, so I'd do that first. And the earlier you remove duplicates, the faster everything else becomes.
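
For reference, a minimal sketch of that ordering, assuming GATK4-style command lines. The file names and the `build_pipeline` helper are hypothetical, and the script only prints the commands rather than running them, so check the flags against the GATK version you have installed.

```python
import shlex

def build_pipeline(bam, ref, known_sites, prefix):
    """Return the commands in order: mark duplicates, then recalibrate.

    All paths are placeholders; this helper is illustrative, not part of GATK.
    """
    marked = f"{prefix}.markdup.bam"
    metrics = f"{prefix}.markdup.metrics.txt"
    table = f"{prefix}.recal.table"
    recal = f"{prefix}.recal.bam"
    return [
        # 1. Flag duplicates on the aligned, coordinate-sorted BAM.
        ["gatk", "MarkDuplicates", "-I", bam, "-O", marked, "-M", metrics],
        # 2. Build the recalibration table from the duplicate-marked BAM.
        ["gatk", "BaseRecalibrator", "-I", marked, "-R", ref,
         "--known-sites", known_sites, "-O", table],
        # 3. Apply the recalibration to the same duplicate-marked BAM.
        ["gatk", "ApplyBQSR", "-I", marked, "-R", ref,
         "--bqsr-recal-file", table, "-O", recal],
    ]

steps = build_pipeline("sample.sorted.bam", "ref.fasta",
                       "known_sites.vcf.gz", "sample")
for cmd in steps:
    print(shlex.join(cmd))
```

The point is only that both recalibration steps read the duplicate-marked BAM, so whatever duplicate handling the recalibrator does can see the flags.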
