Question: GATK quality recalibration to get recalibrated BAM files before SNP calling
I'm having problems with base quality score recalibration (BQSR) in GATK, because most tutorials and examples use an old version of GATK, whose syntax and arguments differ from the current version.

I have run this to get a recalibration table, which I think I need to get the recalibrated BAM files:

gatk BaseRecalibrator \
--reference human_genome/hg38_masked.fa \
--known-sites human_genome/All_20180418.vcf.gz \
--input S01_BAM_file.bam \
--output recal.table

This works fine, and I could run this for all of my samples.
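To run this for every sample, a small wrapper can loop over sample IDs. This is a minimal sketch. It assumes `gatk` is on the PATH and that each input BAM follows the `S01_BAM_file.bam` naming pattern above. The function name `make_recal_table` and the `<sample>.recal.table` output name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a BQSR recalibration table for one sample.
# Assumes BAMs are named <SAMPLE>_BAM_file.bam (e.g. S01_BAM_file.bam).
REF="human_genome/hg38_masked.fa"
KNOWN_SITES="human_genome/All_20180418.vcf.gz"

make_recal_table() {
    local sample="$1"
    gatk BaseRecalibrator \
        --reference "$REF" \
        --known-sites "$KNOWN_SITES" \
        --input "${sample}_BAM_file.bam" \
        --output "${sample}.recal.table"
}

# Example usage (sample IDs are placeholders):
# for s in S01 S02 S03; do
#     make_recal_table "$s" || echo "BaseRecalibrator failed for $s" >&2
# done
```

Writing one table per sample keeps the outputs from overwriting each other. A single shared `recal.table` would be overwritten on every loop iteration.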

But I can't work out how to do the next step, which produces the recalibrated BAM files.

The old way of doing it was:

java -jar GenomeAnalysisTK.jar -T PrintReads \
-R human.fasta \
-I realigned.bam \
-BQSR recal.table \
-o recal.bam

I wrote:

gatk PrintReads \
-R data_R3/human_genome/hg38_masked.fa \
--input S01_BAM_file.bam \
-BQSR recal.table \
--output quality_score_recalibrated_S01.bam

But this fails with the error "A USER ERROR has occurred: B is not a recognized option", and I can't find an equivalent of -BQSR in the PrintReads help.

Does anyone know the current way to run this quality score recalibration to get recalibrated BAM files?

8 months ago
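You should use ApplyBQSR. In GATK4, PrintReads no longer applies a recalibration table. BaseRecalibrator builds the table, and ApplyBQSR writes the recalibrated BAM: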

gatk BaseRecalibrator \
   -I input.bam \
   -R reference.fasta \
   --known-sites sites_of_variation.vcf \
   --known-sites another/optional/setOfSitesToMask.vcf \
   -O recal_data.table


gatk ApplyBQSR \
   -R reference.fasta \
   -I input.bam \
   --bqsr-recal-file recal_data.table \
   -O output.bam
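The two steps can be combined into a per-sample sketch like the following. It assumes the reference and known-sites paths from the question, the same hypothetical `<sample>_BAM_file.bam` naming, and a hypothetical function name `recalibrate_sample`. It returns early if BaseRecalibrator fails, so ApplyBQSR never runs without a table.

```shell
#!/usr/bin/env bash
# Hypothetical per-sample BQSR wrapper for GATK4.
REF="human_genome/hg38_masked.fa"
KNOWN_SITES="human_genome/All_20180418.vcf.gz"

recalibrate_sample() {
    local sample="$1"
    local bam="${sample}_BAM_file.bam"
    local table="${sample}.recal.table"

    # Step 1: build the recalibration table from the BAM and known variant sites.
    gatk BaseRecalibrator \
        -R "$REF" \
        -I "$bam" \
        --known-sites "$KNOWN_SITES" \
        -O "$table" || return 1

    # Step 2: write a BAM with recalibrated base qualities
    # (this is what GATK3 did with PrintReads -BQSR).
    gatk ApplyBQSR \
        -R "$REF" \
        -I "$bam" \
        --bqsr-recal-file "$table" \
        -O "quality_score_recalibrated_${sample}.bam"
}

# Example usage (sample IDs are placeholders):
# for s in S01 S02; do recalibrate_sample "$s"; done
```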

