Some questions about call variants for mtDNA using GATK HaplotypeCaller
1
1
Entering edit mode
5.6 years ago
MatthewP ★ 1.4k

Hi everyone. I tried to apply GATK pipeline to amplicon-based mtDNA sequencing data analyse and found some problems want to ask/discuss with you guys.
I sat up my pipeline according this GATK Best Practice. Here is my pipeline:

  1. QC(fastp)
  2. mapping
    bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz > sample.sam
  3. SAM to BAM(samtools)
    samtools view -bF 12 sample.sam > sample.bam
  4. sort BAM(GATK SortSam)
    gatk SortSam -I sample.bam -O sample_sorted.bam -SO coordinate
  5. BAM index(samtools index)
  6. Validate BAM(GATK ValidateSamFile)
    gatk ValidateSamFile -I sample_sorted.bam -M VERBOSE
  7. Call variant(GATK HaplotypeCaller)
    gatk HaplotypeCaller -I samplt_sorted.bam -O sample.g.vcf -R hs_ref_GRCh38.p12_chrMT.fa -ploidy 1 -ERC GVCF
  8. Consolidate GVCFs(GATK GenomicsDBImport)
    gatk GenomicsDBImport --genomicsdb-workspace-path ./workspace -L MT -V sample1.g.vcf -V sample2.g.vcf ...(many samples)
  9. Joint-Call Cohort(GATK GenotypeGVCFs)
    gatk GenotypeGVCFs -O 18R111.vcf -R hs_ref_GRCh38.p12_chrMT.fa -ploidy 1 -V gendb://workspace

Here is something I want to explain first

  • No mark/remove duplicates because this is amplicon-based(multi-PCR) sequencing data, most of them are duplicates normally(fastp reported about 96%).
  • I set -ploidy 1 in step 7 and 9 because mtDNA should treated as haploid. However I can't set step 8 to haploid because this only for diploid. Second, I find description below in this GATK article: > The one quantity that changes whether the GATK considers the possibility of a heterozygous genotype at all is the ploidy, which describes how many copies of each chromosome each individual in the species carries.

It seems this ploidy setting only works for nuclear genome, means the copies of chromosome. But this not quite right for mtDNA for copy number not sure(At least not 1 copy). Question1: What will this affect my result?

When I check my finally vcf result I find 2 problems, result example:

MT      214     .       A       G       2386.46 ... GT:AD:DP:GQ:PL  1:0,112:112:99:5445,0   0:136,0:136:99:0,1080...

There problem is AD and DP too low value(Important to me, I need to calculate each variant's frequency.). The real depth value is about 5000, I find many has the same issue(check here). This seems because HaplotypeCaller not suite for amplicon data. Question2: Have anyone call such mtDNA variant using GATK? How?

Question3: Will Mutect2 work here?
I think about this because mtDNA heterogeneity are like tumor, while tumor has many other features that mtDNA don't like structure variants. Any one think this before? What should be normal tissue data (-I:normal param in Mutect2) if I apply Mutect2 to mtDNA? Reference sequence?

Thanks everyone!

GATK HaplotypeCaller mtDNA • 2.9k views
ADD COMMENT
1
Entering edit mode

2 . mapping
bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz > sample.sam

3 . SAM to BAM(samtools)
samtools view -bF 12 sample.sam > sample.bam

4 . sort BAM(GATK SortSam)
gatk SortSam -I sample.bam -O sample_sorted.bam -SO coordinate

Note that you shorten steps 2-4, and avoid intermediate files by piping the result of the alignment directly to samtools sort, creating a sorted bam file:

bwa mem -t 4 -R "@RG\tID..." hs_ref_GRCh38.p12_chrMT.fa sample.R1.fastq.gz sample.R2.fastq.gz  | samtools sort -o sample.bam -
ADD REPLY
0
Entering edit mode

Thanks, I used samtools sort before until I found gatk SortSam. I though this two tools do exactly the same thing. You mean my understanding here is wrong?

ADD REPLY
0
Entering edit mode

They probably do exactly the same thing, but I'd naively expect samtools to be quicker.

What you are doing is correct, there is just no need to split the alignment, sam to bam and sorting bam across three commands. It's messy, takes longer and makes intermediate files which you'll have to delete. A sam file takes a ton of disk space.

ADD REPLY
0
Entering edit mode

Good advice, indeed I need to delete those files by hand. Thanks.

ADD REPLY
0
Entering edit mode
24 months ago
kanika.151 ▴ 130

I just wanted to know if you got answers to your questions.

As Mutect2 will restrict results to low freq variant calls. :-)

ADD COMMENT

Login before adding your answer.

Traffic: 1969 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6