Question: Is there an easy-to-use GATK pipeline for SNP calling?
3

With full respect, GATK is a good tool for SNP calling, but the tutorial on the GATK website is too complex and I get lost in the details.

Is there an easy-to-use list of GATK commands for SNP calling that I can copy and paste, changing only the input file names and perhaps a few parameters?

23 months ago Chen • 880 • updated 10 months ago johannes.koester • 10
0

Hello,

what's the problem with the Tool Documentation?

I guess you tried to work through the best practices guide and got lost somewhere along the way? To begin with, it's fine to start with just the variant-calling command using HaplotypeCaller. But I would recommend reading more about the whole "pipeline thing" (not only the best practices guide, though that's a good starting point). Depending on what you are trying to analyse, there is much more to do than just typing in a variant-calling command.

Please feel free to ask a specific question if you don't understand a certain point.

fin swimmer

23 months ago
finswimmer
11k
0

Hi swimmer, thank you very much for the suggestions. Do you think https://gencore.bio.nyu.edu/variant-calling-pipeline/ is a good command pipeline to follow? This is the kind of pipeline I am looking for, but I am not sure whether it is missing anything important.

23 months ago
Chen
• 880
3

That pipeline might be a bit old. I don't think you need the RealignerTargetCreator/IndelRealigner indel realignment step anymore, as HaplotypeCaller now handles that itself.

The general steps for me are:

  1. Trim reads
  2. Align to the genome with bwa mem
  3. Mark duplicates
  4. Use HaplotypeCaller to generate a gVCF for each sample
  5. CombineGVCFs
  6. GenotypeGVCFs on the combined gVCF
  7. Filter your VCF however you want
  8. Optionally, run base recalibration iteratively, using the filtered VCF as the set of known sites
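
As a rough sketch of what steps 2 to 7 can look like on the command line, the Python snippet below only builds and prints the commands; it does not run anything. It assumes GATK4-style syntax, paired-end reads that are already trimmed, and a reference that is already indexed. The sample names, file names, the `PL:ILLUMINA` read-group tag, and the `QD < 2.0` filter are placeholders rather than recommendations, and step 1 and step 8 are left out because the choice of trimmer and known-sites set depends on your data.

```python
import shlex


def pipeline_commands(ref, samples, threads=4):
    """Build shell command strings for steps 2-7; nothing is executed.

    ref     -- reference FASTA, assumed already indexed (bwa index,
               samtools faidx, gatk CreateSequenceDictionary)
    samples -- dict: sample name -> (trimmed R1 FASTQ, trimmed R2 FASTQ)
    """
    q = shlex.quote
    cmds, gvcfs = [], []
    for name, (r1, r2) in samples.items():
        # bwa wants the two-character sequence "\t" in the read-group string.
        # PL:ILLUMINA is an assumption about the sequencing platform.
        rg = f"@RG\\tID:{name}\\tSM:{name}\\tPL:ILLUMINA"
        sorted_bam = f"{name}.sorted.bam"
        dedup_bam = f"{name}.dedup.bam"
        gvcf = f"{name}.g.vcf.gz"
        cmds += [
            # Step 2: align, then coordinate-sort the alignments
            f"bwa mem -t {threads} -R {q(rg)} {q(ref)} {q(r1)} {q(r2)}"
            f" | samtools sort -o {q(sorted_bam)} -",
            # Step 3: mark duplicates, then index the BAM for HaplotypeCaller
            f"gatk MarkDuplicates -I {q(sorted_bam)} -O {q(dedup_bam)}"
            f" -M {q(name + '.dup_metrics.txt')}",
            f"samtools index {q(dedup_bam)}",
            # Step 4: one gVCF per sample
            f"gatk HaplotypeCaller -R {q(ref)} -I {q(dedup_bam)}"
            f" -O {q(gvcf)} -ERC GVCF",
        ]
        gvcfs.append(gvcf)

    variants = " ".join(f"-V {q(g)}" for g in gvcfs)
    cmds += [
        # Step 5: merge the per-sample gVCFs
        f"gatk CombineGVCFs -R {q(ref)} {variants} -O cohort.g.vcf.gz",
        # Step 6: joint genotyping on the combined gVCF
        f"gatk GenotypeGVCFs -R {q(ref)} -V cohort.g.vcf.gz -O cohort.vcf.gz",
        # Step 7: a single hard filter as a placeholder; choose your own
        f"gatk VariantFiltration -R {q(ref)} -V cohort.vcf.gz"
        " -O cohort.filtered.vcf.gz"
        " --filter-name QD2 --filter-expression 'QD < 2.0'",
    ]
    return cmds


if __name__ == "__main__":
    demo = {"s1": ("s1_R1.fq.gz", "s1_R2.fq.gz"),
            "s2": ("s2_R1.fq.gz", "s2_R2.fq.gz")}
    print("\n".join(pipeline_commands("ref.fa", demo)))
```

Printing the commands first makes it easy to check them against the tool documentation for your installed versions before running them.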

And yes, their tutorials are a bit of a mess. Their best practices guide is badly organized. You have to dig around a lot.

23 months ago
Damian Kao
15k
0

Hi, Damian! I like the steps you listed a lot; I was just looking for something like this. By "mark duplicates", did you mean marking duplicate reads with MarkDuplicates (Picard)? And should duplicated/recombinant regions be removed from the reference as well, or does that happen naturally when MarkDuplicates runs?

18 months ago
lutra007
• 0
1

Yes, I usually just use Picard's MarkDuplicates. Duplicate/recombinant regions are tricky to deal with. It might be better to do some kind of de novo assembly of those regions specifically if that's what you want to study.

18 months ago
Damian Kao
15k
0

Dear Damian~ I'm trying to understand best practices for variant calling. Following alignment and marking duplicates, does each individual need to have variants called before calling variants across multiple samples (step 6)?

18 months ago
emilyepuckett
• 0
0

GATK best practices suggest creating a genome VCF (g.vcf) for each individual, combining the g.vcfs, and then doing joint calling. These are steps 4, 5, and 6 in my comment.

A genome VCF differs from a normal VCF in that it also records information for positions that do not differ from the reference. You want this information when you eventually do joint calling across all samples, so that a sample matching the reference at a position can be compared with other samples that differ from the reference there. I would read up on g.vcfs if you want more info.
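
To make that concrete, here is a toy Python model (not a real gVCF parser, and not how GATK genotypes internally, which works from genotype likelihoods). Each hypothetical record is a `(start, end, genotype)` tuple: reference blocks cover a range of positions with genotype `0/0`, and variant sites cover one position. The sample data are made up for illustration.

```python
# Toy gVCF-like records: (start, end, genotype). All values are invented.
sample_a = [(1, 99, "0/0"), (100, 100, "0/1"), (101, 179, "0/0"),
            (180, 180, "1/1"), (181, 200, "0/0")]
sample_b = [(1, 150, "0/0")]  # no records after position 150


def genotype_at(records, pos):
    """Return the genotype covering pos, or './.' if nothing covers it."""
    for start, end, genotype in records:
        if start <= pos <= end:
            return genotype
    return "./."


def joint_call(samples, sites):
    """Look up every sample's genotype at each site of interest."""
    return {pos: {name: genotype_at(recs, pos) for name, recs in samples.items()}
            for pos in sites}


calls = joint_call({"A": sample_a, "B": sample_b}, [100, 180])
print(calls)
# {100: {'A': '0/1', 'B': '0/0'}, 180: {'A': '1/1', 'B': './.'}}
```

At position 100, sample B is covered by a reference block, so it can be called homozygous reference. At position 180 it has no record at all, so the call is missing. With plain per-sample VCFs, B would have no record at either position and the two cases could not be told apart.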

18 months ago
Damian Kao
15k
1

Hi Chen, the pipeline that you mentioned by NYU seems fine. It appears to be mostly for internal use, though. You don't appear to be based at NYU...?

17 months ago
Kevin Blighe
43k
0

I agree. It would be awesome if GATK could be used through a front-end application or were more user-friendly!

21 months ago
gaelgarcia
• 140
0

We have written and released open-source software that does exactly what you want. You can find it here:

https://github.com/frankMusacchia/VarGenius

Note that it can only be run on a cluster.

Regards

13 months ago francescomusacchia • 60
0

There is an easy-to-use, reproducible Snakemake workflow: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling

10 months ago johannes.koester • 10
