Question

How to proceed after alignment when analyzing whole exome sequencing data

0

5.2 years ago

Learner ▴ 280

I have read many posts on this website, but most of them are old. I used this post to build my pipeline, but it is very old now (What Is The Best Pipeline For Human Whole Exome Sequencing?).

Could anyone give me an up-to-date view of the Genome Analysis Toolkit (GATK)? I cannot find the best way to do steps 6 to 11 of that post, and GATK has changed, as the documentation below shows:

https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.3.0/

genomics • 1.4k views

ADD COMMENT • link updated 5.2 years ago by manuel.belmadani ★ 1.3k • written 5.2 years ago by Learner ▴ 280

1

Did you check v4 best practices? What specifically is troubling you?

https://software.broadinstitute.org/gatk/best-practices/workflow

ADD REPLY • link 5.2 years ago by Santosh Anand 5.7k

0

Probably the "exome" part; I think the Broad guides assume you have WGS.

ADD REPLY • link 5.2 years ago by manuel.belmadani ★ 1.3k

0

@manuel.belmadani WGS is different from WES, but the process should be rather similar.

ADD REPLY • link 5.2 years ago by Learner ▴ 280

0

Yes, that's why I was suggesting that using the Broad best practices might not be completely appropriate. I'd be concerned that the base/variant recalibration steps would differ for WES. It's been asked in their forum and there are some answers about specific steps, but no comprehensive guide for WES afaik.
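
For what it's worth, one common adjustment for exomes is to restrict base recalibration to the capture targets. Here is a minimal sketch that only builds the GATK4 argument lists and does not run anything. It assumes GATK4 is installed as `gatk`; the file names and the 100 bp padding are hypothetical placeholders, not values taken from any official guide.

```python
import os


def bqsr_commands(bam, reference, known_sites, targets, padding=100):
    """Build (BaseRecalibrator, ApplyBQSR) argument lists limited to exome targets.

    All paths are placeholders supplied by the caller; padding=100 is an
    assumed default, not an official recommendation.
    """
    stem, _ = os.path.splitext(bam)
    table = stem + ".recal.table"
    recal_bam = stem + ".recal.bam"

    # Model the error covariates only on reads overlapping the padded targets.
    base_recal = [
        "gatk", "BaseRecalibrator",
        "-R", reference,
        "-I", bam,
        "-L", targets,
        "--interval-padding", str(padding),
        "-O", table,
    ]
    # One --known-sites argument per known-variants VCF (e.g. dbSNP).
    for vcf in known_sites:
        base_recal += ["--known-sites", vcf]

    # Apply the recalibration table to produce the recalibrated BAM.
    apply_bqsr = [
        "gatk", "ApplyBQSR",
        "-R", reference,
        "-I", bam,
        "--bqsr-recal-file", table,
        "-O", recal_bam,
    ]
    return base_recal, apply_bqsr
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.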

ADD REPLY • link 5.2 years ago by manuel.belmadani ★ 1.3k

0

@manuel.belmadani look at this one, although it is outdated: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_bqsr_BaseRecalibrator.php

ADD REPLY • link 5.2 years ago by Learner ▴ 280

0

Yeah, I find it's difficult to adjust the individual steps if you're not following an entire guide. See my answer about using the ExAC pipeline instead.

ADD REPLY • link 5.2 years ago by manuel.belmadani ★ 1.3k

0

@Santosh Anand I did check that. I am basically stuck on a few steps:

Identify target regions for realignment, Realign BAM to get better Indel calling, Call Indels, Call SNPs, View aligned reads in BAM/BAI
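
For orientation, here is a rough sketch of how those legacy steps map onto a GATK4-era workflow. The mapping is my own summary rather than an official Broad table: RealignerTargetCreator and IndelRealigner are not part of GATK4, and HaplotypeCaller calls SNPs and indels in the same run. The "samtools index + IGV" entry is just one common way to inspect a BAM.

```python
# Legacy step -> suggested replacement (None means the step is simply dropped).
# This is an informal summary, not an official mapping.
LEGACY_TO_GATK4 = {
    "Identify target regions for realignment": None,  # RealignerTargetCreator is not in GATK4
    "Realign BAM to get better Indel calling": None,  # IndelRealigner is not in GATK4
    "Call Indels": "HaplotypeCaller",
    "Call SNPs": "HaplotypeCaller",  # SNPs and indels come from the same run
    "View aligned reads in BAM/BAI": "samtools index + IGV",
}


def remaining_steps(mapping):
    """Return the distinct replacement tools in first-seen order, skipping dropped steps."""
    seen = []
    for tool in mapping.values():
        if tool is not None and tool not in seen:
            seen.append(tool)
    return seen


if __name__ == "__main__":
    for step, tool in LEGACY_TO_GATK4.items():
        print(f"{step}: {tool or 'dropped'}")
```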

ADD REPLY • link 5.2 years ago by Learner ▴ 280

0

That is an 8-year-old post where the newest answer is 4+ years old. You're better off implementing GATK Best Practices. AFAIK, building a pipeline from scratch can be quite challenging for people lacking significant experience.

ADD REPLY • link 5.2 years ago by Ram 43k

0

@RamRS so do you have any post or something newer that I can follow? There are many parameters that have an effect on the output, so I would like to draw on other people's experience rather than running around by myself.

ADD REPLY • link 5.2 years ago by Learner ▴ 280

0

Not really, no. Is using a cloud platform such as Seven Bridges or GATK Firecloud an option? That might be easier from a get-it-done perspective.

ADD REPLY • link 5.2 years ago by Ram 43k

0

@RamRS no, I cannot use the cloud. If I could, Galaxy would be a good option. The problem is that I don't want to just click, and although I have read a lot, surprisingly few documents out there explain how to proceed.

ADD REPLY • link 5.2 years ago by Learner ▴ 280

score 1 · Answer 1 · 2019-02-08

1

5.2 years ago

manuel.belmadani ★ 1.3k

I would follow the methods used by ExAC, which processed over 60k exomes using the pipeline described in their manuscript supplements. It's probably the most reliable reference I can think of in terms of exome variant calling.

Paper: https://www.nature.com/articles/nature19057

Go to Supplementary Information, starting from "1 Data Generation". They provide all the steps they use including filters. It should get you most of the way there.

ADD COMMENT • link 5.2 years ago by manuel.belmadani ★ 1.3k

0

@ this is also old; can I rely on their commands? For example, they are using an old GATK; look at the command java -jar GenomeAnalysisTK.jar \. But thanks for sharing, I'm going to read it carefully. I like your answer already.

ADD REPLY • link 5.2 years ago by Learner ▴ 280

0

It should be fine, I think. It's not as if the ExAC data is no longer good; it's still widely used and standard. The same group came out with gnomAD more recently, which extends the work from ExAC, but it's not published yet. There might be a preprint on bioRxiv, but I'm not sure if the pipeline changed at all.

ADD REPLY • link 5.2 years ago by manuel.belmadani ★ 1.3k

0

Nice resource! It's great that they're using GATK HC and not UG, but bwa mem is better than bwa aln and can replace it, right?

ADD REPLY • link 5.2 years ago by Ram 43k

0

Most likely yes. I remember reading some benchmark where they recommended aln for shorter reads (~36bp) and mem for anything > 100bp. I also recall mem being more straightforward to use for some reason.
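
As an illustration of why mem feels more straightforward: paired-end alignment is a single `bwa mem` call whose output can be piped straight into `samtools sort`, whereas aln needs separate `aln` and `sampe` steps. This is a minimal sketch that only builds the two argument lists. The file names, sample name, thread count, and the `PL:ILLUMINA` platform tag are assumptions for the example.

```python
def bwa_mem_commands(reference, fastq1, fastq2, sample, out_bam, threads=4):
    """Build argument lists for `bwa mem` and `samtools sort` (to be piped together).

    The read group is passed with literal "\\t" separators, which bwa turns
    into tabs. GATK tools expect a read group with at least a sample name.
    """
    read_group = "\\t".join(["@RG", f"ID:{sample}", f"SM:{sample}", "PL:ILLUMINA"])
    bwa_cmd = [
        "bwa", "mem",
        "-t", str(threads),
        "-R", read_group,
        reference, fastq1, fastq2,
    ]
    # "-" tells samtools sort to read the alignments from standard input.
    sort_cmd = ["samtools", "sort", "-o", out_bam, "-"]
    return bwa_cmd, sort_cmd
```

The two lists can be connected with `subprocess.Popen`, feeding the stdout of the first process into the stdin of the second.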

ADD REPLY • link 5.2 years ago by manuel.belmadani ★ 1.3k

0

@manuel.belmadani I am trying to use their pipeline, but GATK has changed and I can no longer use RealignerTargetCreator. Do you have any suggestions for the steps I should take?

ADD REPLY • link 5.2 years ago by Learner ▴ 280

0

IndelRealigner is not really necessary with GATK-HC >3.4, I think. HC performs local realignment around indels anyway, so you should be fine. Do hold on until others provide their feedback as well; my GATK knowledge is quite dated.
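
In practice that means the recalibrated BAM can go straight into HaplotypeCaller with no realignment step in between. Here is a minimal sketch that builds the GATK4 argument list. It assumes GATK4 is available as `gatk`; the file names are placeholders, and the GVCF switch only matters if you plan joint genotyping later.

```python
def haplotypecaller_command(bam, reference, targets, out_vcf, gvcf=False):
    """Build a GATK4 HaplotypeCaller argument list restricted to exome targets.

    No RealignerTargetCreator/IndelRealigner step is involved: the BAM is
    passed to HaplotypeCaller as it comes out of the earlier pre-processing.
    """
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", targets,
        "-O", out_vcf,
    ]
    if gvcf:
        # Emit a per-sample GVCF for later joint genotyping.
        cmd += ["-ERC", "GVCF"]
    return cmd
```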

ADD REPLY • link 5.2 years ago by Ram 43k

0

That seems right. See this post.

Realigning reads using IndelRealigner or assembling reads using HaplotypeCaller allows us to call the insertion. That indel realignment has been a part of pre-processing workflows for seven years and will continue to be a part of workflows still dependent on locus-based callers is a testament to the improvements it brings. And if you feel apprehensive about omitting it from your HaplotypeCaller and MuTect2 workflows, we empathize. These changes are about improving efficiency in the face of incremental returns. If you find substantial changes, then I encourage you to share details with us.

ADD REPLY • link 5.2 years ago by manuel.belmadani ★ 1.3k