Biostars beta testing.
Question: How to proceed after alignment when analyzing whole exome sequencing data

I have read many posts on this website, but most of them are old. I used one post to build my pipeline, but it is very old now (What Is The Best Pipeline For Human Whole Exome Sequencing?).

Could anyone give me an update on the Genome Analysis Toolkit (GATK)? I cannot find the best way to do steps 6 to 11 of that post, and GATK has also changed, as documented below:

https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.3.0/

12 months ago • Learner • 160 • updated 12 months ago by manuel.belmadani • 830

Did you check v4 best practices? What specifically is troubling you?

https://software.broadinstitute.org/gatk/best-practices/workflow

12 months ago • Santosh Anand • 4.8k

Probably the "Exome" part; I think the Broad guides assume you have WGS.

12 months ago • manuel.belmadani • 830

@manuel.belmadani WGS is different from WES, but the process should be rather similar.

12 months ago • Learner • 160

Yes, that's why I was suggesting that using the Broad best practices might not be completely appropriate. I'd be concerned that the base/variant recalibration steps would differ for WES. It's been asked in their forum and there are some answers about specific steps, but no comprehensive guide for WES afaik.

12 months ago • manuel.belmadani • 830
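
On the concern above about recalibration differing for WES: one adjustment that, as far as I know, is commonly used with exome data is restricting base recalibration to the capture targets with `-L`, plus some padding. Below is a minimal Python sketch that only builds the GATK4 command lines (it does not run them). The function name, the file names, and the 100 bp padding default are assumptions for illustration, not values taken from this thread.

```python
from typing import List


def bqsr_commands(ref: str, bam: str, known_sites: List[str],
                  targets: str, padding: int = 100) -> List[List[str]]:
    """Build GATK4 BQSR command lines restricted to exome capture targets.

    All paths are hypothetical placeholders supplied by the caller.
    """
    stem = bam[:-4] if bam.endswith(".bam") else bam
    recal_table = stem + ".recal.table"
    recal_bam = stem + ".recal.bam"

    # Model the error covariates only over the capture targets (plus padding),
    # since off-target reads in an exome are sparse and less informative.
    base_recal = ["gatk", "BaseRecalibrator",
                  "-R", ref, "-I", bam,
                  "-L", targets,
                  "--interval-padding", str(padding),
                  "-O", recal_table]
    for vcf in known_sites:
        base_recal += ["--known-sites", vcf]

    # Apply the recalibration to the whole BAM (no -L here, so that
    # no reads are dropped from the output file).
    apply_bqsr = ["gatk", "ApplyBQSR",
                  "-R", ref, "-I", bam,
                  "--bqsr-recal-file", recal_table,
                  "-O", recal_bam]
    return [base_recal, apply_bqsr]


if __name__ == "__main__":
    for cmd in bqsr_commands("ref.fasta", "sample.dedup.bam",
                             ["dbsnp.vcf.gz"], "targets.interval_list"):
        print(" ".join(cmd))
```

The interval file should be the one shipped with your capture kit; check the GATK documentation for your exact version before relying on any of these options.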

Yeah, I find it's difficult to adjust the individual steps if you're not following an entire guide. See my answer about using the ExAC pipeline instead.

12 months ago • manuel.belmadani • 830

@Santosh Anand I did check that. I am basically stuck on a few steps:

Identify target regions for realignment, Realign BAM to get better Indel calling, Call Indels, Call SNPs, View aligned reads in BAM/BAI

12 months ago • Learner • 160

That is an 8-year-old post where the newest answer is 4+ years old. You're better off implementing GATK Best Practices. AFAIK, building a pipeline from scratch can be quite challenging for people lacking significant experience.

12 months ago • RamRS • 21k

@RamRS So do you have any post or something newer that I can follow? There are many parameters that have an effect on the output, so I would like to benefit from other people's experience rather than running around on my own.

12 months ago • Learner • 160

Not really, no. Is using a cloud platform such as Seven Bridges or GATK Firecloud an option? That might be easier from a get-it-done perspective.

12 months ago • RamRS • 21k

@RamRS No, I cannot use the cloud. If I could, Galaxy would be a good option. The problem is that I don't want to just click. I have read a lot, but surprisingly there are not many documents out there on how to proceed.

12 months ago • Learner • 160

I would follow the methods used by ExAC, which processed over 60k exomes using the pipeline described in their manuscript supplements. It's probably the most reliable reference I can think of in terms of exome variant calling.

Paper: https://www.nature.com/articles/nature19057

Go to Supplementary Information, starting from "1 Data Generation". They provide all the steps they use including filters. It should get you most of the way there.

12 months ago • manuel.belmadani • 830

@ This is also old. Can I rely on their commands? For example, they are using an old GATK; look at the command java -jar GenomeAnalysisTK.jar \ But thanks for sharing, I'm going to read it carefully. I like your answer already.

12 months ago • Learner • 160

It should be fine, I think. It's not like ExAC data is no longer good; it's still widely used and standard. The same group more recently released gnomAD, which extends the work from ExAC, but it's not published yet. There might be a preprint on bioRxiv, but I'm not sure whether the pipeline changed at all.

12 months ago • manuel.belmadani • 830

Nice resource! It's great that they're using GATK HC and not UG, but bwa mem is better than bwa aln and can replace it, right?

12 months ago • RamRS • 21k

Most likely yes. I remember reading a benchmark that recommended aln for shorter reads (~36bp) and mem for anything > 100bp. I also recall mem being more straightforward to use, for some reason.

12 months ago • manuel.belmadani • 830

@manuel.belmadani I am trying to use their pipeline, but GATK has changed and I can no longer use RealignerTargetCreator. Do you have any suggestions for the steps I should take?

12 months ago • Learner • 160

IndelRealigner is not really necessary with GATK-HC >3.4, I think. HC performs local realignment around indels anyway, so you should be fine. Do hold on until others provide their feedback as well; my GATK knowledge is quite dated.

12 months ago • RamRS • 21k

That seems right. See this post.

Realigning reads using IndelRealigner or assembling reads using HaplotypeCaller allows us to call the insertion. That indel realignment has been a part of pre-processing workflows for seven years and will continue to be a part of workflows still dependent on locus-based callers is a testament to the improvements it brings. And if you feel apprehensive about omitting it from your HaplotypeCaller and MuTect2 workflows, we empathize. These changes are about improving efficiency in the face of incremental returns. If you find substantial changes, then I encourage you to share details with us.

12 months ago • manuel.belmadani • 830
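
To make the point about dropping indel realignment concrete, here is a rough Python sketch of what the post-alignment steps might look like with GATK4: duplicate marking followed directly by HaplotypeCaller, with no RealignerTargetCreator or IndelRealigner step in between. It only assembles the command lines and does not execute anything. The function name, sample name, and file names are made up for illustration, and base recalibration (which would normally sit between the two steps) is left out to keep it short.

```python
from typing import List, Optional


def post_alignment_commands(ref: str, sorted_bam: str, sample: str,
                            targets: Optional[str] = None) -> List[List[str]]:
    """Build GATK4 command lines for the steps after alignment.

    There is deliberately no realignment step: HaplotypeCaller
    reassembles reads locally around candidate variants itself.
    """
    dedup_bam = f"{sample}.dedup.bam"

    # Step 1: mark duplicate reads (Picard MarkDuplicates bundled in GATK4).
    mark_dups = ["gatk", "MarkDuplicates",
                 "-I", sorted_bam,
                 "-O", dedup_bam,
                 "-M", f"{sample}.dup_metrics.txt"]

    # Step 2: call SNPs and indels together in a single pass.
    call = ["gatk", "HaplotypeCaller",
            "-R", ref,
            "-I", dedup_bam,
            "-O", f"{sample}.vcf.gz"]
    if targets is not None:
        # For exomes, optionally restrict calling to the capture targets.
        call += ["-L", targets]

    return [mark_dups, call]


if __name__ == "__main__":
    for cmd in post_alignment_commands("ref.fasta", "sample.sorted.bam",
                                       "sample", "targets.interval_list"):
        print(" ".join(cmd))
```

Note that HaplotypeCaller calls SNPs and indels in the same run, so the separate "Call Indels" and "Call SNPs" steps from the old post collapse into one command here.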
