Please suggest your idea for variant calling analysis of many BAM files from whole genome sequencing
6.0 years ago
seta ★ 1.9k

Hi all,

I have whole-genome sequencing data (BAM files) for about 1100 human individuals, which need to be processed to find all variants in specific genes. I have read and browsed a lot, but as this is my first experience in this field, I have not reached a final conclusion. Could you please share your experience on the following issues?

1) Is it better to run variant calling on the whole BAM file, or to first extract the region of interest from the BAM file with a tool such as VariantBam? I found just one citation for this tool, so if you suggest extracting the region of interest, please let me know of any alternative tool for this purpose.

2) How should I process 1100 BAM files for variant calling? Should I call variants on each BAM file separately and then merge the resulting VCF files, or merge all BAM files and call variants on the merged BAM file? Which do you suggest?

3) Finally, how can I match the discovered variants against all previously identified variants for my genes of interest? Please share the tool you would suggest.

Many thanks in advance for any help

variant calling alignment whole genome sequencing • 1.2k views

Are you looking for germline or somatic variants?

If you end up using GATK, regarding your focus on specific genes, you can use an interval_list to restrict the search (and save time).
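As a rough sketch of how such an interval file could be prepared: GATK's `-L` argument also accepts BED files, so one option is to write the genes of interest as BED lines. The `Gene` tuple, the `genes_to_bed` helper, the padding default, and the coordinates in the usage note are all hypothetical and only illustrate the conversion from 1-based inclusive gene coordinates to 0-based half-open BED coordinates.

```python
from typing import Iterable, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int


def genes_to_bed(genes: Iterable[Gene], padding: int = 1000) -> str:
    """Return BED text (0-based, half-open) for the genes, padded on both sides."""
    lines = []
    # Sort by chromosome name (plain string order) and start position.
    for g in sorted(genes, key=lambda g: (g.chrom, g.start)):
        bed_start = max(0, g.start - 1 - padding)  # convert to 0-based, clamp at 0
        bed_end = g.end + padding                  # an inclusive 1-based end equals a half-open 0-based end
        lines.append(f"{g.chrom}\t{bed_start}\t{bed_end}\t{g.name}")
    return "\n".join(lines) + "\n"
```

For example, `genes_to_bed([Gene("GENE_A", "chr1", 10001, 12000)], padding=100)` gives a single line covering `chr1` from 9900 to 12100. The chromosome names must match the ones used in the BAM header and the reference.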

Thanks, I am looking for germline variants.

6.0 years ago

Hello seta,

Is it better to do variant calling analysis on the whole BAM file or extract the region of interest from the BAM file by a tool such as VariantBam?

There is no need to extract the region first. Variant callers such as freebayes or GATK HaplotypeCaller can take a list of regions and restrict variant calling to them.

For variant calling analysis, please kindly tell me how to process 1100 BAM files? Processing each BAM file for variant calling, then merge resulting vcf files or merging all BAM files and doing variant calling analysis on this merged BAM file, which one do you suggest?

Merging the BAM files is not needed. Variant callers such as freebayes or GATK HaplotypeCaller also accept a list of BAM files. You have to decide whether variant calling is done independently for each BAM file or jointly across all samples.
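To make the two options a bit more concrete, here is a minimal sketch that only builds the command lines (it does not run anything). It assumes a GATK4 installation with the `gatk` wrapper on the PATH and a standard freebayes build; all file names, the helper function names, and the output layout are hypothetical. The first helper is the per-sample route (one GVCF per BAM, restricted to the regions), the second is the joint genotyping step that usually follows it in GATK, and the third is the freebayes route with a list of BAM files.

```python
from pathlib import Path
from typing import List


def haplotypecaller_gvcf_cmd(bam: str, reference: str, intervals: str, out_dir: str) -> List[str]:
    """Per-sample calling: one GVCF per BAM, limited to the given intervals."""
    sample = Path(bam).stem  # e.g. "bams/S1.bam" -> "S1"
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", intervals,      # interval_list or BED file with the genes of interest
        "-ERC", "GVCF",       # emit a GVCF so samples can be genotyped jointly later
        "-O", f"{out_dir}/{sample}.g.vcf.gz",
    ]


def joint_genotyping_cmds(sample_map: str, reference: str, intervals: str,
                          workspace: str, out_vcf: str) -> List[List[str]]:
    """Joint step: import all GVCFs into a GenomicsDB workspace, then genotype."""
    return [
        ["gatk", "GenomicsDBImport",
         "--genomicsdb-workspace-path", workspace,
         "--sample-name-map", sample_map,  # tab-separated: sample name, GVCF path
         "-L", intervals],
        ["gatk", "GenotypeGVCFs",
         "-R", reference,
         "-V", f"gendb://{workspace}",
         "-O", out_vcf],
    ]


def freebayes_joint_cmd(bam_list_file: str, reference: str, targets_bed: str) -> List[str]:
    """freebayes on many BAMs at once; the VCF is written to stdout by default."""
    return ["freebayes", "-f", reference, "-t", targets_bed, "-L", bam_list_file]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, or handed to whatever scheduler you use to spread the 1100 per-sample jobs over a cluster. For freebayes, redirect stdout to a file to capture the VCF.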

Finally, how can I match the discovered variants with all previously identified variants for genes of my interest?

What do you mean by "previously identified"? Known in a database like dbSNP? Then Variant Annotation is the term you are looking for.
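Just to illustrate what annotation against a known-variant set boils down to, here is a toy sketch that matches called variants to known ones by chromosome, position, REF and ALT. The function names and the `"novel"` label are made up for this example, and it assumes both inputs are plain VCF text lines that are already normalized the same way (same chromosome naming, left-aligned indels). In practice a dedicated annotation tool does this far more robustly.

```python
from typing import Dict, Iterable, Iterator, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def vcf_records(lines: Iterable[str]) -> Iterator[Tuple[Key, str]]:
    """Yield ((chrom, pos, ref, alt), id) for every ALT allele in VCF data lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, vid, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites give one key per ALT
            yield (chrom, int(pos), ref, alt), vid


def label_known(called_lines: Iterable[str], known_lines: Iterable[str]) -> Dict[Key, str]:
    """Map each called variant to the ID of an exactly matching known variant, else 'novel'."""
    known = dict(vcf_records(known_lines))
    return {key: known.get(key, "novel") for key, _ in vcf_records(called_lines)}
```

Here "novel" only means "not present in the known set that was passed in", nothing more.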

fin swimmer

Many thanks for your helpful explanations. However, working with a cohort can be more difficult than working with a single genome. Could you please let me know if there is any workflow/pipeline for handling and processing a cohort WGS analysis?
