Question: Some good-practice doubts about variant calling from a BAM file


I have a paired-end DNA sequence alignment BAM file (generated with BWA using the -M option) and would like to know whether the points below are good practice.

1) Since my aim is variant calling, is it OK to keep unmapped, secondary, singleton and duplicate-marked reads in the BAM file, given that by default they are not considered for variant calling (bcftools mpileup and call)? Or should I remove them to make the BAM file smaller and save space?

2) Since I am only interested in certain genes, I plan to chop my BAM file to keep only the reads aligned to my genes of interest (plus 10,000 bp upstream and downstream) and then call variants on them. That way I can run variant calling on all my candidate genes in parallel.
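The gene ± 10,000 bp windows in point 2 can be written as 1-based, inclusive `chrom:start-end` region strings, which samtools and bcftools accept, without creating a separate BAM per gene. Below is a minimal sketch; the helper name and the gene coordinates are made up for illustration. Padding symmetrically sidesteps strand (upstream of a minus-strand gene lies at higher coordinates).

```python
def padded_region(chrom, start, end, pad=10_000, chrom_len=None):
    """Return a 1-based, inclusive 'chrom:start-end' region string padded
    by `pad` bp on each side and clamped to the chromosome bounds.

    Hypothetical helper, for illustration only.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1-based coordinates with 1 <= start <= end")
    new_start = max(1, start - pad)  # never run past the chromosome start
    new_end = end + pad if chrom_len is None else min(chrom_len, end + pad)
    return f"{chrom}:{new_start}-{new_end}"


# Hypothetical gene coordinates (1-based, inclusive), for illustration only.
genes = {"GENE_A": ("chr1", 50_000, 60_000), "GENE_B": ("chr2", 5_000, 6_000)}
regions = {name: padded_region(*coords) for name, coords in genes.items()}
print(regions)
```

If the chromosome length is known (for example from the BAM header), passing `chrom_len` keeps the window end inside the contig.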

Can anyone point out any caveats in the approaches above?
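Regarding point 1, whether a caller uses a read is decided from the SAM FLAG bits of each record. The minimal sketch below decodes those bits using the values from the SAM specification. The `SKIP` set is an assumption that mirrors the default `--skip-any-flags` list (UNMAP, SECONDARY, QCFAIL, DUP) given in the bcftools mpileup manual; check the defaults of your installed version. Singletons (mapped reads whose mate is unmapped) are not in that set; mpileup is documented to skip anomalous, not properly paired read pairs unless `-A`/`--count-orphans` is given. Note that BWA's -M option marks shorter split hits as secondary (0x100) rather than supplementary (0x800). All helper names are hypothetical.

```python
# SAM FLAG bits, as defined in the SAM format specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

# Assumed to mirror bcftools mpileup's default skip list; verify for your version.
SKIP = {"unmapped", "secondary", "qc_fail", "duplicate"}


def decode_flag(flag):
    """Return the set of names of all bits set in a SAM FLAG value."""
    return {name for bit, name in FLAG_BITS.items() if flag & bit}


def is_skipped(flag):
    """True if any bit in SKIP is set (hypothetical helper)."""
    return bool(decode_flag(flag) & SKIP)


def is_singleton(flag):
    """True for a mapped, paired read whose mate is unmapped."""
    f = decode_flag(flag)
    return "paired" in f and "unmapped" not in f and "mate_unmapped" in f


for flag in (99, 73, 355, 1123):
    print(flag, sorted(decode_flag(flag)), "skipped" if is_skipped(flag) else "kept")
```

Here 99 is a properly paired first-in-pair read, 73 a singleton, 355 the same read marked secondary, and 1123 the same read marked duplicate.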

Asked by prasundutta87 (330), 2.1 years ago; updated 11 months ago by Biostar (20).

1) I don't think removing those reads would save a substantial amount of space. I'm against any manual tampering with BAM files. A good variant caller should ignore all such non-proper reads on its own.

2) You can restrict variant calling to specific intervals (at least in GATK) with the -L option. If you start chopping BAM files, you'll get into serious trouble with coordinates.
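Even when intervals are passed to the caller instead of chopping the BAM, coordinate conventions are easy to mix up: BED files are 0-based and half-open, while `chrom:start-end` strings are 1-based and inclusive. GATK's -L accepts both BED files and region strings, so an off-by-one slip in conversion can shift an interval by a base. A minimal conversion sketch, with hypothetical helper names:

```python
def region_to_bed(region):
    """'chrom:start-end' (1-based, inclusive) -> (chrom, bed_start, bed_end)
    (0-based, half-open)."""
    # rpartition keeps contig names that themselves contain ':' intact.
    chrom, _, span = region.rpartition(":")
    start, end = (int(x) for x in span.split("-"))
    return chrom, start - 1, end


def bed_to_region(chrom, bed_start, bed_end):
    """(chrom, bed_start, bed_end) (0-based, half-open) -> 1-based region string."""
    return f"{chrom}:{bed_start + 1}-{bed_end}"


record = region_to_bed("chr1:40000-70000")
print("\t".join(map(str, record)))  # BED line: chr1 39999 70000 (tab-separated)
```

Both representations cover the same 30,001 bases; only the start value differs.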

Comment by WouterDeCoster (39k), 2.1 years ago.

Excellent point by @WouterDeCoster. I'll add one: GATK calls variants by locally re-assembling the reads around candidate variant sites, and this can cause problems near the boundaries of the regions used for that assembly, especially when indels are involved. I've run into such problems, and the advice I got from the GATK support team was to call variants on entire chromosomes whenever computing resources allow.
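When whole-chromosome calling is not feasible, a partial mitigation is to pad each interval generously and merge intervals that then overlap or touch, so that one locus is not split between two jobs. Below is a minimal sketch of that pad-and-merge step; the function name and the 1-based inclusive `(chrom, start, end)` tuple format are assumptions. GATK also provides an `--interval-padding` (`-ip`) argument for padding -L intervals; check the documentation for your version. Padding is unlikely to remove the boundary effects described above entirely, which is why calling whole chromosomes remains the safer choice.

```python
def pad_and_merge(intervals, pad):
    """Pad 1-based, inclusive (chrom, start, end) intervals by `pad` bp on
    each side (start clamped at 1), then merge intervals on the same
    chromosome that overlap or are directly adjacent.

    Hypothetical helper, for illustration only.
    """
    padded = sorted((c, max(1, s - pad), e + pad) for c, s, e in intervals)
    merged = []
    for chrom, start, end in padded:
        # Sorting groups each chromosome together, ordered by start.
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


genes = [("chr1", 50_000, 60_000), ("chr1", 75_000, 80_000), ("chr2", 5_000, 6_000)]
print(pad_and_merge(genes, 10_000))
```

With 10 kb padding the two chr1 genes fall into a single job instead of two adjacent ones.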

Reply (♦ 2.1k), 11 months ago.

OK, thanks for letting me know. The coordinate problems do make sense, and bcftools mpileup also has an option for supplying specific regions, so variant calling can be restricted to them.
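Because bcftools mpileup accepts a region with `-r` (or a regions file with `-R`), the per-gene parallel runs from point 2 can read the original, indexed BAM directly. The sketch below only builds the shell pipeline string for each region and executes nothing; the file names, gene names and helper name are placeholders. The flags used (`-f` reference FASTA, `-r` region, `-Ou` uncompressed BCF output, `call -mv` for the multiallelic caller reporting variant sites only, `-Oz -o` compressed VCF output) should be checked against `bcftools mpileup --help` and `bcftools call --help` on your installation. `-r` requires a BAM index.

```python
import shlex


def bcftools_pipeline(bam, ref, region, out_vcf):
    """Build an `mpileup | call` shell pipeline restricted to one region.

    Hypothetical helper: only the command string is built, nothing is run.
    """
    mpileup = ["bcftools", "mpileup", "-f", ref, "-r", region, "-Ou", bam]
    call = ["bcftools", "call", "-mv", "-Oz", "-o", out_vcf]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return f"{shlex.join(mpileup)} | {shlex.join(call)}"


# Hypothetical, already padded gene regions, for illustration only.
regions = {"GENE_A": "chr1:40000-90000", "GENE_B": "chr2:1-16000"}
commands = [
    bcftools_pipeline("sample.bam", "ref.fa", reg, f"{name}.vcf.gz")
    for name, reg in regions.items()
]
for cmd in commands:
    print(cmd)
```

Each string can be handed to a job scheduler or GNU parallel; if the regions do not overlap, the per-gene VCFs can afterwards be combined with `bcftools concat`.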

Reply (330), 2.1 years ago.