This is a beta test.
Question: some good practices doubts
1
Entering edit mode

Hi,

I have a paired end DNA sequence alignment BAM file (got from using BWA -M) and would like to know if the points mentioned below is a good practice..

1) Since my aim is to perform variant calling, is it OK to keep unmapped, secondary,singleton and duplicated marked reads in the BAM file itself because by default they won't be considered for variant calling (bcftools mpileup and call)..OR..remove them to make the BAM file smaller to save space.

2) Since I am only interested in certain genes in the genome, I am planning to chop my BAM file to get only the reads aligned to my gene of interest (and 10000 bp upstream and downstream) and then perform variant calls on them. I can then perform variant calling on all my candidate genes of interest in parallel.

Can anyone let me know any caveat in the above mentioned approaches?

ADD COMMENTlink 2.1 years ago prasundutta87 • 330 • updated 11 months ago Biostar 20
3
Entering edit mode

1) I don't think the decrease in size would be a substantial space saver. I'm against all manual tampering with bam files. A good variant caller should ignore all non-proper reads.

2) You can do (at least in GATK) variant calling for specific intervals using the -L option. If you would start chopping bams you'll get in serious trouble with coordinates.

ADD COMMENTlink 2.1 years ago WouterDeCoster 39k
Entering edit mode
1

Excellent point made by @WouterDeCoster. I'll add one: GATK calls variants by doing local re-assembly of reads surrounding the variant sites, and sometimes it would create problems around the boundaries of variant regions used for this assembly, especially if indels are involved. I've encountered such problems and the advice I got from GATK support team is to try calling variants for the entire chromosomes as much as possible if computing resource is not a problem.

ADD REPLYlink 11 months ago
Vitis
♦ 2.1k
Entering edit mode
0

Ok..thanks for letting me know this. The coordinate screw ups do make sense..and bcftools mpileup also has the option to provide specific coordinates to perform selective variant calling..

ADD REPLYlink 2.1 years ago
prasundutta87
• 330

Login before adding your answer.

Powered by the version 1.6