I have a paired end DNA sequence alignment BAM file (got from using BWA -M) and would like to know if the points mentioned below is a good practice..
1) Since my aim is to perform variant calling, is it OK to keep unmapped, secondary,singleton and duplicated marked reads in the BAM file itself because by default they won't be considered for variant calling (bcftools mpileup and call)..OR..remove them to make the BAM file smaller to save space.
2) Since I am only interested in certain genes in the genome, I am planning to chop my BAM file to get only the reads aligned to my gene of interest (and 10000 bp upstream and downstream) and then perform variant calls on them. I can then perform variant calling on all my candidate genes of interest in parallel.
Can anyone let me know any caveat in the above mentioned approaches?