some good practices doubts
1
1
Entering edit mode
6.4 years ago
prasundutta87 ▴ 660

Hi,

I have a paired end DNA sequence alignment BAM file (got from using BWA -M) and would like to know if the points mentioned below is a good practice..

1) Since my aim is to perform variant calling, is it OK to keep unmapped, secondary,singleton and duplicated marked reads in the BAM file itself because by default they won't be considered for variant calling (bcftools mpileup and call)..OR..remove them to make the BAM file smaller to save space.

2) Since I am only interested in certain genes in the genome, I am planning to chop my BAM file to get only the reads aligned to my gene of interest (and 10000 bp upstream and downstream) and then perform variant calls on them. I can then perform variant calling on all my candidate genes of interest in parallel.

Can anyone let me know any caveat in the above mentioned approaches?

alignment gene variant calling • 1.1k views
ADD COMMENT
3
Entering edit mode
6.4 years ago

1) I don't think the decrease in size would be a substantial space saver. I'm against all manual tampering with bam files. A good variant caller should ignore all non-proper reads.

2) You can do (at least in GATK) variant calling for specific intervals using the -L option. If you would start chopping bams you'll get in serious trouble with coordinates.

ADD COMMENT
1
Entering edit mode

Excellent point made by @WouterDeCoster. I'll add one: GATK calls variants by doing local re-assembly of reads surrounding the variant sites, and sometimes it would create problems around the boundaries of variant regions used for this assembly, especially if indels are involved. I've encountered such problems and the advice I got from GATK support team is to try calling variants for the entire chromosomes as much as possible if computing resource is not a problem.

ADD REPLY
0
Entering edit mode

Ok..thanks for letting me know this. The coordinate screw ups do make sense..and bcftools mpileup also has the option to provide specific coordinates to perform selective variant calling..

ADD REPLY

Login before adding your answer.

Traffic: 2594 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6