Biostar
Question: Using one BAM file with very high coverage, how can I generate several BAM files with different coverages?


I have a BAM file that contains reads from a single gene with very high coverage (> 5000X), from which I want to call SNPs. Several SNPs in my sample were identified previously, and I am going to use them as controls to test my pipeline. From the original BAM file I want to generate several BAM files with different coverages. I will then call SNPs in each of them and compare the calls with the known genotypes, to see which coverage gives the highest concordance and to find the coverage threshold below which SNP calling is not reliable.


Asked 18 months ago by tarek.mohamed (250) • updated 18 months ago by WouterDeCoster (39k)
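The comparison step described in the question could be sketched as a small awk helper. It assumes (this is not part of the original post) that the known genotypes and each set of calls have been flattened to tab-separated CHROM, POS, GT lines, for example with bcftools query -f '%CHROM\t%POS\t[%GT]\n' calls.vcf.gz on a single-sample VCF. The function name concordance is hypothetical, and known sites with no call are counted as misses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concordance KNOWN CALLED
# Both files are tab-separated CHROM, POS, GT.
# Prints "matching/known<TAB>fraction"; known sites without a call count as misses.
concordance() {
  awk 'BEGIN { FS = "\t" }
    { gsub(/\|/, "/", $3) }                     # treat phased 0|1 like unphased 0/1
    NR == FNR {                                 # first file: the known genotypes
      k = $1 FS $2
      if (!(k in known)) total++
      known[k] = $3
      next
    }
    {                                           # second file: the calls to evaluate
      k = $1 FS $2
      if ((k in known) && !(k in done)) {       # score each known site at most once
        done[k] = 1
        if ($3 == known[k]) hits++
      }
    }
    END { printf "%d/%d\t%.3f\n", hits, total, (total ? hits / total : 0) }' "$1" "$2"
}
```

Running it once per downsampled call set gives one concordance value per coverage level, which can then be plotted against depth to look for the point where concordance drops off.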

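Use downsampling in samtools:

samtools view -b -s 0.1 -o new_bam_at_0.1_fold_coverage_of_the_original.bam yourbam.bam

The integer part of the -s value is the random seed and the fractional part is the fraction of reads to keep, so -s 0.1 keeps about 10% of the reads with seed 0, and -s 42.1 would keep about 10% with seed 42.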
Posted 18 months ago by WouterDeCoster (39k)
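To get a whole series of coverages from the samtools view -s command above, the fraction for each level is simply target depth divided by original depth. The sketch below assumes the original mean depth is known (for example from samtools depth); the helper names subsample_arg, mean_depth and make_series, the default seed 42 and the output names sub_<N>x_seed<S>.bam are illustrative choices, not anything from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the samtools "-s SEED.FRACTION" argument
# for a target mean depth, given the original mean depth.
subsample_arg() {
  local seed=$1 target=$2 original=$3
  awk -v s="$seed" -v t="$target" -v o="$original" 'BEGIN {
    f = t / o
    if (f < 0.000001 || f >= 1) exit 1                 # fraction must be in (0, 1)
    printf "%d%s\n", s, substr(sprintf("%.6f", f), 2)  # 42 and 0.1 -> "42.100000"
  }'
}

# Mean depth over positions with non-zero coverage (samtools depth without -a).
mean_depth() {
  samtools depth "$1" | awk '{ s += $3 } END { if (NR) printf "%.1f\n", s / NR }'
}

# make_series BAM ORIGINAL_DEPTH TARGET... ; seed taken from $SEED (default 42).
make_series() {
  local bam=$1 original=$2; shift 2
  local seed=${SEED:-42} target arg out
  for target in "$@"; do
    arg=$(subsample_arg "$seed" "$target" "$original") ||
      { echo "skipping ${target}x: not below ${original}x" >&2; continue; }
    out="sub_${target}x_seed${seed}.bam"
    samtools view -b -s "$arg" -o "$out" "$bam"
    samtools index "$out"
  done
}
```

For example, make_series yourbam.bam 5000 2000 1000 500 200 100 50 20 would write one indexed BAM per level. Because -s keeps a random fraction of reads, each file only approximates its target depth, and regions that were unevenly covered stay uneven; rerunning with SEED=7 (or another seed) gives replicate subsamples at the same levels.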

Thanks for the reply!

This command discards about 90% of the reads at each position. Which reads are excluded: is it a random process? Can I control the criteria used to exclude reads, for example keeping the reads with the highest mapping quality or base quality scores?

Since coverage is not uniform across my target region, can I downsample to a fixed number of reads per position rather than by a percentage?

Posted 18 months ago • 250
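As far as I know, samtools view -s only subsamples by fraction and has no option to cap depth at a fixed number of reads per position. A minimal greedy sketch of such a cap in awk: stream coordinate-sorted SAM records and keep a read only if every reference position it covers is still below the cap, optionally dropping reads under a MAPQ threshold first. cap_depth and its arguments are hypothetical, not a samtools feature. The selection is not random: reads earlier in the file win, and mates are treated independently, so pairs can be split.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cap_depth CAP [MIN_MAPQ]
# SAM on stdin -> SAM on stdout. Keeps a read only if no position it covers
# already has CAP kept reads. Greedy in file order, so not a random subsample.
cap_depth() {
  awk -v cap="$1" -v minq="${2:-0}" 'BEGIN { FS = "\t" }
    /^@/ { print; next }                        # pass header lines through unchanged
    int($2 / 4) % 2 == 1 { next }               # drop unmapped reads (FLAG 0x4)
    ($5 + 0) < (minq + 0) { next }              # optional MAPQ filter
    {
      pos = $4 + 0; n = 0; cig = $6
      # walk the CIGAR: M, = and X add coverage; D and N only advance the position
      while (match(cig, /^[0-9]+[MIDNSHP=X]/)) {
        len = substr(cig, 1, RLENGTH - 1) + 0
        op = substr(cig, RLENGTH, 1)
        if (op ~ /[M=X]/) for (i = 0; i < len; i++) covered[++n] = pos + i
        if (op ~ /[MDN=X]/) pos += len
        cig = substr(cig, RLENGTH + 1)
      }
      if (n == 0) next                          # nothing aligned (e.g. CIGAR "*")
      for (i = 1; i <= n; i++)                  # reject the read if any position is full
        if (depth[$3, covered[i]] >= cap + 0) next
      for (i = 1; i <= n; i++) depth[$3, covered[i]]++
      print
    }'
}
```

A possible pipeline is samtools view -h yourbam.bam | cap_depth 500 20 | samtools view -b -o capped_500x.bam -, which keeps the input order and therefore the coordinate sort. Capping with this sketch at several values (500, 200, 100, ...) gives a series with a flatter depth profile than fraction-based subsampling, at the cost of the start-position bias noted above.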
