Random downsampling without replacement in BAM files, only in regions with coverage > x
6.9 years ago
VicGB • 0

I have a BAM file with very high coverage in some regions, and I want to randomly subsample only the reads that cover those regions, without subsampling reads that cover low-coverage regions.

Is there a tool for this? Thanks!

bam coverage reads • 2.6k views
6.9 years ago
GenoMax 142k

What is the ultimate goal of this cherry-picking exercise? Running samtools view -s on a specific region would be easiest.

The objective of my "cherry-picking (lol?)" is this: I'm using a genome assembly pipeline that performs 20 random subsamplings, each bringing the coverage down to 150x, and then builds a consensus to reconstruct the genome. The problem is that in some samples coverage is huge in some regions but relatively low in others. Random subsampling lowers coverage across all regions, including those already below 150x, and that creates gaps in the reconstructed genome during the consensus step. So it would be better to subsample only the reads covering high-coverage regions and leave the other regions untouched.
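
The idea described above can be sketched in plain Python, without any BAM library. This is only an illustration under stated assumptions: reads are represented as half-open (start, end) intervals on a single contig, and the function names per_base_depth and thin_reads are made up for this example. Each read is considered once (so sampling is without replacement) and is kept with probability min(1, target / local depth), so reads in regions at or below the target are never dropped.

```python
import random
from itertools import accumulate


def per_base_depth(reads, contig_len):
    """Depth at each position, from half-open (start, end) read intervals."""
    diff = [0] * (contig_len + 1)
    for start, end in reads:
        diff[start] += 1   # coverage rises where a read begins
        diff[end] -= 1     # and falls just past where it ends
    return list(accumulate(diff[:-1]))


def thin_reads(reads, contig_len, target=150, seed=0):
    """Keep each read with probability min(1, target / mean depth over the read)."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    depth = per_base_depth(reads, contig_len)
    kept = []
    for start, end in reads:
        local = sum(depth[start:end]) / (end - start)
        # Reads in regions at or below the target are always kept.
        if local <= target or rng.random() < target / local:
            kept.append((start, end))
    return kept
```

With 1000 reads stacked on one region and 50 on another, a target of 150 should leave roughly 150 reads in the first region and all 50 in the second. On real data the intervals would come from the alignments in the BAM file, and the 20 repeated subsamplings would use 20 different seeds.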

I did not know that samtools could do this; now that I have looked it up, there is indeed such a feature. Though the correct command would be

samtools view -s <fraction> <in.bam> <region>
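
A note on the -s value: samtools reads the integer part as the random seed and the fractional part as the fraction of reads to keep, so -s 42.25 keeps about 25% of reads with seed 42. Below is a small sketch of how that value could be derived for one region from its mean depth and a target depth. The helper name subsample_arg and its defaults are hypothetical, and it assumes you have already measured the mean depth of the region yourself.

```python
def subsample_arg(mean_depth, target_depth=150, seed=42):
    """Return a SEED.FRACTION string for `samtools view -s`, or None.

    None means the region is already at or below the target depth,
    so it should be copied through without subsampling.
    """
    if mean_depth <= target_depth:
        return None
    # Fraction of reads to keep, limited to four decimal places.
    fraction = max(round(target_depth / mean_depth, 4), 0.0001)
    if fraction >= 1:
        return None
    # "0.2500" -> ".2500", then prefix the integer seed: "42.2500"
    return f"{seed}" + f"{fraction:.4f}"[1:]
```

For a region at 600x and a target of 150x this gives 42.2500, which would be passed as the -s value for that region. Regions for which it returns None would be extracted as they are.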

Picard also has a similar tool, DownsampleSam, that you may use:

https://broadinstitute.github.io/picard/command-line-overview.html#DownsampleSam
