Randomly subsampling a BAM file three times
7.8 years ago
GK1610 ▴ 110

I have a sample.bam file and I want to randomly sample 15 million reads from it.

I want to do this three times, producing three files:

random_sample_15_million_1.bam, random_sample_15_million_2.bam, random_sample_15_million_3.bam

I do NOT want these three files to be identical.

ChIP-Seq • 5.2k views
7.8 years ago

Count the total number of reads and work out what fraction of that total 15M represents. Then use Picard DownsampleSam to keep just that fraction: https://broadinstitute.github.io/picard/command-line-overview.html

Set the random seed (RANDOM_SEED) to 1, 2, and 3 for the three runs and you'll get three different files.
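As a rough sketch of the steps above, the Python snippet below computes the keep probability and prints one Picard command per seed. The total of 60 million reads is a made-up number for illustration; get the real count first (for example with `samtools view -c sample.bam`). The `picard.jar` path and the helper names are assumptions, and `I=`, `O=`, `P=`, `R=` are the short forms of Picard's INPUT, OUTPUT, PROBABILITY and RANDOM_SEED options. Since each read is kept with a given probability, expect roughly 15M reads per file rather than exactly 15M.

```python
def keep_probability(total_reads: int, target_reads: int) -> float:
    """Fraction of reads to keep so that about target_reads survive."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Cap at 1.0: you cannot keep more reads than the file contains.
    return min(1.0, target_reads / total_reads)


def picard_commands(bam, total_reads, target_reads=15_000_000, seeds=(1, 2, 3)):
    """Build one DownsampleSam command (as an argument list) per seed."""
    p = keep_probability(total_reads, target_reads)
    commands = []
    for seed in seeds:
        out = f"random_sample_15_million_{seed}.bam"
        commands.append([
            "java", "-jar", "picard.jar", "DownsampleSam",  # jar path is an assumption
            f"I={bam}",
            f"O={out}",
            f"P={p:.6f}",   # PROBABILITY
            f"R={seed}",    # RANDOM_SEED: a different seed gives a different subset
        ])
    return commands


if __name__ == "__main__":
    # 60_000_000 is a hypothetical read count, not taken from the question.
    for cmd in picard_commands("sample.bam", total_reads=60_000_000):
        print(" ".join(cmd))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to `subprocess.run` directly.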


If Picard gives you trouble, samtools view also has a subsampling option (-s).
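For the samtools route, `-s` takes a single number where the integer part is the seed and the part after the decimal point is the fraction of reads to keep, so `-s 2.25` means seed 2, keep about 25%. Here is a minimal sketch that builds those commands; the helper names are hypothetical and the 0.25 fraction is just an example value, so substitute 15M divided by your own read count.

```python
def samtools_subsample_arg(seed: int, fraction: float) -> str:
    """Encode seed and fraction as the SEED.FRACTION value expected by -s."""
    if seed < 0:
        raise ValueError("seed must be non-negative")
    formatted = f"{fraction:.6f}"
    # Only fractions strictly between 0 and 1 can be encoded this way.
    if not formatted.startswith("0.") or formatted == "0.000000":
        raise ValueError("fraction must be strictly between 0 and 1")
    return f"{seed}.{formatted[2:]}"


def samtools_commands(bam, fraction, seeds=(1, 2, 3)):
    """Build one 'samtools view' subsampling command per seed."""
    commands = []
    for seed in seeds:
        out = f"random_sample_15_million_{seed}.bam"
        commands.append([
            "samtools", "view", "-b",                    # -b: write BAM output
            "-s", samtools_subsample_arg(seed, fraction),
            "-o", out,
            bam,
        ])
    return commands


if __name__ == "__main__":
    # 0.25 is an example fraction, not derived from the question's file.
    for cmd in samtools_commands("sample.bam", 0.25):
        print(" ".join(cmd))
```

As with Picard, the output sizes are approximate, so each file will contain close to, but not exactly, the target number of reads.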


In case it's not apparent, this is the relevant option from the Picard documentation:

PROBABILITY=Double P=Double The probability of keeping any individual read, between 0 and 1. Default value: 1.0. This option can be set to 'null' to clear the default value.

Use 0.25 to keep ~25% of the reads.
