I am attempting to run a comparison across a batch of alignments (BAM format) to assess spread. However, some of my datasets are larger than others, and I wish to subsample each one down to an equal number of reads per sample.
If I wanted to randomly extract exactly 1 million reads from a BAM file, is there a method to do this?
Note: I am fully aware of the samtools and Picard methods, which let you reduce by a proportion (e.g. the flag -s 0.33). That does not give a fixed number of reads per sample, which is what I need; it gives a reduced proportion _per sample_, which doesn't help.
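One standard way to draw an exact number of records uniformly at random from a stream of unknown length is reservoir sampling (Algorithm R). The sketch below is generic and is not tied to any BAM tool. It assumes each record counts as one "read", and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Hypothetical helper: return exactly min(k, n) records chosen uniformly
    at random from an iterable of unknown length n, in a single pass."""
    rng = random.Random(seed)  # seeded so a subsample can be reproduced
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Keep record i with probability k / (i + 1) by drawing a slot
            # in [0, i]; if it lands inside the reservoir, replace that slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir
```

Assuming pysam is available, the iterable could be `pysam.AlignmentFile(path, "rb").fetch(until_eof=True)`, and the kept records could be written to a new file opened with `pysam.AlignmentFile(out_path, "wb", template=infile)`. Note that this samples individual alignment records. For paired-end data, mates would be picked independently, and secondary/supplementary alignments would count as extra records. Sampling read names (`query_name`) instead, then keeping every record with a selected name, would keep pairs together. The reservoir is also not in file order, so a coordinate-sorted output would need re-sorting.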
BAM subsampling has been discussed before, but always as proportional data reduction, not the fixed number I need: https://www.biostars.org/p/44527/, https://www.biostars.org/p/76791/#76796, https://www.biostars.org/p/110041, https://broadinstitute.github.io/picard/command-line-overview.html
Edit: I've also come across bamtools, but I haven't been able to get it to work, and it seems not to have been updated in quite some time.