I am trying to solve two problems that I recurrently have when working with BAM Files. I am mostly familiar with Python but other language solutions are welcome, specially if they are fast.
(1) reads _over_ window. I'd like to have a fast script that takes as an input a bam file and it outputs the number of reads for a given window across all the genome. I.e., the output should be a txt file for each chromosome containing the number of reads in each W kb window. Also when the data is chip-seq data I'd like to extend the regions and then have the counts.
I am looking for something of this kind: ./reads _over_ window.py chrom _sizes.bed reads.bam -w 1000 -e 164 --dir reads_ dir and it should output a chr1.txt,...,chrX.txt files with the number of reads for each chromosome for a 1000 bp window, reads extended 164 bp.
(2) reads _over_ bed. I'd like to compute the number of reads for each region defined in the bed file (total number, not average). Also, I'd like to be able to extend the reads by e number of base pairs.
I am looking for something of this kind: ./reads _over_ bed regions.bed reads.bam -e 164 --output regions.reads where the output is the total number of reads that map to a given region.
Please advise. Thank you very much!