Question

Number Of Reads Over A Given Window And Over Given Regions Bam Files

1

Entering edit mode

11.8 years ago

dfernan ▴ 760

Hi,

I am trying to solve two problems that I recurrently have when working with BAM Files. I am mostly familiar with Python but other language solutions are welcome, specially if they are fast.

(1) readsoverwindow. I'd like to have a fast script that takes as an input a bam file and it outputs the number of reads for a given window across all the genome. I.e., the output should be a txt file for each chromosome containing the number of reads in each W kb window. Also when the data is chip-seq data I'd like to extend the regions and then have the counts.

I am looking for something of this kind: ./readsoverwindow.py chromsizes.bed reads.bam -w 1000 -e 164 --dir readsdir and it should output a chr1.txt,...,chrX.txt files with the number of reads for each chromosome for a 1000 bp window, reads extended 164 bp.

(2) readsoverbed. I'd like to compute the number of reads for each region defined in the bed file (total number, not average). Also, I'd like to be able to extend the reads by e number of base pairs.

I am looking for something of this kind: ./readsoverbed regions.bed reads.bam -e 164 --output regions.reads where the output is the total number of reads that map to a given region.

Please advise. Thank you very much!

bam python read • 6.3k views

ADD COMMENT • link updated 11.8 years ago by brentp 24k • written 11.8 years ago by dfernan ▴ 760

score 9 · Answer 1 · 2012-07-16

9

Entering edit mode

11.8 years ago

brentp 24k

Have you looked at bedtools? For (1), something like:

bedtools makewindows -g chromsizes.bed -w 10000 \
    | bedtools coverage -b - -abam input.bam

if you want to extend the reads, just do that with awk like this:

samtools view -F 4 input.bam  | awk 'BEGIN{FS=OFS="\t"}{ print $3, $4-168,$4 + 168 + length($10) }' > input.padded.bed

Then use that instead of input.bam in the first command above.

For 2, I think that bedtools coverage does exactly what you want.

ADD COMMENT • link 11.8 years ago by brentp 24k

0

Entering edit mode

Hi thanks a lot! seems correct, just one more Q, how would I use awk to extend the reads from a bam file?

ADD REPLY • link 11.8 years ago by dfernan ▴ 760

0

Entering edit mode

I added a note in the answer above about using awk.

ADD REPLY • link 11.8 years ago by brentp 24k

0

Entering edit mode

wow, awesome! thx a lot brentp.

ADD REPLY • link 11.8 years ago by dfernan ▴ 760

0

Entering edit mode

I know this is after 2.3 years but just want to document this approach to extend reads. It uses bedtools

bamToBed -i input.bam | slopBed -i - -g genome_file_of_chr_sizes -b 168 | bedToBam -i - -g genome_file_of_chr_sizes > output_extended.bam

Please let me know if there are overhead attached with this approach.

ADD REPLY • link 9.5 years ago by Gjain 5.8k