Biostar Beta. Not for public use.
Number Of Reads Over A Given Window And Over Given Regions Bam Files
1
Entering edit mode
6.4 years ago
dfernan • 650
United States

Hi,

I am trying to solve two problems that I recurrently have when working with BAM Files. I am mostly familiar with Python but other language solutions are welcome, specially if they are fast.

(1) reads _over_ window. I'd like to have a fast script that takes as an input a bam file and it outputs the number of reads for a given window across all the genome. I.e., the output should be a txt file for each chromosome containing the number of reads in each W kb window. Also when the data is chip-seq data I'd like to extend the regions and then have the counts.

I am looking for something of this kind: ./reads _over_ window.py chrom _sizes.bed reads.bam -w 1000 -e 164 --dir reads_ dir and it should output a chr1.txt,...,chrX.txt files with the number of reads for each chromosome for a 1000 bp window, reads extended 164 bp.

(2) reads _over_ bed. I'd like to compute the number of reads for each region defined in the bed file (total number, not average). Also, I'd like to be able to extend the reads by e number of base pairs.

I am looking for something of this kind: ./reads _over_ bed regions.bed reads.bam -e 164 --output regions.reads where the output is the total number of reads that map to a given region.

Please advise. Thank you very much!

bam python read • 4.0k views
ADD COMMENTlink
7
Entering edit mode
18 months ago
brentp 23k
Salt Lake City, UT

Have you looked at bedtools? For (1), something like:

bedtools makewindows -g chromsizes.bed -w 10000 \
    | bedtools coverage -b - -abam input.bam

if you want to extend the reads, just do that with awk like this:

samtools view -F 4 input.bam  | awk 'BEGIN{FS=OFS="\t"}{ print $3, $4-168,$4 + 168 + length($10) }' > input.padded.bed

Then use that instead of input.bam in the first command above.

For 2, I think that bedtools coverage does exactly what you want.

ADD COMMENTlink
0
Entering edit mode

Hi thanks a lot! seems correct, just one more Q, how would I use awk to extend the reads from a bam file?

ADD REPLYlink
0
Entering edit mode

I added a note in the answer above about using awk.

ADD REPLYlink
0
Entering edit mode

wow, awesome! thx a lot brentp.

ADD REPLYlink
0
Entering edit mode

I know this is after 2.3 years but just want to document this approach to extend reads. It uses bedtools

bamToBed -i input.bam | slopBed -i - -g genome_file_of_chr_sizes -b 168 | bedToBam -i - -g genome_file_of_chr_sizes > output_extended.bam

Please let me know if there are overhead attached with this approach.

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3.1