How to bin my count data for the entire genome?
4.9 years ago
fr ▴ 210

I'm trying to represent my ChIP-Seq counts, normalized or not, in specific genomic bins but don't know how to do so.

I have already processed my data, and have used findPeaks followed by pos2bed.pl to produce .bedGraph files that contain this info. However, I'd like to have counts summarized for each 10kb bin throughout the genome (this is OK for my purposes). My .bedGraphs contain some of this information, but not spread in equally defined 10kb bins.

I was looking at Homer's annotatePeaks.pl -hist <bin size>, which seems to summarize data in bins, but only around peaks, which is not really what I want. I would like the counts represented in fixed genomic bins across the whole genome (i.e. not only those within a distance d of a peak). I'm sure there is a tool for this, but I'm not aware of which one to use.

Could someone advise on how I could bin my data?

ChIP-Seq next-gen sequencing genome homer • 2.9k views
4.9 years ago
Prakash ★ 2.2k

You might be looking for bedtools makewindows. You can divide your genome into 10 kb bins and then calculate coverage over them from your BAM files. Alternatively, you may be able to use the tag directory from Homer.
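To illustrate what fixed-width genomic bins look like, here is a minimal awk sketch that builds 10 kb windows from a chromosome-sizes file. It only mimics the kind of output bedtools makewindows -w produces and is not a replacement for it; the file name toy.genome and the chromosome names and lengths are made up for the example.

```shell
# Toy chromosome-sizes file (hypothetical names and lengths): <chrom><TAB><length>
printf 'chrA\t25000\nchrB\t8000\n' > toy.genome

# Emit consecutive 10 kb windows per chromosome, 0-based half-open like BED.
# The last window of each chromosome is truncated at the chromosome end.
windows=$(awk -v w=10000 'BEGIN { OFS = "\t" }
{
    for (start = 0; start < $2; start += w) {
        end = (start + w < $2) ? start + w : $2
        print $1, start, end
    }
}' toy.genome)

printf '%s\n' "$windows"
```

With a real genome you would point this (or bedtools makewindows -g) at your own chromosome-sizes file, such as the mm10.txt used elsewhere in this thread.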


@Prakash, thanks a lot for your suggestion. Just to make sure, you mean something like this:

bedtools makewindows -g mm10.txt -w 50000 > binned_genome.bed

bedtools coverage -a binned_genome.bed -b myfile.bed -sorted -g mm10.txt

My question is then: how are the summaries done? For instance, is each bin showing the mean of counts in that region? I couldn't find this information.

Thanks a lot

Edit: found this thread with some useful information


bedtools coverage -a binned_genome.bed -b myfile.bed -sorted -g mm10.txt

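By default this reports, for each bin, the number of features in myfile.bed that overlap it, the number of bases covered, the bin length, and the fraction of the bin covered. Add -mean if you want the mean depth per bin instead. You can also use genomeCoverageBed (bedtools genomecov) to get coverage scaled to reads per million. It works on the whole genome rather than on a bin file, and -scale takes a numeric factor, so pass 1,000,000 divided by your number of mapped reads:

genomeCoverageBed -ibam <your aligned bam file> -bg -scale <1000000 / number of mapped reads>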
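To make the idea of a per-bin summary concrete, here is a minimal awk sketch that sums a bedGraph signal into 10 kb bins. This is an illustration only and not how bedtools computes its output; the file toy.bedGraph and its values are invented, and two simplifications are assumed: bins without any signal are omitted, and the mean divides by the nominal bin width, so it would be off for a truncated last bin of a chromosome.

```shell
# Toy bedGraph (hypothetical values): <chrom> <start> <end> <signal>
printf 'chrA\t9000\t11000\t2\nchrA\t15000\t16000\t4\n' > toy.bedGraph

# For each bedGraph interval, add (signal x overlapping bases) to every
# 10 kb bin it touches, then print per bin: chrom, bin start, total, mean per base.
binned=$(awk -v w=10000 'BEGIN { OFS = "\t" }
{
    for (b = int($2 / w); b * w < $3; b++) {
        lo = ($2 > b * w) ? $2 : b * w              # overlap start within this bin
        hi = ($3 < (b + 1) * w) ? $3 : (b + 1) * w  # overlap end within this bin
        total[$1 OFS (b * w)] += $4 * (hi - lo)
    }
}
END {
    for (k in total) print k, total[k], total[k] / w
}' toy.bedGraph | sort -k1,1 -k2,2n)

printf '%s\n' "$binned"
```

The first interval straddles the 10 kb boundary, so its signal is split between the two bins in proportion to the overlap. Whether you want the total or the mean per bin depends on how you plan to compare bins afterwards.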