Bioinformatics neophyte here. I'm trying to view tracks from my ChIP-seq data. I followed a fairly standard pipeline: fastq --> align (bwa) --> bam --> sort (picard) --> call peaks (macs2) --> .broadPeak, .bdg & pileups (samtools mpileup). I also made filtered pileups for just the regions under the peaks. My pileups have the format
<chromosome> <position> <counts>
I want to view the tracks around the peaks. I considered creating wig files from the pileups by writing a new header each time I reach a new chromosome:
variableStep chrom=<chromosome>
<position> <counts>
<position+1> <counts>
<position+2> <counts>
...
...
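A minimal sketch of that conversion, assuming the pileup is whitespace-separated, has exactly the three columns shown above, and is already grouped by chromosome. The function name `pileup_to_wig` and the sample records are made up for illustration. samtools mpileup positions and wig positions are both 1-based, so the position is copied through unchanged.

```python
import io


def pileup_to_wig(lines):
    """Yield variableStep wig lines from '<chromosome> <position> <counts>' records.

    Assumes records are grouped by chromosome; a new header is written
    each time the chromosome name changes.
    """
    current = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, pos, count = fields[0], fields[1], fields[2]
        if chrom != current:
            yield f"variableStep chrom={chrom}"
            current = chrom
        yield f"{pos} {count}"


# Hypothetical three-record pileup spanning two chromosomes.
example = io.StringIO("chr1 100 3\nchr1 101 4\nchr2 7 1\n")
wig_lines = list(pileup_to_wig(example))
print("\n".join(wig_lines))
```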
I would have used wigToBigWig to make bigWig files. But then I realized that I can just use MACS2 to spit out bedGraph (.bdg) files. I did this, and they look like:
<chrom> <start> <stop> <value>
KL568395.1 0 8763 0.38251
KL568395.1 8763 8833 0.55459
KL568395.1 8833 9041 0.38251
KL568395.1 9041 9111 0.55459
KL568395.1 9111 9172 0.38251
KL568395.1 9172 9198 0.55459
KL568395.1 9198 9242 1.10918
...
...
I don't get how these are tracks of counts. Can someone please explain?
I'm ok with the float thing, as long as I know the scaling factor. But I don't get the part about not having a value at every location. Also, I looked at bamCoverage. Would I use the --binSize=1 setting to get a value at each position?

Whether you get a value at each position depends on whether you store areas with values of 0 or not. Coverage is computed over bins (i.e., fixed-width intervals) of positions because storing the actual value at every position is often overkill. In the common case of looking at histone modifications, it really doesn't matter if you just chunk everything into blocks of 50 bases or more; peaks are vague, noisy things anyway. The only time you actually need a bin size of 1 is when you're looking at something that actually has single-base precision (e.g., in analysing RiboSeq datasets, I use this to get exact positions of ribosomal pausing). This can also be useful if you're looking at transcription factor binding sites (or really anything with a focal source providing all of the signal).
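To make the link between interval files and per-base tracks concrete: each bedGraph line assigns one value to every base from start up to, but not including, stop (0-based coordinates), so runs of identical values collapse into a single line. Here is a rough sketch that expands such intervals back to one value per base and then averages them into fixed-width bins. The function names and toy records are hypothetical, and averaging within a bin is an assumption made for illustration, not a description of how bamCoverage or MACS2 compute their values.

```python
def expand_bedgraph(records):
    """Expand (chrom, start, stop, value) intervals to per-base (chrom, pos, value).

    bedGraph coordinates are 0-based and the stop is exclusive.
    """
    for chrom, start, stop, value in records:
        for pos in range(start, stop):
            yield chrom, pos, value


def bin_means(values, bin_size):
    """Average a list of per-base values into consecutive fixed-width bins."""
    means = []
    for i in range(0, len(values), bin_size):
        chunk = values[i:i + bin_size]
        means.append(sum(chunk) / len(chunk))
    return means


# Toy intervals on one contig: four bases at 1.0, then two bases at 2.0.
records = [("KL568395.1", 0, 4, 1.0), ("KL568395.1", 4, 6, 2.0)]
per_base = list(expand_bedgraph(records))
values = [v for _, _, v in per_base]
binned = bin_means(values, 2)
print(len(per_base), binned)
```

With a bin size of 1 the binned track is identical to the per-base track; larger bins trade positional resolution for a smaller file.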
Thanks! We are looking for transcription factor binding sites. So I'll give it a try with bin size 1.