Question

Relationship of bedGraph to wig.

7.6 years ago

ariel.balter ▴ 260

Bioinformatics neophyte here. I'm trying to view tracks from my ChIP-seq data. I followed a fairly standard pipeline: fastq --> align (bwa) --> bam --> sort (picard) --> call peaks (macs2) --> .broadPeak and .bdg files, plus pileups (samtools mpileup). I also made filtered pileups covering just the regions under the peaks. My pileups have the format

<chromosome> <position> <counts>

I want to view the tracks around the peaks. I considered creating wig files from the pileups by starting a new header each time I reach a new chromosome:

variableStep chrom=<chromosome>
<position>        <counts>
<position+1>      <counts>
<position+2>      <counts>
...
...
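The conversion described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the three-column `<chromosome> <position> <counts>` layout shown earlier, rows already grouped by chromosome, and 1-based positions (as samtools mpileup reports them). The function name `pileup_to_wig` is made up for this example.

```python
def pileup_to_wig(lines):
    """Yield variableStep wig lines from '<chromosome> <position> <counts>' rows.

    Assumes rows are grouped by chromosome and positions are 1-based.
    """
    current_chrom = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        chrom, pos, count = fields[0], fields[1], fields[2]
        if chrom != current_chrom:
            # New chromosome: emit a fresh declaration line
            yield f"variableStep chrom={chrom}"
            current_chrom = chrom
        yield f"{pos}\t{count}"


rows = ["chr1 10 3", "chr1 11 4", "chr2 5 1"]
for wig_line in pileup_to_wig(rows):
    print(wig_line)
```

Positions that are absent from the pileup are simply absent from the wig output, which is allowed in variableStep format.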

I would then have used wigToBigWig to make bigWig files. But then I realized that MACS2 can write bedGraph (.bdg) files directly. I did this, and they look like:

<chrom>            <start>    <stop>     <value> 
KL568395.1         0          8763       0.38251
KL568395.1         8763       8833       0.55459
KL568395.1         8833       9041       0.38251
KL568395.1         9041       9111       0.55459
KL568395.1         9111       9172       0.38251
KL568395.1         9172       9198       0.55459
KL568395.1         9198       9242       1.10918
...
...

I don't get how these are tracks of counts. Can someone please explain?

ChIP-Seq bedGraph wig wiggle • 2.9k views

Updated 7.6 years ago by Devon Ryan 104k • written 7.6 years ago by ariel.balter ▴ 260

score 0 · Answer 1 · 2016-09-10

7.6 years ago

Devon Ryan 104k

The values in a wiggle file (typically with a ".wig" extension) don't need to be integer counts; they can be any numeric value. In fact, once you convert to bigWig format everything is a float (e.g., 1.0, 0.67, 22.5) anyway, since it doesn't store integers.
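To make the relationship between the two formats concrete: each bedGraph line is a run of consecutive positions that share one value, so it can be expanded back into one value per position. The sketch below is only an illustration, assuming four whitespace-separated columns and the standard conventions (bedGraph coordinates are 0-based and half-open, wig positions are 1-based). The function name `bedgraph_to_per_base` is hypothetical.

```python
def bedgraph_to_per_base(lines):
    """Yield (chrom, 1-based position, value) for every base in each interval."""
    for line in lines:
        if line.startswith(("track", "browser", "#")):
            continue  # skip header and comment lines
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        chrom = fields[0]
        start, stop = int(fields[1]), int(fields[2])
        value = float(fields[3])
        # bedGraph is 0-based, half-open: [start, stop)
        for pos0 in range(start, stop):
            yield chrom, pos0 + 1, value


rows = [
    "KL568395.1 8763 8766 0.55459",
    "KL568395.1 8766 8768 0.38251",
]
for chrom, pos, value in bedgraph_to_per_base(rows):
    print(chrom, pos, value)
```

Read the other way, a bedGraph file is a per-position track with runs of identical values collapsed into single lines.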

As an aside, you might find bamCoverage from deepTools useful. It'll directly make a bigWig file from a BAM file for you.

I'm OK with the float thing, as long as I know the scaling factor. But I don't get the part about not having a value at every location. Also, I looked at bamCoverage. Would I use the --binSize=1 setting to get a value at each position?

Reply by ariel.balter ▴ 260, 7.6 years ago

Whether you get a value at each position depends on whether regions with a value of 0 are stored or not. Coverage is computed over bins (i.e., fixed-width intervals of positions) because storing the actual value at every position is often overkill. In the common case of looking at histone modifications, it really doesn't matter if you chunk everything into blocks of 50 bases or more, since peaks are vague, noisy things anyway. The only time you actually need a bin size of 1 is when you're looking at something that has single-base precision (e.g., when analysing RiboSeq datasets, I use this to get exact positions of ribosomal pausing). This can also be useful if you're looking at transcription factor binding sites (or really anything with a focal source providing all of the signal).
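As a rough illustration of what binning does to a track, the sketch below averages per-base values over fixed-width bins and merges adjacent bins that end up with the same value into one interval. It is not how deepTools computes coverage internally. Taking the mean per bin is an assumption made for this example, and the function name `bin_coverage` is made up.

```python
def bin_coverage(depths, bin_size):
    """Summarise per-base values for one chromosome as (start, stop, mean) intervals.

    depths[i] is the value at 0-based position i; intervals are half-open.
    """
    intervals = []
    for start in range(0, len(depths), bin_size):
        chunk = depths[start:start + bin_size]
        stop = start + len(chunk)  # the last bin may be shorter
        mean = sum(chunk) / len(chunk)
        if intervals and intervals[-1][2] == mean:
            # Same value as the previous bin: extend that interval
            intervals[-1] = (intervals[-1][0], stop, mean)
        else:
            intervals.append((start, stop, mean))
    return intervals


depths = [0, 0, 0, 0, 2, 2, 4, 4, 4, 4, 4, 4]
print(bin_coverage(depths, 4))  # coarse bins blur the step from 2 to 4
print(bin_coverage(depths, 1))  # bin size 1 keeps every change point
```

With a bin size of 1 the output still has fewer lines than positions, because runs of equal values collapse into a single interval.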

Reply by Devon Ryan 104k, 7.6 years ago

Thanks! We are looking for transcription factor binding sites, so I'll give it a try with bin size 1.

Reply by ariel.balter ▴ 260, 7.6 years ago