Relationship of bedGraph to wig.
7.6 years ago
ariel.balter ▴ 260

Bioinformatics neophyte here. I'm trying to view tracks from my ChIP-seq data. I followed a fairly standard pipeline: fastq --> align (bwa) --> bam --> sort (picard) --> call peaks (macs2) --> .broadPeak, .bdg & pileups (samtools mpileup). I also made filtered pileups covering only the regions under the peaks. My pileups have the format

<chromosome> <position> <counts>

I want to view the tracks around the peaks. I considered converting the pileups to wig files by starting a new header each time I reach a new chromosome:

variableStep chrom=<chromosome>
<position>        <counts>
<position+1>      <counts>
<position+2>      <counts>
...
...
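
A minimal Python sketch of that conversion, assuming the pileup is a three-column, whitespace-separated file already sorted by chromosome and then position (the function names are hypothetical). Since samtools mpileup positions and variableStep positions are both 1-based, the positions can be copied through unchanged.

```python
from typing import Iterable, Iterator, Tuple


def parse_pileup(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Parse '<chromosome> <position> <counts>' lines, skipping blank ones."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        chrom, pos, count = fields[:3]
        yield chrom, int(pos), int(count)


def pileup_to_variable_step(rows: Iterable[Tuple[str, int, int]]) -> Iterator[str]:
    """Emit variableStep wig lines, with a new declaration line per chromosome.

    Assumes rows are sorted by chromosome, then position; an unsorted input
    would produce repeated declarations for the same chromosome.
    """
    current_chrom = None
    for chrom, pos, count in rows:
        if chrom != current_chrom:
            # Each new chromosome needs its own declaration line.
            yield f"variableStep chrom={chrom}"
            current_chrom = chrom
        # mpileup and variableStep positions are both 1-based: copy as-is.
        yield f"{pos}\t{count}"
```

The yielded lines carry no trailing newline, so they would be joined with "\n" before writing a .wig file, which could then go to wigToBigWig along with a chrom.sizes file.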

I was going to use wigToBigWig to make bigWig files from those. But then I realized that MACS2 can output bedGraph files (.bdg) directly. I did this, and they look like:

<chrom>            <start>    <stop>     <value> 
KL568395.1         0          8763       0.38251
KL568395.1         8763       8833       0.55459
KL568395.1         8833       9041       0.38251
KL568395.1         9041       9111       0.55459
KL568395.1         9111       9172       0.38251
KL568395.1         9172       9198       0.55459
KL568395.1         9198       9242       1.10918
...
...

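I don't understand how these are tracks of counts: the values aren't integers, and each line covers a range of positions rather than a single one. Can someone please explain?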

ChIP-Seq bedGraph wig wiggle • 2.9k views
7.6 years ago

The values in a wiggle file (typically with a ".wig" extension) don't need to be integer counts; they can be any number. In fact, once you convert to bigWig format everything is stored as a float (e.g., 1.0, 0.67, 22.5) anyway, since bigWig doesn't store integers.
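
To see how a bedGraph line relates to per-position wig values: bedGraph stores a run of consecutive bases sharing one value as a single interval, in 0-based, half-open coordinates, so `start end` covers 1-based positions start+1 through end. The sketch below (hypothetical helper names) expands one line into per-base values and treats the value column as an arbitrary float, whatever scaling produced it.

```python
from typing import Iterator, Tuple


def parse_bedgraph_line(line: str) -> Tuple[str, int, int, float]:
    """Split a bedGraph data line into (chrom, start, end, value)."""
    chrom, start, end, value = line.split()[:4]
    return chrom, int(start), int(end), float(value)


def expand_to_per_base(line: str) -> Iterator[Tuple[str, int, float]]:
    """Yield (chrom, 1-based position, value) for every base one bedGraph line covers.

    bedGraph intervals are 0-based and half-open, [start, end), so they cover
    1-based positions start + 1 through end, all with the same value.
    """
    chrom, start, end, value = parse_bedgraph_line(line)
    for pos0 in range(start, end):
        yield chrom, pos0 + 1, value
```

For example, the second row of the MACS2 output above, `KL568395.1 8763 8833 0.55459`, expands to 70 positions (8764 to 8833), each with value 0.55459.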

As an aside, you might find bamCoverage from deepTools useful. It'll directly make a bigWig file from a BAM file for you.

I'm OK with the floats, as long as I know the scaling factor. But I don't understand why there isn't a value at every position. Also, I looked at bamCoverage. Would I use the --binSize=1 setting to get a value at each position?

Whether you get a value at each position depends on whether or not you store regions with a value of 0. The concept of computing coverage over bins (i.e., fixed-width intervals of positions) exists because storing the actual value at every position is often overkill. In the common case of looking at histone modifications, it really doesn't matter if you just chunk everything into blocks of 50 bases or more; peaks are vague, noisy things anyway. The only time you actually need a bin size of 1 is when you're looking at something with genuine single-base precision (e.g., in analysing RiboSeq datasets, I use it to get the exact positions of ribosomal pausing). It can also be useful if you're looking at transcription factor binding sites (or really anything with a focal source providing all of the signal).
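
The two space-saving ideas in this reply, fixed-width bins and not storing zero-valued regions, can be sketched as below, together with merging neighbouring bins that share a value into one interval, which is why bedGraph rows span ranges. This is an illustration only: it assumes the bin value is the mean of the per-base values, it is not how bamCoverage is implemented, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def coverage_to_bedgraph(
    chrom: str,
    per_base: Sequence[float],
    bin_size: int = 1,
    skip_zero: bool = True,
) -> List[Tuple[str, int, int, float]]:
    """Illustrative only: per_base[i] is the value at 0-based position i.

    Returns 0-based, half-open (chrom, start, end, value) intervals.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be >= 1")
    # 1) Fixed-width bins, each holding the mean of its per-base values.
    #    The last bin may be shorter if the length isn't a multiple of bin_size.
    bins: List[Tuple[int, int, float]] = []
    for start in range(0, len(per_base), bin_size):
        chunk = per_base[start:start + bin_size]
        bins.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    # 2) Merge neighbouring bins with identical values into one interval.
    merged: List[Tuple[int, int, float]] = []
    for start, end, value in bins:
        if merged and merged[-1][2] == value:
            merged[-1] = (merged[-1][0], end, value)
        else:
            merged.append((start, end, value))
    # 3) Optionally leave zero-valued regions out entirely.
    return [
        (chrom, start, end, value)
        for start, end, value in merged
        if not (skip_zero and value == 0)
    ]
```

With per-base values `[0, 0, 2, 2, 2, 2, 0, 0]`, a bin size of 1 gives the single row `("chrA", 2, 6, 2.0)`, since the zeros are dropped and the four 2s merge; a bin size of 4 smooths everything into `("chrA", 0, 8, 1.0)`.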

Thanks! We are looking for transcription factor binding sites, so I'll give bin size 1 a try.
