Question: Relationship of bedGraph to wig.

Bioinformatics neophyte here. I'm trying to view tracks from my ChIP-seq data. I followed a fairly standard pipeline: fastq --> align (bwa) --> bam --> sort (picard) --> call peaks (macs2) --> .broadPeak and .bdg files, plus pileups (samtools mpileup). I also made filtered pileups covering just the regions under the peaks. My pileups have the format:

<chromosome> <position> <counts>

I want to view the tracks around the peaks. I considered creating wig files from the pileups by writing a new header each time I reach a new chromosome:

variableStep chrom=<chromosome>
<position>        <counts>
<position+1>      <counts>
<position+2>      <counts>
...
...
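
The conversion described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the pileup rows are whitespace-separated `<chromosome> <position> <counts>`, already grouped by chromosome, and that positions are 1-based (as both samtools mpileup and variableStep wig use). The function name `pileup_to_wig` and the example rows are made up for the illustration.

```python
def pileup_to_wig(lines):
    """Yield variableStep wig lines from '<chromosome> <position> <counts>' rows."""
    current_chrom = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        chrom, position, counts = fields[0], fields[1], fields[2]
        if chrom != current_chrom:
            # Start a new wig block each time the chromosome changes.
            yield f"variableStep chrom={chrom}"
            current_chrom = chrom
        yield f"{position}\t{counts}"


# Hypothetical pileup rows, not real data.
example = ["chr1 100 3", "chr1 101 4", "chr2 7 1"]
wig_lines = list(pileup_to_wig(example))
print("\n".join(wig_lines))
```

The resulting text could then be written to a file and passed to wigToBigWig together with a chromosome sizes file.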

I would then have used wigToBigWig to make bigWig files. But then I realized that I can just have MACS2 write bedGraph (.bdg) files. I did this, and they look like:

<chrom>            <start>    <stop>     <value> 
KL568395.1         0          8763       0.38251
KL568395.1         8763       8833       0.55459
KL568395.1         8833       9041       0.38251
KL568395.1         9041       9111       0.55459
KL568395.1         9111       9172       0.38251
KL568395.1         9172       9198       0.55459
KL568395.1         9198       9242       1.10918
...
...

I don't get how these are tracks of counts. Can someone please explain?

3.4 years ago • ariel.balter • 140 • updated 3.4 years ago by Devon Ryan • 90k

The values in a wiggle file (typically with a ".wig" extension) don't need to be integer counts; they can be anything. In fact, once you convert to bigWig format, everything is stored as a float (e.g., 1.0, 0.67, 22.5) anyway, since the format doesn't store integers.
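
To connect this to the bedGraph lines in the question: each bedGraph line describes a run of consecutive positions that all share one value, so no position inside an interval is actually missing. bedGraph coordinates are 0-based and half-open (the start is included, the end is not). The sketch below expands two of the rows from the question back into one value per 1-based position. The function name `expand_bedgraph` is just an illustrative choice, and the values are whatever MACS2 wrote, not necessarily raw read counts.

```python
def expand_bedgraph(lines):
    """Yield (chrom, 1-based position, value) for every base covered by bedGraph rows."""
    for line in lines:
        if line.startswith(("track", "browser", "#")):
            continue  # skip header and comment lines
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        chrom, start, end, value = fields[0], int(fields[1]), int(fields[2]), float(fields[3])
        # 0-based half-open [start, end) covers 1-based positions start+1 .. end.
        for pos0 in range(start, end):
            yield chrom, pos0 + 1, value


rows = [
    "KL568395.1         9172       9198       0.55459",
    "KL568395.1         9198       9242       1.10918",
]
per_base = list(expand_bedgraph(rows))
print(len(per_base))   # number of positions covered by the two rows
print(per_base[0])     # first position of the first interval
print(per_base[26])    # first position of the second interval
```

Going the other way (merging neighbouring positions with equal values into one interval) is what makes bedGraph so much more compact than a per-position listing.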

As an aside, you might find bamCoverage from deepTools useful. It'll directly make a bigWig file from a BAM file for you.

3.4 years ago • Devon Ryan • 90k

I'm ok with the float thing, as long as I know the scaling factor. But I don't get the part about not having a value at every location. Also, I looked at bamCoverage. Would I use the --binSize=1 setting to get a value at each position?

3.4 years ago • ariel.balter • 140

Whether you get a value at each position depends on whether or not you store regions with a value of 0. Coverage is usually computed over bins (i.e., fixed-width intervals of positions) because storing the actual value at every position is often overkill. In the common case of looking at histone modifications, it really doesn't matter if you chunk everything into blocks of 50 bases or more, since peaks are vague, noisy things anyway. The only time you actually need a bin size of 1 is when you're looking at something that has single-base precision (e.g., when analysing RiboSeq datasets, I use this to get the exact positions of ribosomal pausing). It can also be useful if you're looking at transcription factor binding sites (or really anything with a focal source providing all of the signal).
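
As a rough illustration of what binning does to a signal, the sketch below averages a made-up per-base coverage vector into fixed-width bins. This is only one simple way to summarise a bin and is not meant to reproduce how bamCoverage computes its values internally. The function name `bin_coverage` and the example numbers are assumptions for the illustration.

```python
def bin_coverage(per_base, bin_size):
    """Average per-base values into consecutive fixed-width bins.

    per_base holds one value per position (0-based) of a single chromosome.
    Returns (start, end, mean) tuples with 0-based half-open coordinates.
    """
    bins = []
    for start in range(0, len(per_base), bin_size):
        chunk = per_base[start:start + bin_size]  # last chunk may be shorter
        bins.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    return bins


# Hypothetical coverage around a narrow peak.
coverage = [0, 0, 1, 3, 5, 5, 4, 2, 0, 0]
print(bin_coverage(coverage, 5))  # two coarse bins, peak shape is smoothed out
print(bin_coverage(coverage, 1))  # one bin per base, full resolution
```

With a bin size of 5, the ten positions collapse to two values and the summit can no longer be located; with a bin size of 1, every position keeps its own value at the cost of a larger file.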

3.4 years ago • Devon Ryan • 90k

Thanks! We are looking for transcription factor binding sites, so I'll give it a try with bin size 1.

3.4 years ago • ariel.balter • 140
