Biostar Beta. Not for public use.
Overlapping regions in a BedGraph file
0
Entering edit mode
4.5 years ago
Sam • 70
United States

I downloaded a GRO-Seq dataset from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38140 and got a bedGraph file that's 4.2gb large.

I wanted to generate heatmaps of sense and antisense strands using deeptools, which requires a bigWig file format. I figured it'd be pretty simple using a bdg2bw script found here https://gist.github.com/taoliu/2469050 but I recieved this error:

$ /Users/Carlos/Dropbox/bdg2bw Gro-Seq.bdg /Users/Carlos/Desktop/Projects/GenomicDataPublication_Project/Gencode/hg19.chrom.sizes 

Overlapping regions in bedGraph line 44 of Gro-Seq.bdg.clip

Not quite sure what to do to fix this. I know it's pretty simple in concept, just remove overlapping regions. But i'm not sure how exactly to go about doing that or if it would bias the results or something just as terrible, of the heatmap if I did.

Any ideas?

ADD COMMENTlink
0
Entering edit mode
15 months ago
Freiburg, Germany

I haven't merged it into the master branch yet, but you can make bigWig files with pyBigWig in the WriterIntegration branch. That'd allow you to script something to fix the problematic line(s).

The fix would be something along the lines of:

import pyBigWig
lastLine = None
bw = pyBigWig.open("something.bw", "w")
bw.addHeader(some stuff)
for line in bedGraph.split():
    if lastLine:
        if line[0] == lastLine[0] and line[1] < lastLine[2]:
            #Do something appropriate
    lastLine = line
    bw.addEntries([line[0]], [line[1]], ends=[line[2]], values=[line[3]])
bw.close()

That'd give you a bigWig file and allow you to fix overlapping entries however you'd like.

ADD COMMENTlink
0
Entering edit mode

This looks promising. I'm not very (or at all) versed at scripting, but i'll give it a whirl and see if I can figure it out.

ADD REPLYlink
0
Entering edit mode

Go ahead and post again with what you have if you get stuck.

ADD REPLYlink
0
Entering edit mode

Not surprisingly I got stuck quite quickly. Here's the edited script: (if you can call it that)

import pyBigWig
lastLine = None
bw = pyBigWig.open("Gro-Seq.bdg", "w")
bw.addHeader([("chr1", 249250621), ("chr2", 243199373), ("chr3", 198022430), ("chr4", 191154276), ("chr5", 180915260), ("chr6", 171115067), ("chr7", 159138663), ("chr8", 146364022), ("chr9", 141213431), ("chr10", 135534747), ("chr11", 135006516), ("chr12", 133851895), ("chr13", 115169878), ("chr14", 107349540), ("chr15", 102531392), ("chr16", 90354753), ("chr17", 81195210), ("chr18", 78077248), ("chr19", 59128983), ("chr20", 63025520), ("chr21", 48129895), ("chr22", 51304566), ("chrX", 155270560), ("chrY", 59373566), ("chrM", 16571)], maxZooms=0)
for line in bedGraph.split():
    if lastLine:
        if line[0] == lastLine[0] and line[1] < lastLine[2]:
            fout.write(line)
    lastLine = line
    bw.addEntries([line[0]], [line[1]], ends=[line[2]], values=[line[3]])
bw.close()

The error i'm receiving:

$ python bdgintobw.py

Traceback (most recent call last):

  File "bdgintobw.py", line 4, in <module>

    bw.addHeader([("chr1", 249250621), ("chr2", 243199373), ("chr3", 198022430), ("chr4", 191154276), ("chr5", 180915260), ("chr6", 171115067), ("chr7", 159138663), ("chr8", 146364022), ("chr9", 141213431), ("chr10", 135534747), ("chr11", 135006516), ("chr12", 133851895), ("chr13", 115169878), ("chr14", 107349540), ("chr15", 102531392), ("chr16", 90354753), ("chr17", 81195210), ("chr18", 78077248), ("chr19", 59128983), ("chr20", 63025520), ("chr21", 48129895), ("chr22", 51304566), ("chrX", 155270560), ("chrY", 59373566), ("chrM", 16571)], maxZooms=0)

AttributeError: 'NoneType' object has no attribute 'addHeader'
ADD REPLYlink
0
Entering edit mode

For whatever reason you couldn't open "Gro-Seq.bdg" for opening (are you using the "WriterIntegration" branch?). Also, I apparently introduced a bug recently where "maxZooms" doesn't work! I'll try to get that fixed today.

ADD REPLYlink
0
Entering edit mode

maxZooms should now work as either a keyword or optional argument, though obviously if open() returns None then you're having a different issue.

ADD REPLYlink
0
Entering edit mode

Hi, Is there any update about this post? Thanks. T.

ADD REPLYlink
0
Entering edit mode
15 months ago
WashingtonDC

This solution only works for non-Bookended bedgraph file
Sort and then merge overlapping regions(mean of score or max, sum, etc)

LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s in.bdg > out.bdg.tmp
bedtools merge -i out.bdg.tmp -c 4 -d 0 -o max > out.bdg
LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s out.bdg > out.bdg.sorted
bedGraphToBigWig out.bdg.sorted hg38.genome out.bigwig
ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3.1