Question: Overlapping regions in a BedGraph file
0
Entering edit mode

I downloaded a GRO-Seq dataset from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38140 and got a bedGraph file that's 4.2gb large.

I wanted to generate heatmaps of sense and antisense strands using deeptools, which requires a bigWig file format. I figured it'd be pretty simple using a bdg2bw script found here https://gist.github.com/taoliu/2469050 but I recieved this error:

$ /Users/Carlos/Dropbox/bdg2bw Gro-Seq.bdg /Users/Carlos/Desktop/Projects/GenomicDataPublication_Project/Gencode/hg19.chrom.sizes 

Overlapping regions in bedGraph line 44 of Gro-Seq.bdg.clip

Not quite sure what to do to fix this. I know it's pretty simple in concept, just remove overlapping regions. But i'm not sure how exactly to go about doing that or if it would bias the results or something just as terrible, of the heatmap if I did.

Any ideas?

ADD COMMENTlink 4.1 years ago Sam • 70 • updated 7 months ago Simply Bioinformatics • 150
0
Entering edit mode

I haven't merged it into the master branch yet, but you can make bigWig files with pyBigWig in the WriterIntegration branch. That'd allow you to script something to fix the problematic line(s).

The fix would be something along the lines of:

import pyBigWig
lastLine = None
bw = pyBigWig.open("something.bw", "w")
bw.addHeader(some stuff)
for line in bedGraph.split():
    if lastLine:
        if line[0] == lastLine[0] and line[1] < lastLine[2]:
            #Do something appropriate
    lastLine = line
    bw.addEntries([line[0]], [line[1]], ends=[line[2]], values=[line[3]])
bw.close()

That'd give you a bigWig file and allow you to fix overlapping entries however you'd like.

ADD COMMENTlink 4.1 years ago Devon Ryan 90k
Entering edit mode
0

This looks promising. I'm not very (or at all) versed at scripting, but i'll give it a whirl and see if I can figure it out.

ADD REPLYlink 4.1 years ago
Sam
• 70
Entering edit mode
0

Go ahead and post again with what you have if you get stuck.

ADD REPLYlink 4.1 years ago
Devon Ryan
90k
Entering edit mode
0

Not surprisingly I got stuck quite quickly. Here's the edited script: (if you can call it that)

import pyBigWig
lastLine = None
bw = pyBigWig.open("Gro-Seq.bdg", "w")
bw.addHeader([("chr1", 249250621), ("chr2", 243199373), ("chr3", 198022430), ("chr4", 191154276), ("chr5", 180915260), ("chr6", 171115067), ("chr7", 159138663), ("chr8", 146364022), ("chr9", 141213431), ("chr10", 135534747), ("chr11", 135006516), ("chr12", 133851895), ("chr13", 115169878), ("chr14", 107349540), ("chr15", 102531392), ("chr16", 90354753), ("chr17", 81195210), ("chr18", 78077248), ("chr19", 59128983), ("chr20", 63025520), ("chr21", 48129895), ("chr22", 51304566), ("chrX", 155270560), ("chrY", 59373566), ("chrM", 16571)], maxZooms=0)
for line in bedGraph.split():
    if lastLine:
        if line[0] == lastLine[0] and line[1] < lastLine[2]:
            fout.write(line)
    lastLine = line
    bw.addEntries([line[0]], [line[1]], ends=[line[2]], values=[line[3]])
bw.close()

The error i'm receiving:

$ python bdgintobw.py

Traceback (most recent call last):

  File "bdgintobw.py", line 4, in <module>

    bw.addHeader([("chr1", 249250621), ("chr2", 243199373), ("chr3", 198022430), ("chr4", 191154276), ("chr5", 180915260), ("chr6", 171115067), ("chr7", 159138663), ("chr8", 146364022), ("chr9", 141213431), ("chr10", 135534747), ("chr11", 135006516), ("chr12", 133851895), ("chr13", 115169878), ("chr14", 107349540), ("chr15", 102531392), ("chr16", 90354753), ("chr17", 81195210), ("chr18", 78077248), ("chr19", 59128983), ("chr20", 63025520), ("chr21", 48129895), ("chr22", 51304566), ("chrX", 155270560), ("chrY", 59373566), ("chrM", 16571)], maxZooms=0)

AttributeError: 'NoneType' object has no attribute 'addHeader'
ADD REPLYlink 4.1 years ago
Sam
• 70
Entering edit mode
0

For whatever reason you couldn't open "Gro-Seq.bdg" for opening (are you using the "WriterIntegration" branch?). Also, I apparently introduced a bug recently where "maxZooms" doesn't work! I'll try to get that fixed today.

ADD REPLYlink 4.1 years ago
Devon Ryan
90k
Entering edit mode
0

maxZooms should now work as either a keyword or optional argument, though obviously if open() returns None then you're having a different issue.

ADD REPLYlink 4.0 years ago
Devon Ryan
90k
Entering edit mode
0

Hi, Is there any update about this post? Thanks. T.

ADD REPLYlink 2.8 years ago
morovatunc
• 400
0
Entering edit mode

This solution only works for non-Bookended bedgraph file
Sort and then merge overlapping regions(mean of score or max, sum, etc)

LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s in.bdg > out.bdg.tmp
bedtools merge -i out.bdg.tmp -c 4 -d 0 -o max > out.bdg
LC_COLLATE=C sort -k1,1 -k2,2n -k3,3n -s out.bdg > out.bdg.sorted
bedGraphToBigWig out.bdg.sorted hg38.genome out.bigwig
ADD COMMENTlink 7 months ago Simply Bioinformatics • 150

Login before adding your answer.

Powered by the version 1.8