Hi all, I have peak-calling results stored in a BED file. Some of the peaks are stored in a single entry as separate blocks. I want to split those blocks so that each one becomes its own entry, using the information provided in blockCount, blockSizes and blockStarts.
For example, Input:
chr chromStart chromEnd name score strand thickStart thickEnd itemRgb blockCount blockSizes blockStarts
chr1 943253 943774 SAMD11 2e-59 + 943253 943774 0 2 124,77, 0,444
chr1 944351 944581 SAMD11 5e-35 + 944351 944581 0 1 230, 0
Output:
chr1 943253 943377 SAMD11 2e-59 + 943253 943377 0 1 124, 0
chr1 943697 943774 SAMD11 2e-59 + 943697 943774 0 1 77, 0
chr1 944351 944581 SAMD11 5e-35 + 944351 944581 0 1 230, 0
Note that in the example, the first entry contains two peaks (blocks) and the second contains only one. In my data, some entries contain two or more peaks (blocks) while others contain only one.
Can this be done with an existing tool or a short script? Any help is appreciated!
Thanks
Scott
This looks like genePred format. I could find something to convert it to GTF, which may be a step in the right direction.
It's bed12 format, for what it's worth.
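Since it is BED12, the split described in the question can be done by adding each blockStarts offset to chromStart and using the matching blockSizes value as the length. Below is a minimal Python sketch of that idea, not a tested tool. The function name split_bed12_line is made up for illustration. It assumes thickStart/thickEnd should simply follow the new block coordinates and that the new blockSizes/blockStarts are written as in the expected output above ("124," and "0"). It accepts tab- or space-separated input and writes tab-separated output.

```python
def split_bed12_line(line):
    """Return one BED12 line per block found in a single BED12 line.

    Hypothetical helper for illustration; lines that do not look like
    BED12 data (fewer than 12 fields, or a header row) pass through unchanged.
    """
    text = line.rstrip("\r\n")
    fields = text.split("\t") if "\t" in text else text.split()
    if len(fields) < 12 or not fields[1].isdigit():
        return [text]

    chrom_start = int(fields[1])
    # The block lists may or may not end with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]

    entries = []
    for size, offset in zip(sizes, offsets):
        start = chrom_start + offset  # blockStarts are relative to chromStart
        end = start + size
        new = list(fields)
        new[1] = new[6] = str(start)  # chromStart, thickStart (assumption)
        new[2] = new[7] = str(end)    # chromEnd, thickEnd (assumption)
        new[9] = "1"                  # blockCount
        new[10] = f"{size},"          # blockSizes
        new[11] = "0"                 # blockStarts
        entries.append("\t".join(new))
    return entries


# Small demo using the two entries from the question.
example = [
    "chr1 943253 943774 SAMD11 2e-59 + 943253 943774 0 2 124,77, 0,444",
    "chr1 944351 944581 SAMD11 5e-35 + 944351 944581 0 1 230, 0",
]
for row in example:
    for entry in split_bed12_line(row):
        print(entry)
```

To run it on a real file, loop over the lines of the opened file instead of the example list. If BED6 output is enough, bedtools bed12tobed6 also splits BED12 blocks into separate entries, though it drops the thick/block columns.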