How to tell if a BigWig file is 1-based or 0-based?
4.9 years ago

The BigWig documentation on the UCSC website says the following:

BigWig files created from bedGraph format use "0-start, half-open" coordinates, but bigWigs that represent variableStep and fixedStep data are generated from wiggle files that use "1-start, fully-closed" coordinates. For example, for a chromosome of length N, the first position is 1 and the last position is N.

But if I download a file, I might not know which coordinate system it is encoded with. I'm using bx-python to access the data in the bigWig files, but I can't work out whether it's returning 1-based or 0-based coordinates. Is there a way to tell? Genome browsers must be able to tell the difference.
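For reference, the two conventions quoted above differ only in the start coordinate: an interval's end is the same number in both, and the start is shifted by one. A minimal Python sketch of the conversion follows; the function names are made up for illustration and are not part of bx-python or any other library.

```python
def one_closed_to_zero_half_open(start, end):
    """Convert a 1-start, fully-closed interval to 0-start, half-open."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts; the end keeps the same numeric value.
    return start - 1, end


def zero_half_open_to_one_closed(start, end):
    """Convert a 0-start, half-open interval to 1-start, fully-closed."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


# The first 100 bases of a chromosome in each convention.
print(one_closed_to_zero_half_open(1, 100))   # (0, 100)
print(zero_half_open_to_one_closed(0, 100))   # (1, 100)
```

In both conventions the interval length is easy to recover: `end - start` for 0-start, half-open and `end - start + 1` for 1-start, fully-closed.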

4.8 years ago

I asked UCSC a similar question some years ago. Their answer suggests looking at the header information, or tracing the provenance of the data that went into creating the original bigWig:

The original data that is used to generate a bigWig can come from different formats. There is bedGraph, which is zero-relative, and wiggle, which is 1-relative. In summary, if a bedGraph is used, the results from bigWigToWig will be the bedGraph zero-relative coordinates. What will be included in the output is a commented note, for example, "#bedGraph section chr1:10451-568419" at the head of the wgEncodeSydhTfbsK562Pol3StdSig file mentioned. Thus, the data is not re-indexed, unless you specify bigWigToBedGraph, then data will always return as 0-based bedGraph.

Most ENCODE data, such as the information you were looking at, originated from a bam that was processed through steps like bamToBed (file.bam -> file.bedGraph) followed by bedGraphToBigWig (-> file.bw). Thus, there is no problem with this file; it should be what you see when looking at most bam-originated bigWig files from the ENCODE project.

As to your last question, it is best to not rely on the fact all bigWigs will be indexed the same, some will be from bedGraphs, some from wigs, depending on their originating files, but likely all ENCODE data will exit bigWigToWig as bedGraphs since they were likely encoded as bedGraphs from bams.

Here is further background information. There are two bigWig encoders, bedGraphToBigWig and wigToBigWig, that can take bedGraph or the two wiggle types, variableStep and fixedStep. Then there are two ways back: bigWigToBedGraph and bigWigToWig. If you wish to explore with these formats, please see these pages, the last being the location for obtaining precompiled binaries:


Thank you for this information, I have this exact same problem. Do you know of a tool to access the header of a bigWig file?


Using Devon Ryan's Python library may help ( https://github.com/deeptools/pyBigWig ). Once installed:

$ python
>>> import pyBigWig
>>> bw = pyBigWig.open("my.bigWig")
>>> print(bw.header())

Thank you! Does this require loading the whole file (in the step bw = pyBigWig.open("my.bigWig"))? Sorry for the question, I have no experience with Python. I was looking for something like samtools view piped to head, but for bigWig, so that I can avoid loading the file.


Not sure about the answer to your first question, but the second seems straightforward. Create a text file called readBigWigHeader.py and add the following code or similar:

#!/usr/bin/env python
# Print the header of the bigWig file given as the first argument.
import sys

import pyBigWig

fn = sys.argv[1]
bw = pyBigWig.open(fn)
sys.stdout.write("{}\n".format(bw.header()))
bw.close()

Make the script executable (chmod +x ./readBigWigHeader.py), then run it like so to get the header sent to the standard output stream:

$ ./readBigWigHeader.py my.bigWig
...

Ok, thank you for the comprehensive help


It doesn't read the whole file in; like samtools, it reads only the parts it needs. Please note that there's nothing in the header that indicates whether the underlying data is 1-based or 0-based. This can actually change per chunk within a bigWig file, so there's really nothing to look at to know. As a general rule of thumb, it's best to assume that bigWig files are 0-based, since 1-based bigWig files are a terrible idea that should never have been allowed.
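Since the section type can differ from chunk to chunk, one way to see what a given file contains is to convert it with bigWigToWig and tally the section markers in the text output. The sketch below assumes that output marks bedGraph sections with a "#bedGraph section" comment (as in the UCSC reply quoted earlier) and wiggle sections with the standard "variableStep" and "fixedStep" declaration lines. The function name and the sample data lines are hypothetical.

```python
from collections import Counter


def count_section_types(lines):
    """Tally section markers in text output assumed to come from bigWigToWig."""
    counts = Counter()
    for line in lines:
        if line.startswith("#bedGraph section"):
            counts["bedGraph"] += 1      # 0-start, half-open source data
        elif line.startswith("variableStep"):
            counts["variableStep"] += 1  # 1-start, fully-closed source data
        elif line.startswith("fixedStep"):
            counts["fixedStep"] += 1     # 1-start, fully-closed source data
    return counts


# Hypothetical excerpt; only the "#bedGraph section" line comes from the thread.
example = [
    "#bedGraph section chr1:10451-568419",
    "chr1\t10451\t10487\t1",
    "variableStep chrom=chr2 span=1",
    "300701\t12.5",
]
print(count_section_types(example))
```

To run this on a real conversion, pass an open file handle for the saved bigWigToWig output instead of the example list. A result with more than one section type would suggest the file mixes data from both source formats.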
