Testing bigWig file integrity
15 months ago
Seattle, WA USA

I am downloading hundreds of bigWig files in bulk.

To speed this up, I am running multiple downloads in parallel. File system updates are not always immediate, so on rare occasions os.path.exists in the parent Python script I am using for downloads returns a false negative, and an existing download gets overwritten and corrupted.

Is there a way to test a bigWig file for its integrity without opening up the file with pyBigWig or bigWigToBedGraph etc. and reading its entire contents? Basically, something similar to samtools quickcheck for BAM?

I'm investigating if file hashes are available. I'd be curious to know about options that could be used to test the file integrity directly (if they exist).
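As an aside on the overwrite race described above, one common way to keep a stale os.path.exists result from corrupting a file is to write each download to its own temporary file and rename it into place only once it is complete. This is a minimal sketch, not the script from the question: download_atomically and the fetch callable (any function that writes the file's bytes to an open handle) are hypothetical names.

```python
import os
import tempfile


def download_atomically(fetch, destination):
    """Write a download to a private temp file, then rename it into place.

    `fetch` is a hypothetical callable that writes the file's bytes to the
    binary handle it is given; it stands in for the real download code.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    # The temp file lives in the destination directory so that the final
    # rename stays on one file system.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as handle:
            fetch(handle)
        # Two racing downloads each write their own temp file, so neither
        # writes into a file the other is still filling.
        os.replace(temp_path, destination)
    except BaseException:
        # Clean up the partial file if the download or rename failed.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

With this pattern a duplicate download wastes bandwidth but should leave a complete file behind, since the destination name only ever points at a finished download.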

bigwig integrity • 247 views
12 months ago
Freiburg, Germany

The unfortunate answer is "no". BigWig files lack a magic number at the end, which is one of the things that samtools quickcheck is looking at for BAM files. About the only thing that could be done is to write a program to read in the header and determine from it exactly how many bytes should be present in the file, which could then be compared against the actual file size. That'd guard against truncated files, but if portions were otherwise corrupted then it wouldn't guard against that.

If you can't find hashes to match against and would like an "at least ensure the file size is internally consistent" check, then let me know and I can code something up.
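For reference, a rough sketch of what such a consistency check could look like. It is weaker than computing the exact expected size: it only reads the 64-byte fixed header and the zoom-level headers, then confirms that every offset they record falls inside the file. The field layout is assumed from the published bigWig header description, and bigwig_offsets_fit is a hypothetical name, not part of pyBigWig or the UCSC tools.

```python
import os
import struct

BIGWIG_MAGIC = 0x888FFC26        # signature stored in the first 4 bytes
HEADER_FIELDS = "IHHQQQHHQQIQ"   # assumed layout of the 64-byte fixed header
HEADER_SIZE = 64
ZOOM_FIELDS = "IIQQ"             # assumed layout of each 24-byte zoom header
ZOOM_SIZE = 24


def bigwig_offsets_fit(path):
    """Return True if the header parses and all recorded offsets are in range."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as handle:
        raw = handle.read(HEADER_SIZE)
        if len(raw) < HEADER_SIZE:
            return False
        # Files can be written in either byte order; the magic tells us which.
        for byte_order in ("<", ">"):
            fields = struct.unpack(byte_order + HEADER_FIELDS, raw)
            if fields[0] == BIGWIG_MAGIC:
                break
        else:
            return False
        zoom_levels = fields[2]
        # Chromosome tree, full data and full index offsets.
        offsets = [fields[3], fields[4], fields[5]]
        if fields[9]:
            # The total summary offset may be 0 when the block is absent.
            offsets.append(fields[9])
        for _ in range(zoom_levels):
            zoom_raw = handle.read(ZOOM_SIZE)
            if len(zoom_raw) < ZOOM_SIZE:
                return False
            _reduction, _reserved, data_offset, index_offset = struct.unpack(
                byte_order + ZOOM_FIELDS, zoom_raw
            )
            offsets += [data_offset, index_offset]
    return all(HEADER_SIZE <= offset < file_size for offset in offsets)
```

This only catches files truncated before the last recorded offset. A file cut off inside the final index, or corrupted in the middle, would still pass.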


I do have hashes after all, so no need (but thanks). I was just curious if this format had any checksums or anything of that sort.
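A minimal sketch of the hash comparison, for anyone in the same situation. It assumes the published checksums are hex digests and defaults to MD5, which is only a guess at what a data provider might supply. file_digest_matches is a hypothetical helper name. It still reads every byte, but it does not need to parse the bigWig structure.

```python
import hashlib


def file_digest_matches(path, expected_hex, algorithm="md5", chunk_size=1 << 20):
    """Return True if the file's digest equals the expected hex string.

    `algorithm` is an assumption; use whichever one the published hashes use.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large bigWig files are never fully held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

Unlike a header-based size check, a hash mismatch also exposes corruption in the middle of a file.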

