Testing bigWig file integrity
5.6 years ago

I am downloading hundreds of bigWig files in bulk.

To reduce the time this takes, I am running multiple downloads in parallel. Because file system updates are not always immediate, os.path.exists in the parent Python script I use for the downloads can, very rarely, return a false negative. When that happens, a download can be overwritten and corrupted.

Is there a way to test a bigWig file for its integrity without opening up the file with pyBigWig or bigWigToBedGraph etc. and reading its entire contents? Basically, something similar to samtools quickcheck for BAM?

I'm investigating if file hashes are available. I'd be curious to know about options that could be used to test the file integrity directly (if they exist).

5.6 years ago

The unfortunate answer is "no". BigWig files lack a magic number at the end, which is one of the things that samtools quickcheck is looking at for BAM files. About the only thing that could be done is to write a program to read in the header and determine from it exactly how many bytes should be present in the file, which could then be compared against the actual file size. That would guard against truncated files, but not against files whose contents were otherwise corrupted.

If you can't find hashes to match against and would like an "at least ensure the file size is internally consistent" check, then let me know and I can code something up.
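
As a rough illustration of the header-based idea, here is a minimal sketch in Python. It assumes the published bigWig layout: a 64-byte fixed-width header that starts with the magic number 0x888FFC26, followed by the version, the zoom level count, and the chromosome tree, data, and index offsets. The function name is hypothetical. The sketch only checks that those offsets fall inside the file, which is weaker than computing the exact expected size, so it may catch some truncated files but nothing more.

```python
import os
import struct

BIGWIG_MAGIC = 0x888FFC26  # magic number at the start of a bigWig file
HEADER_SIZE = 64           # size of the fixed-width header in bytes


def bigwig_header_looks_sane(path):
    """Cheap check: valid magic number and header offsets inside the file.

    Hypothetical helper. It reads only the first 64 bytes, so it can catch
    some truncated files but not corruption elsewhere in the file.
    """
    file_size = os.path.getsize(path)
    if file_size < HEADER_SIZE:
        return False
    with open(path, "rb") as handle:
        header = handle.read(HEADER_SIZE)

    # The file may have been written in either byte order; try both.
    for byte_order in ("<", ">"):
        fields = struct.unpack(byte_order + "IHHQQQHHQQIQ", header)
        if fields[0] == BIGWIG_MAGIC:
            break
    else:
        return False

    # fields[3:6] are chromosomeTreeOffset, fullDataOffset, fullIndexOffset.
    offsets = fields[3:6]
    return all(HEADER_SIZE <= offset < file_size for offset in offsets)
```

Since the index is written after the data, a download that stopped early will often have an index offset that points past the end of the file, which is what this check looks for.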


I do have hashes after all, so no need (but thanks). I was just curious if this format had any checksums or anything of that sort.
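
For anyone else who has published hashes to compare against, a minimal sketch of that check in Python follows. It assumes the provider lists SHA-256 hex digests (pass a different algorithm name if they publish something else), and both function names are made up for illustration. Note that it does read the whole file, just in chunks rather than all at once.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with a published hex digest."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```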
