I am downloading hundreds of bigWig files in bulk.
To save time, I am running multiple downloads in parallel. File system updates are not always immediate, so in rare cases a download can be overwritten and corrupted when os.path.exists in the parent Python script I use for downloads returns a false negative.
Is there a way to test a bigWig file for its integrity without opening up the file with pyBigWig or bigWigToBedGraph etc. and reading its entire contents? Basically, something similar to samtools quickcheck for BAM?
I'm investigating if file hashes are available. I'd be curious to know about options that could be used to test the file integrity directly (if they exist).
The unfortunate answer is "no". BigWig files lack a magic number at the end, which is one of the things that samtools quickcheck looks for in BAM files. About the only thing that could be done is to write a program that reads the header and determines from it exactly how many bytes should be present in the file, which could then be compared against the actual file size. That would guard against truncated files, but not against files whose contents are corrupted in some other way.
If you can't find hashes to match against and would like an "at least ensure the file size is internally consistent" check, then let me know and I can code something up.
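In the meantime, here is a rough sketch of a weaker version of that idea, in Python since that is what the download script uses. It does not compute the exact expected file size. It only reads the 64-byte header, checks the leading magic number, and confirms that the offsets stored in the header point inside the file, which should catch downloads that were cut off before the index was written. The function name `bigwig_header_looks_sane` is made up for this example, and the field layout is my reading of the bigWig header, so treat it as an assumption and test it against files you know are good.

```python
import os
import struct

# Assumed bigWig header layout (64 bytes, no padding):
# magic, version, zoomLevels, chromTreeOffset, fullDataOffset,
# fullIndexOffset, fieldCount, definedFieldCount, autoSqlOffset,
# totalSummaryOffset, uncompressBufSize, reserved/extension offset
BIGWIG_MAGIC = 0x888FFC26
HEADER_FORMAT = "IHHQQQHHQQIQ"
HEADER_SIZE = 64


def bigwig_header_looks_sane(path):
    """Cheap sanity check: the magic number matches and the header offsets fit in the file.

    This only detects some truncated or overwritten files. It does not
    detect corruption inside the data blocks.
    """
    file_size = os.path.getsize(path)
    if file_size < HEADER_SIZE:
        return False

    with open(path, "rb") as handle:
        raw = handle.read(HEADER_SIZE)

    # The file may have been written in either byte order, so try both.
    for byte_order in ("<", ">"):
        fields = struct.unpack(byte_order + HEADER_FORMAT, raw)
        if fields[0] == BIGWIG_MAGIC:
            break
    else:
        return False  # not a bigWig header at all

    chrom_tree_offset, full_data_offset, full_index_offset = fields[3:6]
    offsets = (chrom_tree_offset, full_data_offset, full_index_offset)

    # Every section named in the header must start inside the file.
    return all(HEADER_SIZE <= offset < file_size for offset in offsets)
```

You could call this on each file once its download finishes and re-fetch anything that returns False. Since the index is written after the data, a file that stops short of the index offset fails the check, but a file with damaged bytes in the middle will still pass.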