I am working on a project in which I end up generating a score for every position in the human genome, and I would like to be able to visualize these scores in the UCSC genome browser. From my understanding, the best format for this is .wig, but due to some implementation details, it is a lot easier to generate scores in chunks that aren't sorted by their position in the genome. Do wiggle files have to be sorted? Or is it that as long as the variableStep/fixedStep flag correctly describes the bases below it, the order doesn't matter?
To be more specific, would it be allowed to have something like this in a wiggle file?
variableStep chrom=chr2 span=5
300701 12.5
variableStep chrom=chr2 span=10
100701 22.5
I wasn't able to find any information in the file format descriptions from UCSC.
This depends on which of the wiggle encodings one uses. Wiggle and bedGraph can be essentially identical, with the former just lacking the chromosome column on every line. In general, bigWig is preferred, since it allows efficient random access and statistical operations.
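To make the relationship between the two formats concrete, here is a minimal Python sketch that turns variableStep wiggle blocks into bedGraph-style records. It assumes the usual coordinate conventions (wiggle positions are 1-based, bedGraph intervals are 0-based and half-open); the function name `variable_step_to_bedgraph` is made up for illustration, and fixedStep blocks are deliberately not handled.

```python
def variable_step_to_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples in bedGraph coordinates."""
    chrom, span = None, 1
    for line in lines:
        line = line.strip()
        # Skip blank lines and track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith("variableStep"):
            # Declaration line: key=value fields after the keyword.
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            span = int(fields.get("span", 1))  # span defaults to 1
            continue
        if line.startswith("fixedStep"):
            raise ValueError("fixedStep blocks are not handled in this sketch")
        pos, value = line.split()
        start = int(pos) - 1  # 1-based wiggle position -> 0-based start
        yield chrom, start, start + span, float(value)


# The two blocks from the question, in their original (unsorted) order.
wig = """variableStep chrom=chr2 span=5
300701 12.5
variableStep chrom=chr2 span=10
100701 22.5
"""
records = list(variable_step_to_bedgraph(wig.splitlines()))
```

Because every bedGraph record carries its own chromosome and coordinates, records produced this way can be reordered freely afterwards, which is not true of the data lines inside a wiggle block.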
Yes, I use bedGraph for absolutely everything.
There are some very efficient programs written in C++ to handle bedGraph files.
As Devon Ryan mentioned, bigWig is the recommended format for viewing.
Again, the bedGraphToBigWig program is much more efficient than the wigToBigWig program.
I'm admittedly not an expert in the wiggle format, since I haven't used it in the past 2 years.
I just remember having an extremely hard time writing a simple C++ program to do some basic manipulation on the Wiggle format.
Now I use bedtools all the time, or basic Python scripts, to manipulate my bedGraph files.
I convert them to bigWig files for viewing, since the bigWig files are smaller, and indexed.
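For scores generated in unsorted chunks, the kind of basic Python script I mean could look like the sketch below. It sorts bedGraph lines by chromosome and then by start position, which as far as I know is the order bedGraphToBigWig expects. The function name `sort_bedgraph` and the sample lines are hypothetical, and everything is held in memory, so for a whole genome at single-base resolution an external sort (for example Unix `sort -k1,1 -k2,2n`) is probably the better choice.

```python
def sort_bedgraph(lines):
    """Return bedGraph lines sorted by chromosome name, then start."""
    records = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        # Keep the value as text so it is written back unchanged.
        records.append((chrom, int(start), int(end), value))
    # Plain string order on chromosome, numeric order on start.
    records.sort(key=lambda r: (r[0], r[1]))
    return ["\t".join(map(str, r)) for r in records]


# Hypothetical chunks, written out in the order they were computed.
chunks = [
    "chr2\t300700\t300705\t12.5",
    "chr1\t500\t510\t3.0",
    "chr2\t100700\t100710\t22.5",
]
sorted_lines = sort_bedgraph(chunks)
```

The sorted lines can then be written to a file and converted with something like `bedGraphToBigWig in.bedGraph chrom.sizes out.bw`, where the file names are placeholders.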