Question: Do wiggle files have to be sorted?

I am working on a project in which I end up generating a score for every position in the human genome, and I would like to be able to visualize these scores in the UCSC genome browser. From my understanding, the best format for this is .wig, but due to some implementation details, it is a lot easier to generate scores in chunks that aren't sorted by their position in the genome. Do wiggle files have to be sorted? Or is it that as long as the variableStep/fixedStep flag correctly describes the bases below it, the order doesn't matter?

To be more specific, would it be allowed to have something like this in a wiggle file?

variableStep chrom=chr2 span=5
300701 12.5

variableStep chrom=chr2 span=10
100701 22.5

I wasn't able to find any information in the file format descriptions from UCSC.

4.1 years ago asperlea • 0 • updated 4.1 years ago Alex Reynolds 28k

You can use BEDOPS wig2bed to generate a sorted BED file, and then import that into the UCSC browser or your local browser instance:

$ wig2bed < foo.wig > foo.bed

This app handles both variable- and fixed-step WIG input.
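If you would rather see what such a conversion involves, here is a minimal Python sketch (not how wig2bed is implemented). It assumes `span` defaults to 1, that a missing `step` can be treated as 1, and that wiggle positions are 1-based while the output uses 0-based, half-open BED-style coordinates. The function name `wig_to_intervals` is made up for illustration.

```python
def wig_to_intervals(lines):
    """Yield (chrom, start, end, value) tuples from wiggle text lines.

    Output coordinates are 0-based, half-open (BED style); wiggle
    positions are taken to be 1-based.
    """
    mode = chrom = None
    span = step = 1
    pos = 0
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if fields[0] in ("variableStep", "fixedStep"):
            # Declaration line: key=value options describe the block below it.
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom = opts["chrom"]
            span = int(opts.get("span", 1))
            if mode == "fixedStep":
                pos = int(opts["start"])
                step = int(opts.get("step", 1))  # assumption: default to 1
        elif mode == "variableStep":
            start = int(fields[0]) - 1
            yield chrom, start, start + span, float(fields[1])
        elif mode == "fixedStep":
            yield chrom, pos - 1, pos - 1 + span, float(fields[0])
            pos += step
        else:
            raise ValueError("data line found before any declaration line")


# The two blocks from the question, in their original (unsorted) order.
example = [
    "variableStep chrom=chr2 span=5",
    "300701 12.5",
    "",
    "variableStep chrom=chr2 span=10",
    "100701 22.5",
]
# Sorting the tuples orders them by chromosome name, then start position.
for chrom, start, end, value in sorted(wig_to_intervals(example)):
    print(chrom, start, end, value, sep="\t")
```

Because each block carries its own declaration line, the blocks can be parsed in any order and sorted afterwards, which is the approach sketched here.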

4.1 years ago Alex Reynolds 28k

I'm able to generate the bigWig file from the unsorted wig file, although I haven't tested it in the UCSC Genome Browser.

I'm not a fan of the wiggle format anyway, and have long abandoned it.
I believe everyone should switch to the bedGraph format, as many already have.
The bedGraph format is much more straightforward, and can be explained to a five-year-old child.

The wiggle format can be remarkably frustrating to work with.
The bedgraph file must be sorted for viewing, but sorting a file in the bedgraph format is a trivial task.
The conversion of the bedgraph file to the binary bigWig format using the bedGraphToBigWig program is also remarkably efficient.
Many other operations can be performed painlessly with the bedtools suite, or even a simple script.
You can rip your hair out trying to work with the wiggle format.
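To show how little is involved in sorting a bedGraph file, here is a rough Python sketch. It assumes four whitespace-separated columns (chrom, start, end, value) and drops comment, `track`, and `browser` lines rather than preserving them; the function name `sort_bedgraph` is just for illustration. The ordering (chromosome name, then numeric start) is the same as what `sort -k1,1 -k2,2n` produces on the command line.

```python
def sort_bedgraph(lines):
    """Return bedGraph data lines sorted by chromosome name, then start."""
    records = []
    for line in lines:
        line = line.strip()
        # Header and comment lines are dropped in this sketch.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        # Keep the value as text so its formatting is not altered.
        records.append((chrom, int(start), int(end), value))
    records.sort(key=lambda r: (r[0], r[1]))
    return ["\t".join(map(str, r)) for r in records]


unsorted_lines = [
    "chr2\t300700\t300705\t12.5\n",
    "chr10\t5\t6\t1\n",
    "chr2\t100700\t100710\t22.5\n",
]
for row in sort_bedgraph(unsorted_lines):
    print(row)
```

Note that chromosome names are compared as plain strings here, so `chr10` sorts before `chr2`.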

4.1 years ago ablanchetcohen ♦ 1.2k

Thank you for your answer. From my understanding, the wiggle format is supposed to be better for really dense data. I have a score for almost every position in the genome, so would you recommend bedgraph even for that?

4.1 years ago asperlea • 0

This depends on which of the wiggle encodings one uses. Wiggle and bedGraph can be essentially identical, with the former just lacking the chromosome column on every line. In general, bigWig is preferred, since it allows efficient random access and statistical operations.
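As an illustration of how close the two formats can be, here is a small Python sketch that rewrites sorted bedGraph-style intervals as variableStep wiggle lines. It assumes 0-based, half-open input intervals and 1-based wiggle positions, and it starts a new declaration line whenever the chromosome or interval width changes. The function name `bedgraph_to_variable_step` is hypothetical.

```python
def bedgraph_to_variable_step(records):
    """Convert sorted (chrom, start, end, value) tuples to wiggle lines."""
    out = []
    current = None
    for chrom, start, end, value in records:
        span = end - start
        # A declaration line is needed whenever chrom or span changes.
        if (chrom, span) != current:
            out.append(f"variableStep chrom={chrom} span={span}")
            current = (chrom, span)
        # The chromosome is implied by the declaration, so only the
        # 1-based position and the value are written on each data line.
        out.append(f"{start + 1} {value}")
    return out


records = [
    ("chr2", 100700, 100710, 22.5),
    ("chr2", 100710, 100720, 3.0),
    ("chr2", 300700, 300705, 12.5),
]
for line in bedgraph_to_variable_step(records):
    print(line)
```

Each data line differs from its bedGraph counterpart only in omitting the chromosome and end columns, which the declaration line supplies.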

4.1 years ago Devon Ryan

Yes, I use bedgraph for absolutely everything.
There are some very efficient programs written in C++ to handle bedgraph files.

As Devon Ryan mentioned, bigWig is the recommended format for viewing.
Again, the bedGraphToBigWig program is much more efficient than the wigToBigWig program.

I'm admittedly not an expert in the wiggle format, since I haven't used it in the past 2 years.
I just remember having an extremely hard time writing a simple C++ program to do some basic manipulation of the wiggle format.
Now I use bedtools all the time, or basic Python scripts, to manipulate my bedGraph files.
I convert them to bigWig files for viewing, since the bigWig files are smaller and indexed.

4.1 years ago ♦ 1.2k

An unsorted wiggle file is effectively useless. BTW, you can't create an unsorted bigWig file; there's no such thing (well, you can do it, but you typically won't be able to use it).

4.1 years ago Devon Ryan 90k
