Do wiggle files have to be sorted?
4
0
8.3 years ago
asperlea • 0

I am working on a project in which I end up generating a score for every position in the human genome, and I would like to be able to visualize these scores in the UCSC genome browser. From my understanding, the best format for this is .wig, but due to some implementation details, it is a lot easier to generate scores in chunks that aren't sorted by their position in the genome. Do wiggle files have to be sorted? Or is it that as long as the variableStep/fixedStep flag correctly describes the bases below it, the order doesn't matter?

To be more specific, would it be allowed to have something like this in a wiggle file?

variableStep chrom=chr2 span=5
300701 12.5
variableStep chrom=chr2 span=10
100701 22.5

I wasn't able to find any information in the file format descriptions from UCSC.

genome browser wiggle custom track genome • 2.6k views
ADD COMMENT
2
8.3 years ago

You can use BEDOPS wig2bed to generate a sorted BED file, and then import that into the UCSC browser or your local browser instance:

$ wig2bed < foo.wig > foo.bed

This app handles both variable- and fixed-step WIG input.
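If you would rather stay in a script, the same idea (turn each wiggle data line into an interval, then sort) can be sketched in a few lines of Python. This is only an illustration of the concept, not how wig2bed is implemented; the function name `wig_variable_step_to_intervals` is made up for this example, and it only handles variableStep blocks like the ones in the question.

```python
def wig_variable_step_to_intervals(lines):
    """Parse variableStep wiggle lines into (chrom, start, end, value) tuples.

    Wiggle positions are 1-based; the output uses 0-based, half-open
    coordinates, as in BED/bedGraph.
    """
    chrom, span = None, 1
    intervals = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("variableStep"):
            # Declaration line: key=value pairs follow the keyword.
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            span = int(fields.get("span", 1))  # span defaults to 1
            continue
        if line.startswith("fixedStep"):
            raise ValueError("this sketch only handles variableStep blocks")
        if chrom is None:
            raise ValueError("data line before any variableStep declaration")
        pos, value = line.split()
        start = int(pos) - 1  # convert 1-based to 0-based
        intervals.append((chrom, start, start + span, float(value)))
    return intervals


# The two out-of-order blocks from the question.
wig = """variableStep chrom=chr2 span=5
300701 12.5
variableStep chrom=chr2 span=10
100701 22.5
""".splitlines()

# Sort by chromosome name, then by start position.
sorted_intervals = sorted(
    wig_variable_step_to_intervals(wig), key=lambda r: (r[0], r[1])
)
for chrom, start, end, value in sorted_intervals:
    print(chrom, start, end, value, sep="\t")
```

Because each block carries its own chrom and span, the blocks can be generated in any order and sorted afterwards. Note that chromosome names are compared as plain strings here, so chr10 sorts before chr2.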

ADD COMMENT
1
8.3 years ago
ablanchetcohen ★ 1.2k

I'm able to generate the bigWig file from the unsorted wig file, although I haven't tested it in the UCSC Genome Browser.

I'm not a fan of the wiggle format anyway, and have long abandoned it.

I believe everyone should switch to the bedGraph format, as many already have.

The bedgraph format is much more straightforward, and can be explained to a five-year-old child.

The wiggle format can be remarkably frustrating to work with.

The bedgraph file must be sorted for viewing, but sorting a file in the bedgraph format is a trivial task.
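As a rough illustration of how little is involved, here is a minimal Python sketch that sorts bedGraph lines by chromosome and then by start position. The function name `sort_bedgraph` and the sample records are invented for this example, and it assumes the whole file fits in memory.

```python
def sort_bedgraph(lines):
    """Return bedGraph lines sorted by (chrom, start).

    Header lines (comments, track/browser lines) are kept at the top in
    their original order. Values are kept as strings so they are written
    back unchanged.
    """
    header, records = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(("#", "track", "browser")):
            header.append(line)
            continue
        chrom, start, end, value = line.split()[:4]
        records.append((chrom, int(start), int(end), value))
    # Chromosome names compare as strings; starts compare as integers.
    records.sort(key=lambda r: (r[0], r[1]))
    return header + ["\t".join(map(str, r)) for r in records]


# Hypothetical unsorted input.
unsorted = [
    "chr2\t300700\t300705\t12.5",
    "chr1\t500\t600\t3",
    "chr2\t100700\t100710\t22.5",
]
for line in sort_bedgraph(unsorted):
    print(line)
```

For genome-sized files, an on-disk sort is probably a better fit than holding every record in a Python list.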

The conversion of the bedgraph file to the binary bigWig format using the bedGraphToBigWig program is also remarkably efficient.

Many other operations can be performed painlessly with the bedtools suite, or even a simple script.

You can rip your hair out trying to work with the wiggle format.

ADD COMMENT
0
8.3 years ago
asperlea • 0

Thank you for your answer. From my understanding, the wiggle format is supposed to be better for really dense data. I have a score for almost every position in the genome, so would you recommend bedgraph even for that?

ADD COMMENT
0

This depends on which of the wiggle encodings one uses. Wiggle and bedGraph can be essentially identical, with the former just lacking the chromosome column on every line. In general, bigWig is preferred, since it allows efficient random access and statistical operations.
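To make the comparison concrete, here is a small Python sketch that expands a fixedStep wiggle block into the equivalent bedGraph lines. The function name `fixed_step_to_bedgraph` and the sample values are made up for illustration; the sketch assumes fixedStep input only and converts wiggle's 1-based positions to bedGraph's 0-based, half-open intervals.

```python
def fixed_step_to_bedgraph(lines):
    """Expand fixedStep wiggle lines into bedGraph lines (tab-separated)."""
    out = []
    chrom = pos = step = span = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            # Declaration line: chrom and start are required,
            # step and span default to 1.
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            span = int(fields.get("span", 1))
            continue
        if line.startswith("variableStep"):
            raise ValueError("this sketch only handles fixedStep blocks")
        if chrom is None:
            raise ValueError("data line before any fixedStep declaration")
        # Each data line is a bare value; chrom and position are implied.
        out.append(f"{chrom}\t{pos - 1}\t{pos - 1 + span}\t{line}")
        pos += step
    return out


# Hypothetical dense block: one value per base.
wig = ["fixedStep chrom=chr2 start=100701 step=1", "0.5", "0.7", "0.9"]
for line in fixed_step_to_bedgraph(wig):
    print(line)
```

In the fixedStep encoding each data line holds only the value, whereas bedGraph repeats the chromosome and coordinates on every line. That is where the size difference for dense, per-base data comes from.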

ADD REPLY
0

Yes, I use bedgraph for absolutely everything.

There are some very efficient programs written in C++ to handle bedgraph files.

As Devon Ryan mentioned, bigWig is the recommended format for viewing.

Again, the bedGraphToBigWig program is much more efficient than the wigToBigWig program.

I'm admittedly not an expert in the wiggle format, since I haven't used it in the past 2 years.

I just remember having an extremely hard time writing a simple C++ program to do some basic manipulation on the Wiggle format.

Now I use bedtools all the time, or basic Python scripts, to manipulate my bedGraph files.

I convert them to bigWig files for viewing, since the bigWig files are smaller, and indexed.

ADD REPLY
0
8.3 years ago

An unsorted wiggle file is effectively useless. BTW, you can't create an unsorted bigWig file; there's no such thing (well, you can do it, but you typically won't be able to use it).

ADD COMMENT
