Question

UCSC Genome Browser binning

1

Entering edit mode

10.0 years ago

Alex Reynolds 35k

I am reading The Human Genome Browser at UCSC by Kent, et al.

In Figure 7, they discuss a binning approach to search for elements which overlap or are contained in a given range:

From the paper:

Features are put in the smallest bin in which they fit. A feature covering the range indicated by line A would go in bin 1. Similarly, line B goes in bin 4 and line C in bin 20. When the browser needs to access features in a region, it must look in bins of all different sizes. To access all the features that overlapped or were enclosed by line A, the browser looks in bins 1, 2, 3, 7, 8, 9, 10, and 11. For B the browser looks in bins 1, 4, 14, 15, 16, 17. For C, the browser looks in bins 1, 5, and 20.

The bins are sized in the following manner: 128 kb, 1 Mb, 8 Mb, 64 Mb, and 512 Mb.

My guess is that references to each element/feature (not the element itself) are stored in the smallest bin that will contain it. If that's correct, is there any compression done to the list of references?

Further, how is binning done for arbitrary elements or search queries larger than the largest bin size? For example, let's say I provide input elements that were the extents of arbitrary chromosomes longer than 512 Mb, which would not be impossible for some organisms. How are searches then performed for elements of that type? Is there an ordered list of adjacent bin structures to span the minimum and maximum range over the entire input element set?

Are elements/features (element/feature references) retrieved in sort order? (Are bins queried in a certain order?)

computation algorithm binning ucsc • 3.2k views

ADD COMMENT • link updated 2.6 years ago by Ram 43k • written 10.0 years ago by Alex Reynolds 35k

Ram · Answer 1 · 2014-06-03

1

Entering edit mode

9.9 years ago

Pierre Lindenbaum 161k

provide input elements that were the extents of arbitrary chromosomes longer than 512 Mb

as far as I know, there are two versions of the binning system: http://genomewiki.ucsc.edu/index.php/Bin_indexing_system (Extended implementation). I only saw the first "initial" implementation:

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 9.9 years ago by Pierre Lindenbaum 161k