How do I summarize a GRanges data frame into "one complete RLE"?
9.9 years ago
kamaitachi • 0

I have a GRanges object corresponding to a mappability track that looks like this:

> m
GRanges with 31194271 ranges and 1 metadata column:
                              seqnames         ranges strand   | mappable
                                 <Rle>      <IRanges>  <Rle>   |    <Rle>
         [1]                         4   [5981, 5985]      *   |    FALSE
         [2]                         4   [5986, 5990]      *   |    FALSE
         [3]                         4   [5991, 5995]      *   |    FALSE
         [4]                         4   [5996, 6000]      *   |    FALSE
         [5]                         4   [6001, 6005]      *   |    FALSE
         ...                       ...            ...    ... ...      ...
  [31194267] dmel_mitochondrion_genome [19496, 19500]      *   |    FALSE
  [31194268] dmel_mitochondrion_genome [19501, 19505]      *   |    FALSE
  [31194269] dmel_mitochondrion_genome [19506, 19510]      *   |    FALSE
  [31194270] dmel_mitochondrion_genome [19511, 19515]      *   |    FALSE
  [31194271] dmel_mitochondrion_genome [19516, 19517]      *   |    FALSE

How can I summarize the ranges so that, for example, the five chromosome 4 ranges covering 5981-6005 are collapsed into a single range with mappable = FALSE?

RLE GRanges GenomicRanges R • 3.2k views
9.9 years ago

reduce(m)

Edit: I guess you want the extra columns too. In that case it's a bit more complicated.

m2 <- reduce(m)
IDX <- findOverlaps(m, m2)
IDX2 <- IDX[!duplicated(subjectHits(IDX))] # Keep only the first hit per reduced range
m2$mappable <- rep(NA, length(m2)) # Create the column before filling it
m2$mappable[subjectHits(IDX2)] <- as.vector(m$mappable[queryHits(IDX2)])

or something quite close to that.
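
One caveat: reduce() merges adjacent ranges whether or not their mappable values agree, so a stretch that mixes TRUE and FALSE bins collapses into one range. If the goal is one range per run of identical values, a possible sketch is to split by value first and reduce each group separately. The toy object below is made up for illustration (it is not the real track), and the name merged is just a placeholder.

```r
library(GenomicRanges)

# Toy stand-in for the real track: eight adjacent 5 bp bins on chromosome 4
# (values invented for illustration)
m <- GRanges(seqnames = "4",
             ranges = IRanges(start = seq(5981, 6016, by = 5), width = 5),
             mappable = Rle(c(rep(FALSE, 5), TRUE, TRUE, FALSE)))

# Split the bare ranges by mappable value; the list names are "FALSE" and "TRUE"
by_value <- split(granges(m), as.vector(m$mappable))

# Reduce each group on its own, so only bins sharing a value get merged
merged <- unlist(reduce(by_value))

# Recover the value from the list names carried over by unlist()
merged$mappable <- as.logical(names(merged))
names(merged) <- NULL

# Restore genomic order
merged <- sort(merged)
merged
```

On this toy input the result should be three ranges: 5981-6005 (FALSE), 6006-6015 (TRUE) and 6016-6020 (FALSE).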

Of course. It's always a one-liner. Thanks very much! :)

Note the update! I'd forgotten about the metadata columns, which you want to keep. There's no inbuilt way to get reduce() to keep those, so I just assign the first value of the original object. One could think of more complicated ways to do that, likely by splitting the output by subjectHits() and then applying a function.
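
For what it's worth, the "split by subjectHits() and apply a function" idea could look roughly like the sketch below. The toy object and the column names frac_mappable and all_mappable are invented for illustration, and it assumes every reduced range overlaps at least one original range (true when m2 comes from reduce(m) and no range has zero width).

```r
library(GenomicRanges)

# Toy stand-in for the real track (values invented for illustration)
m <- GRanges(seqnames = "4",
             ranges = IRanges(start = seq(5981, 6016, by = 5), width = 5),
             mappable = Rle(c(rep(FALSE, 5), TRUE, TRUE, FALSE)))

m2 <- reduce(m)
hits <- findOverlaps(m, m2)

# Group the original values by the reduced range they fall into
vals <- split(as.vector(m$mappable)[queryHits(hits)], subjectHits(hits))

# Apply any summary function per group, one result per reduced range
m2$frac_mappable <- unname(vapply(vals, mean, numeric(1)))
m2$all_mappable <- unname(vapply(vals, all, logical(1)))
m2
```

With 31 million ranges, building a plain list like this may be slow, so treat it as an illustration of the idea rather than something tuned for the full track.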
