Question: How do I summarize a GRanges data frame into "one complete RLE"?

I have a GRanges data frame corresponding to a mappability track that looks like this:

m
GRanges with 31194271 ranges and 1 metadata column:
                              seqnames         ranges strand | mappable
         [1]                         4   [5981, 5985]      * |    FALSE
         [2]                         4   [5986, 5990]      * |    FALSE
         [3]                         4   [5991, 5995]      * |    FALSE
         [4]                         4   [5996, 6000]      * |    FALSE
         [5]                         4   [6001, 6005]      * |    FALSE
         ...                       ...            ...    ... ...    ...
  [31194267] dmel_mitochondrion_genome [19496, 19500]      * |    FALSE
  [31194268] dmel_mitochondrion_genome [19501, 19505]      * |    FALSE
  [31194269] dmel_mitochondrion_genome [19506, 19510]      * |    FALSE
  [31194270] dmel_mitochondrion_genome [19511, 19515]      * |    FALSE
  [31194271] dmel_mitochondrion_genome [19516, 19517]      * |    FALSE

How can I summarize the ranges so that, for example, the ranges 5981-6005 on chromosome 4 collapse into a single range with mappable = FALSE?

5.8 years ago kamaitachi • 0 • updated 5.8 years ago Devon Ryan 90k

reduce(m)

Edit: I guess you want to keep the metadata columns too. In that case it's a bit more complicated.

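library(GenomicRanges)
m2 <- reduce(m)                              # merge overlapping and adjacent ranges
IDX <- findOverlaps(m, m2)                   # map each original range to its merged range
IDX2 <- IDX[!duplicated(subjectHits(IDX))]   # keep one hit per merged range, so each value is assigned once
mcols(m2)$mappable <- NA                     # create the column before filling it
mcols(m2)$mappable[subjectHits(IDX2)] <- mcols(m)$mappable[queryHits(IDX2)]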

or something quite close to that.
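
One caveat worth spelling out: reduce() merges adjacent ranges regardless of their mappable value, so a run of FALSE bins directly followed by TRUE bins ends up in a single merged range. A minimal sketch of one way around that, assuming the goal is one range per run of identical values, is to split by the metadata value, reduce each group separately, and recombine. The toy object below is invented for illustration; only the chromosome 4 coordinates echo the question.

```r
library(GenomicRanges)

# Toy stand-in for the mappability track; the TRUE/FALSE pattern is made up.
m <- GRanges(
  seqnames = "4",
  ranges = IRanges(start = seq(5981, 6016, by = 5), width = 5),
  mappable = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# Split into a GRangesList with one element per value ("FALSE" and "TRUE").
by_value <- split(m, m$mappable)

# Reduce each element on its own, then flatten back into a single GRanges.
merged <- unlist(reduce(by_value))

# The list names carry the original value; turn them back into a column.
merged$mappable <- as.logical(names(merged))
names(merged) <- NULL

# Restore genomic order (unlist() returns all FALSE ranges, then all TRUE ranges).
merged <- sort(merged)
merged
```

On this toy input the five FALSE bins from 5981 to 6005 become one range, followed by a TRUE range covering 6006-6015 and a final FALSE range covering 6016-6020.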

5.8 years ago Devon Ryan 90k

Of course. It's always a one-liner. Thanks very much! :)

5.8 years ago kamaitachi • 0

Note the update! I'd forgotten about the metadata columns, which you want to keep. There's no inbuilt way to get reduce() to keep those, so I just assign the value from the first original range that overlaps each merged range. One could think of more complicated ways to do that, likely by splitting the original values by subjectHits() and then applying a function to each group.
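
The "split by subjectHits() and apply a function" idea could look roughly like the sketch below. It assumes the summary wanted is "mappable only if every original bin in the merged range was mappable", which is why all() is used; any() or another function could be swapped in. The toy data is invented.

```r
library(GenomicRanges)

# Toy data (invented): three separate blocks of two adjacent 5 bp bins each.
m <- GRanges(
  seqnames = "4",
  ranges = IRanges(start = c(5981, 5986, 7001, 7006, 8001, 8006), width = 5),
  mappable = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

m2 <- reduce(m)                 # one merged range per block
hits <- findOverlaps(m, m2)     # original range -> merged range

# Group the original values by the merged range they fall into.
groups <- split(m$mappable[queryHits(hits)], subjectHits(hits))

# Summarize each group; here a merged range is TRUE only if all its bins are TRUE.
m2$mappable <- vapply(groups, all, logical(1), USE.NAMES = FALSE)
m2
```

Here the third block mixes TRUE and FALSE, so it comes out as FALSE under all(); taking only the first value would have reported it as TRUE.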

5.8 years ago Devon Ryan 90k
