How to show all columns after performing GenomicRanges::subsetByOverlaps
1
2
Entering edit mode
6.4 years ago
gundalav ▴ 380

I have the following two GenomicRanges objects. First one gr1 is this:


library(GenomicRanges)
set.seed(1)
gr1 <- GRanges(
        seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
        ranges=IRanges(1:10, width=10:1, names=head(letters,10)),
        strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
        motif_score=seq(1, 0, length=10),
         motif_name=paste0("Motif_", toupper(sample(c(letters,letters))))[1:10]
      )
gr1
#> GRanges object with 10 ranges and 2 metadata columns:
#>     seqnames    ranges strand | motif_score  motif_name
#>        <Rle> <IRanges>  <Rle> |   <numeric> <character>
#>   a     chr1  [ 1, 10]      - |   1.0000000     Motif_N
#>   b     chr2  [ 2, 10]      + |   0.8888889     Motif_S
#>   c     chr2  [ 3, 10]      + |   0.7777778     Motif_C
#>   d     chr2  [ 4, 10]      * |   0.6666667     Motif_S
#>   e     chr1  [ 5, 10]      * |   0.5555556     Motif_J
#>   f     chr1  [ 6, 10]      + |   0.4444444     Motif_Q
#>   g     chr3  [ 7, 10]      + |   0.3333333     Motif_R
#>   h     chr3  [ 8, 10]      + |   0.2222222     Motif_D
#>   i     chr3  [ 9, 10]      - |   0.1111111     Motif_B
#>   j     chr3  [10, 10]      - |   0.0000000     Motif_C
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

And the second object gr2:

gr2 <- GRanges(seqnames="chr2", 
               ranges=IRanges(4:3, 6),
               peak_name=c("peak_1", "peak_2"),
               strand="+", peak_score=5:4)
gr2
#> GRanges object with 2 ranges and 2 metadata columns:
#>       seqnames    ranges strand |   peak_name peak_score
#>          <Rle> <IRanges>  <Rle> | <character>  <integer>
#>   [1]     chr2    [4, 6]      + |      peak_1          5
#>   [2]     chr2    [3, 6]      + |      peak_2          4
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Then I perform the region overlap using subsetByOverlaps between gr1 and gr2

 subsetByOverlaps(gr1, gr2)
#> GRanges object with 3 ranges and 2 metadata columns:
#>     seqnames    ranges strand | motif_score  motif_name
#>        <Rle> <IRanges>  <Rle> |   <numeric> <character>
#>   b     chr2   [2, 10]      + |   0.8888889     Motif_S
#>   c     chr2   [3, 10]      + |   0.7777778     Motif_C
#>   d     chr2   [4, 10]      * |   0.6666667     Motif_S
#>   -------
#>   seqinfo: 3 sequences from an unspecified genome; no seqlengths

as you can see the peak_name and peak_score column does not appear after intersection. How can I show them all?

R bioconductor genomicrange gwas • 3.9k views
ADD COMMENT
1
Entering edit mode
6.4 years ago

The subsetByOverlaps method finds the subset of the first argument that overlaps the second. It does not do a "merge" or anything like that, so columns in the first argument are the only ones returned. If you swap gr1 and gr2, you'll get the subset of gr2 that overlaps gr1 and that will include the peak_score and peak_name columns (but not the motif_score or motif_name columns).

If, as I suspect, you want to know what regions overlap with what other regions, see the findOverlaps method.

ADD COMMENT

Login before adding your answer.

Traffic: 2630 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6