Accessing start positions in a strand-specific manner from a GRanges object
9.0 years ago

Hi all:

I have a little bit of experience working with GRanges objects in R (from the GenomicRanges package in Bioconductor), but I keep running into a subsetting case that should be more straightforward than the solution I'm using.

Let's say I have the following GRanges object (using the example from the reference):

library(GenomicRanges)

gr2 <- GRanges(seqnames = c("chr1", "chr1"),
               ranges = IRanges(c(7, 13), width = 3),
               strand = c("+", "-")) # sample GRanges object

...which looks like this:

gr2

GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 7,  9]      +
  [2]     chr1  [13, 15]      -
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

I'd like to access all start positions in this object in a strand-specific manner, where I define the "start" to be the first value of the IRanges interval if the range is on the plus strand, and the second value if it is on the minus strand. Of course, the accessor methods start() and end() are both agnostic to strand, returning the first or second value of every interval, respectively.

For example:

start(gr2)
[1]  7 13

and

end(gr2)
[1]  9 15

My current work-around (which is ugly) looks something like the following:

plus.i <- which(strand(gr2) == "+") # which intervals are on the positive strand?
start(gr2[plus.i]) # strand-specific 'start' of those intervals

[1] 7

minus.i <- which(strand(gr2) == "-") # which intervals are on the negative strand?
end(gr2[minus.i]) # strand-specific 'start' of those intervals

[1] 15

I then concatenate both sets of vectors using c().

There must be an easier, more GRanges-centric approach to access these strand-specific 'starts'. Can anyone point me in the right direction? The real-world application I'm dealing with is a set of alignments of mapped, strand-specific CAGE tags, where the 5' end of each interval represents the TSS.

Thanks in advance,
Taylor

Bioconductor R GenomicRanges • 5.2k views
7.5 years ago

Easier would be to use start(resize(gr2, 1)).

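resize(gr2, width = 1) honors strand: by default it anchors each range at its strand-specific start (fix = "start"), so a width of 1 makes both start() and end() equal to that position, i.e. the first coordinate on the plus strand and the last coordinate on the minus strand.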
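
As a quick check on the example from the question, something like the following should work. This is only a minimal sketch: the variable names tss_resize, is_minus, and tss_manual are illustrative, and the ifelse() version is there purely as a sanity check against resize(), not as part of the suggested approach.

```r
library(GenomicRanges)

# Example object from the question
gr2 <- GRanges(seqnames = c("chr1", "chr1"),
               ranges = IRanges(c(7, 13), width = 3),
               strand = c("+", "-"))

# Strand-aware 5' position: resize() anchors at the
# strand-specific start by default (fix = "start")
tss_resize <- start(resize(gr2, width = 1))
tss_resize
# [1]  7 15

# Manual equivalent (illustrative) that keeps the original order of the ranges
is_minus <- as.character(strand(gr2)) == "-"
tss_manual <- ifelse(is_minus, end(gr2), start(gr2))

# Both approaches should agree
stopifnot(all(tss_resize == tss_manual))
```

Unlike splitting by strand and concatenating with c(), both versions return positions in the same order as the ranges in gr2. As far as I know, resize() treats unstranded ("*") ranges like plus-strand ranges, so it is worth checking for those if your data contains them.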
