Computing Relative Coordinates Of One Bioconductor Granges Object To Another?
1
2
Entering edit mode
12.0 years ago
Ryan Thompson ★ 3.6k

I have a set of mapped reads as a GRanges object in BioConductor, and I am grouping them into "clusters" of overlapping reads (regions of contiguous nonzero coverage). I also have the cluster coordinates as a GRanges object, obtained by calling reduce on an unstranded version of the reads, and then arbitrarily assigning a strand to each cluster (actually I have a method for assigning a specific strand to each, but it's not important).

library(ShortRead)
bamfile <- file.path(system.file("extdata", package="Rsamtools"), "ex1.bam")
read.ranges <- as(readAligned(bamfile, type="BAM"), "GRanges")
cluster.ranges <- reduce(GRanges(seqnames=seqnames(read.ranges),
                                 ranges=ranges(read.ranges),
                                 ## Don't consider strand when merging
                                 ## reads into clusters
                                 strand="*",
                                 seqlengths=seqlengths(read.ranges)))
## Each cluster has a strand, assigned via an unspecified method
strand(cluster.ranges) <- c("+", "-")
## Assign an arbitrary ID to each cluster
elementMetadata(cluster.ranges)$id <- sprintf("cluster%s", seq(length(cluster.ranges)))

So basically, I have two GRanges objects, read.ranges and cluster.ranges, and every range in the first obejct is contained in exactly one range in the second object:

> read.ranges
GRanges with 3242 ranges and 2 elementMetadata cols:
         seqnames       ranges strand   |                       id      flag
            <Rle>    <IRanges>  <Rle>   |             <BStringSet> <integer>
     [1]     seq1     [ 1, 36]      +   |      B7_591:4:96:693:509        73
     [2]     seq1     [ 3, 37]      +   |   EAS54_65:7:152:368:113        73
     [3]     seq1     [ 5, 39]      +   |      EAS51_64:8:5:734:57       137
     [4]     seq1     [ 6, 41]      +   |     B7_591:1:289:587:906       137
     [5]     seq1     [ 9, 43]      +   |    EAS56_59:8:38:671:758       137
     [6]     seq1     [13, 47]      +   |    EAS56_61:6:18:467:281        73
     [7]     seq1     [13, 48]      +   |  EAS114_28:5:296:340:699       137
     [8]     seq1     [15, 49]      +   |     B7_597:6:194:894:408        73
     [9]     seq1     [18, 52]      -   |    EAS188_4:8:12:628:973        89
     ...      ...          ...    ... ...                      ...       ...
  [3234]     seq2 [1520, 1554]      +   |  EAS114_45:6:86:859:1779       137
  [3235]     seq2 [1523, 1555]      -   |   EAS54_71:8:105:854:975        83
  [3236]     seq2 [1524, 1558]      -   |   EAS51_62:4:187:907:145       153
  [3237]     seq2 [1524, 1557]      +   |   EAS54_71:4:284:269:882        73
  [3238]     seq2 [1524, 1558]      +   |     EAS56_63:4:141:9:811       137
 [ reached getOption("max.print") -- omitted 4 rows ]
  ---
  seqlengths:
   seq1 seq2
     NA   NA
> cluster.ranges
GRanges with 2 ranges and 1 elementMetadata col:
      seqnames    ranges strand |          id
         <Rle> <IRanges>  <Rle> | <character>
  [1]     seq1 [1, 1569]      + |    cluster1
  [2]     seq2 [1, 1567]      - |    cluster2
  ---
  seqlengths:
   seq1 seq2
     NA   NA

From this, I would like to transform the coordinates of the reads into coordinates relative to the clusters to which they belong. This includes inverting the strand and reversing coordinates of reads within a cluster whose strand is "-", Is there a function to do this?

bioconductor coordinates • 3.4k views
ADD COMMENT
2
Entering edit mode
12.0 years ago
Malcolm.Cook ★ 1.5k

I don't think there is a function to do exactly that.

If I understand the specifics of your strand considerations, the following should be close

# find which cluster each range is within, ignoring strand
hits<-cluster.ranges[findOverlaps(read.ranges,cluster.ranges,select='first',ignore.strand=TRUE)]
# start with the read.ranges themselves
read.ranges.on.hits<-read.ranges
# update each range relative to its cluster hit
ranges(read.ranges.on.hits)<-ranges(narrow(hits,start(read.ranges),end(read.ranges)))
# identify which hits are to clusters on negative strand
neg<-as.vector(strand(hits)=='-')
# reflect the coordinates and the strand of the reads in neg strand clusters
ranges(read.ranges.on.hits)[neg]<-reflect(ranges(read.ranges.on.hits[neg]),ranges(hits[neg]))
strand(read.ranges.on.hits)[neg]<-ifelse('-'==strand(read.ranges.on.hits)[neg],'+','-')

which yields

> read.ranges.on.hits
GRanges with 3242 ranges and 2 elementMetadata values:
         seqnames    ranges strand   |                       id      flag
            <Rle> <IRanges>  <Rle>   |             <BStringSet> <integer>
     [1]     seq1  [ 1, 36]      +   |      B7_591:4:96:693:509        73
     [2]     seq1  [ 3, 37]      +   |   EAS54_65:7:152:368:113        73
     [3]     seq1  [ 5, 39]      +   |      EAS51_64:8:5:734:57       137
     [4]     seq1  [ 6, 41]      +   |     B7_591:1:289:587:906       137
     [5]     seq1  [ 9, 43]      +   |    EAS56_59:8:38:671:758       137
     [6]     seq1  [13, 47]      +   |    EAS56_61:6:18:467:281        73
     [7]     seq1  [13, 48]      +   |  EAS114_28:5:296:340:699       137
     [8]     seq1  [15, 49]      +   |     B7_597:6:194:894:408        73
     [9]     seq1  [18, 52]      -   |    EAS188_4:8:12:628:973        89
     ...      ...       ...    ... ...                      ...       ...
  [3234]     seq2  [14, 48]      -   |  EAS114_45:6:86:859:1779       137
  [3235]     seq2  [13, 45]      +   |   EAS54_71:8:105:854:975        83
  [3236]     seq2  [10, 44]      +   |   EAS51_62:4:187:907:145       153
  [3237]     seq2  [11, 44]      -   |   EAS54_71:4:284:269:882        73
  [3238]     seq2  [10, 44]      -   |     EAS56_63:4:141:9:811       137
  [3239]     seq2  [10, 44]      -   |  EAS114_30:6:277:397:932        73
  [3240]     seq2  [ 6, 40]      +   | EAS139_11:7:50:1229:1313        83
  [3241]     seq2  [ 2, 36]      +   |    EAS54_65:3:320:20:250       147
  [3242]     seq2  [ 1, 35]      +   |    EAS114_26:7:37:79:581        83
  ---
  seqlengths:
   seq1 seq2
     NA   NA
ADD COMMENT
0
Entering edit mode

Oops, I forgot to mention the part about giving each cluster a unique ID and then using that as the seqnames of the new GRanges object, but that's easy to do myself.

ADD REPLY
0
Entering edit mode

It looks to me like you already have already given each cluster a unique ID.

To use it as the seqnames of the result, just initialize read.ranges.on.hits instead with:

 read.ranges.on.hits<-GRanges(values(hits)$id,ranges(read.ranges),strand(read.ranges))
ADD REPLY
0
Entering edit mode

Oh yeah, I guess I did.

ADD REPLY
0
Entering edit mode

ok - so - do you accept the answer as complete? if so, then click the green arrow! Cheers, malcolm.

ADD REPLY

Login before adding your answer.

Traffic: 1412 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6