Biostar Beta. Not for public use.
Question: GRanges intersect operation
0
Entering edit mode

I am trying to find the intersection of two GRanges objects. The first object has the coordinates for genes (grgenes below), the second mimics my data (grtest below). However, when I try to do the intersection, I appear to get an empty set. What am I doing wrong? (Sorry for the bad formatting! Can I turn the 'auto-format' off??)

grgenes

GRanges object with 23056 ranges and 1 metadata column:

    seqnames                 ranges strand   |       GENEID
       <Rle>              <IRanges>  <Rle>   | <FactorList>
  1    chr19 [ 58858172,  58874214]      -   |            1
 10     chr8 [ 18248755,  18258723]      +   |           10
100    chr20 [ 43248163,  43280376]      -   |          100

seqinfo: 93 sequences (1 circular) from hg19 genome

grtest <- GRanges(seqnames = "chr19",ranges=IRanges(start=c(58857500,58857505),width=1))

grtest

GRanges object with 2 ranges and 0 metadata columns:

  seqnames               ranges strand
     <Rle>            <IRanges>  <Rle>

[1] chr19 [58857500, 58857500] *

[2] chr19 [58857505, 58857505] *


seqinfo: 1 sequence from an unspecified genome; no seqlengths

intersect(grgenes,grtest)

GRanges object with 0 ranges and 0 metadata columns:

seqnames ranges strand <rle> <iranges> <rle>


seqinfo: 93 sequences (1 circular) from hg19 genome

ADD COMMENTlink 3.6 years ago bsmith030465 • 140
1
Entering edit mode

Since grgene contains strand information, you need to either add ignore.strand = TRUE , or set the strandedness of your test data. You also need to ensure that your test data actually overlaps with your intiial data ([58857500, ..] doesn't overlap [58858172, 58874214]). I've added a couple of example GRanges that should definitely overlap your grgene ranges in the following:

grgene <- GRanges(
    seqnames = c('chr19', 'chr8', 'chr20'),
    ranges = IRanges(start = c(58858172, 18248755, 43248163),
                       end = c(58874214, 18258723, 43280376)),
    strand = c('-', '+', '-')
    ) # ignoring the GENEID column

grtest.unstrand <- GRanges(
    seqnames = "chr19",
    ranges=IRanges(
        start=c(58857500,58857505,58858500,58859505), # note the difference
        width=1))

grtest.strand <- GRanges(
    seqnames = "chr19",
    ranges=IRanges(
        start=c(58857500,58857505,58858500,58859505),
        width=1),
    strand = c('-', '+', '-', '+'))

> GenomicRanges::intersect(grgene, grtest.unstrand)
GRanges object with 0 ranges and 0 metadata columns:
   seqnames    ranges strand
      <Rle> <IRanges>  <Rle>
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

> GenomicRanges::intersect(grgene, grtest.unstrand, ignore.strand = TRUE)
GRanges object with 2 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]    chr19 [58858500, 58858500]      *
  [2]    chr19 [58859505, 58859505]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

> GenomicRanges::intersect(grgene, grtest.strand)
GRanges object with 1 range and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]    chr19 [58858500, 58858500]      -
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

> GenomicRanges::intersect(grgene, grtest.strand, ignore.strand = TRUE)
GRanges object with 2 ranges and 0 metadata columns:
      seqnames               ranges strand
         <Rle>            <IRanges>  <Rle>
  [1]    chr19 [58858500, 58858500]      *
  [2]    chr19 [58859505, 58859505]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

HTH

ADD COMMENTlink 3.6 years ago russhh ♦ 4.4k
0
Entering edit mode

Thanks!

Also, how can I get the meta-data values to show in the intersect results. For example, with my original grgenes object:

intersect(grgenes,grtest.unstrand,ignore.strand=TRUE)

GRanges object with 2 ranges and 0 metadata columns:

  seqnames               ranges strand
     <Rle>            <IRanges>  <Rle>

[1] chr19 [58858500, 58858500] *

[2] chr19 [58859505, 58859505] *


seqinfo: 93 sequences (1 circular) from hg19 genome

How can I get the gene ids to show in the meta-data column?

ADD COMMENTlink 3.6 years ago bsmith030465 • 140
Entering edit mode
0

Unfortunately, that won't be possible in GenomicRanges::intersect, since the metadata for a supersequence doesn't necessarily correspond to a subsequence. You can however, use findOverlaps to identify rows within your grgenes object that are overlapped by at least one entry of grtest, and then extract those rows. Look at the documentation

ADD REPLYlink 3.6 years ago
russhh
♦ 4.4k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0