overlapping GRanges and GAlignments objects - "within" type results depend on "minoverlap" parameter
1
1
Entering edit mode
5.6 years ago
jpasulka ▴ 10

Hello biostar followers,

I use normally "findOverlaps" function in R to select reads of my interest from my BAM files (reading in R with library GenomicAlignments). This times I received a weird results :

> single_read.gal

GAlignments object with 1 alignment and 5 metadata columns:

  seqnames strand       cigar    qwidth     start       end     width     njunc |                                  qname                     seq        NH        HI        nM
     <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer> |                            <character>          <DNAStringSet> <integer> <integer> <integer>

[1]   chrY      -         51M        51  42301566  42301616        51         0 | HISEQ:125:CAWV3ACXX:6:1107:18815:33205 TCCAGTCCTG...TGTAAGCCAA         1         1         1
-------
seqinfo: 66 sequences from an unspecified genome

> genomic_region.gr

GRanges object with 1 range and 1 metadata column:

  seqnames            ranges strand |      GeneID
     <Rle>         <IRanges>  <Rle> | <character>
[1]   chrY 42301297-42301688      + |      MTA_Mm
-------
seqinfo: 36 sequences from an unspecified genome; no seqlengths

So, the read "single_read.gal" is clearly within the GRanges interval ( 42301297 < 42301566 & 42301616 < 42301688 ), opposite strand, overlapping width = 51 (width of the read).

But, when I run the "findOverlaps" function ignoring the strand info, there is no overlap fulfilling the input requirements :

> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 25, ignore.strand = T )

Hits object with 0 hits and 0 metadata columns:
queryHits subjectHits
<integer>   <integer>
-------
queryLength: 1 / subjectLength: 1

It works only with "minoverlap" parameter = 0

> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 0, ignore.strand = T )

Hits object with 1 hit and 0 metadata columns:
  queryHits subjectHits
  <integer>   <integer>
[1]         1           1
-------
queryLength: 1 / subjectLength: 1

For any-type overlap works even the former setup :

> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "any", minoverlap = 25, ignore.strand = T )

Hits object with 1 hit and 0 metadata columns:
  queryHits subjectHits
  <integer>   <integer>
[1]         1           1
-------
queryLength: 1 / subjectLength: 1

Is that a bug ? Thanks in advance for any respond

> sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
[8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.11.4           GenomicAlignments_1.16.0    Rsamtools_1.32.3            Biostrings_2.48.0           XVector_0.20.0              SummarizedExperiment_1.10.1 DelayedArray_0.6.5         
 [8] BiocParallel_1.14.2         matrixStats_0.54.0          Biobase_2.40.0              GenomicRanges_1.32.6        GenomeInfoDb_1.16.0         IRanges_2.14.11             S4Vectors_0.18.3           
[15] BiocGenerics_0.26.0         BiocInstaller_1.30.0       

loaded via a namespace (and not attached):
[1] lattice_0.20-35        bitops_1.0-6           grid_3.5.1             zlibbioc_1.26.0        Matrix_1.2-14          tools_3.5.1            RCurl_1.95-4.11        compiler_3.5.1        
[9] GenomeInfoDbData_1.1.0
R findOverlaps • 2.1k views
ADD COMMENT
3
Entering edit mode
5.6 years ago

Seems like a possible bug but also looks to be related to the classes used. If you convert the GAlignments object to GRanges, it works as expected.

## setting up your same objects
genomic_region.gr <- GRanges("chrY:42301297-42301688:+")
single_read.gal <- as(GRanges("chrY:42301566-42301616:-"),'GAlignments')

## test as is
findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 25, ignore.strand = T )

Hits object with 0 hits and 0 metadata columns:
   queryHits subjectHits
   <integer>   <integer>
  -------
  queryLength: 1 / subjectLength: 1

## test after changing class
single_read.gal <- as(single_read.gal,'GRanges')

findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 25, ignore.strand = T )

Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  -------
  queryLength: 1 / subjectLength: 1

You might get a better explanation in the Bioconductor forums.

ADD COMMENT

Login before adding your answer.

Traffic: 3293 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6