Hello biostar followers,
I use normally "findOverlaps" function in R to select reads of my interest from my BAM files (reading in R with library GenomicAlignments). This times I received a weird results :
> single_read.gal
GAlignments object with 1 alignment and 5 metadata columns:
seqnames strand cigar qwidth start end width njunc | qname seq NH HI nM
<Rle> <Rle> <character> <integer> <integer> <integer> <integer> <integer> | <character> <DNAStringSet> <integer> <integer> <integer>
[1] chrY - 51M 51 42301566 42301616 51 0 | HISEQ:125:CAWV3ACXX:6:1107:18815:33205 TCCAGTCCTG...TGTAAGCCAA 1 1 1
-------
seqinfo: 66 sequences from an unspecified genome
> genomic_region.gr
GRanges object with 1 range and 1 metadata column:
seqnames ranges strand | GeneID
<Rle> <IRanges> <Rle> | <character>
[1] chrY 42301297-42301688 + | MTA_Mm
-------
seqinfo: 36 sequences from an unspecified genome; no seqlengths
So, the read "single_read.gal" is clearly within the GRanges interval ( 42301297 < 42301566 & 42301616 < 42301688 ), opposite strand, overlapping width = 51 (width of the read).
But, when I run the "findOverlaps" function ignoring the strand info, there is no overlap fulfilling the input requirements :
> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 25, ignore.strand = T )
Hits object with 0 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
-------
queryLength: 1 / subjectLength: 1
It works only with "minoverlap" parameter = 0
> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "within", minoverlap = 0, ignore.strand = T )
Hits object with 1 hit and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
-------
queryLength: 1 / subjectLength: 1
For any-type overlap works even the former setup :
> findOverlaps( query = single_read.gal, subject = genomic_region.gr, type = "any", minoverlap = 25, ignore.strand = T )
Hits object with 1 hit and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
-------
queryLength: 1 / subjectLength: 1
Is that a bug ? Thanks in advance for any respond
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.4 GenomicAlignments_1.16.0 Rsamtools_1.32.3 Biostrings_2.48.0 XVector_0.20.0 SummarizedExperiment_1.10.1 DelayedArray_0.6.5
[8] BiocParallel_1.14.2 matrixStats_0.54.0 Biobase_2.40.0 GenomicRanges_1.32.6 GenomeInfoDb_1.16.0 IRanges_2.14.11 S4Vectors_0.18.3
[15] BiocGenerics_0.26.0 BiocInstaller_1.30.0
loaded via a namespace (and not attached):
[1] lattice_0.20-35 bitops_1.0-6 grid_3.5.1 zlibbioc_1.26.0 Matrix_1.2-14 tools_3.5.1 RCurl_1.95-4.11 compiler_3.5.1
[9] GenomeInfoDbData_1.1.0