IRanges Basic Binning
12.1 years ago

Greetings all,

I am no stranger to R, but I will admit that Bioconductor always throws me for a loop. I am trying to compute a sliding mean or a binned mean (let's say in 1 kb windows) across all scaffolds in my data. Since I am working with SNV data only, start = stop for every range. Could I construct my data differently and bin from the start?

head of data:

        CHRM  POS Q_FREQ_POP1 Q_FREQ_POP2  FREQ_DIFF  HET_POP1  HET_POP2         FST
1 scaffold_0 1257   0.3846154   0.4545455 0.06993007 0.4733728 0.4958678 0.004995005
2 scaffold_0 1302   0.4000000   0.4615385 0.06153846 0.4800000 0.4970414 0.003846154
3 scaffold_0 2072   0.3888889   0.5294118 0.14052288 0.4753086 0.4982699 0.019876591

the call:

my.ranged.dat<-RangedData(ranges=IRanges(start=dat$POS, end=dat$POS), space=dat$CHRM, score=dat$FST)

    RangedData with 6 rows and 1 value column across 8166 spaces
       space       ranges |       score
    <factor>    <IRanges> |   <numeric>
1 scaffold_0 [1257, 1257] | 0.004995005
2 scaffold_0 [1302, 1302] | 0.003846154
3 scaffold_0 [2072, 2072] | 0.019876591
4 scaffold_0 [3513, 3513] | 0.001382604
5 scaffold_0 [4392, 4392] | 0.000637690
6 scaffold_0 [4469, 4469] | 0.006060606

Some questions back: Do you have a score only at each SNV position, or a score for every base position (zero or NA for most positions)? When computing the average scores, how should missing scores be treated? Ignoring them makes the most sense to me; otherwise all averages will be ~0.
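
To make the point about missing scores concrete, here is a minimal base R sketch. It assumes a single 1 kb window containing two scored SNVs; the offsets 257 and 302 are made up for illustration, and the two scores are taken from the head of the data above.

```r
# One 1 kb window: every base starts out unscored (NA).
scores_na <- rep(NA_real_, 1000)

# Two scored SNVs at illustrative (assumed) offsets within the window.
scores_na[c(257, 302)] <- c(0.004995005, 0.003846154)

# Same window, but with unscored bases treated as 0 instead of NA.
scores_zero <- ifelse(is.na(scores_na), 0, scores_na)

mean(scores_na, na.rm = TRUE)  # averages only the two scored positions
mean(scores_zero)              # divides by all 1000 bases, so it is close to 0
```

With na.rm = TRUE the window mean reflects the SNVs that are actually present; with zeros it is diluted by the window width.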

12.1 years ago
Michael 54k

Here is a quick solution. It works for a single scaffold only; if your data contain multiple scaffolds, we need to loop over the spaces contained in rd (the first space is addressed as rd[1] in this example):

# define a function that computes the binned mean, allowing for NA
library(IRanges)

windowApplyRle <- function(object, width, fun, ...) {
   # cut the Rle into consecutive windows; trim() clips the last one to the data
   x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)/width))))
   return(viewApply(x, fun, ...))
}

# make some sample data
rd <- RangedData(ranges=IRanges(start=c(15,55,1001,1500), width=1), score=c(1,10,1000,2000))
rd
RangedData with 4 rows and 1 value column across 1 space
     space       ranges |     score
  <factor>    <IRanges> | <numeric>
1        1 [  15,   15] |         1
2        1 [  55,   55] |        10
3        1 [1001, 1001] |      1000
4        1 [1500, 1500] |      2000

l1 <- max(end(rd[1])) # or length of first contig
rle <- Rle(NA_real_, l1) # empty (all-NA) numeric Rle object
rle[start(rd[1])] <- score(rd[1]) # fill with scores

windowApplyRle(rle, 1000, mean, na.rm=TRUE)
[1]    5.5 1500.0
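
For the multi-scaffold case, one alternative to looping over spaces is to bin directly on the original data frame. The following is only a sketch in base R, not part of the IRanges approach above: binned_mean is a hypothetical helper, the column names CHRM, POS and FST are taken from the question, and the scaffold_1 row is invented to show a second scaffold.

```r
# Hypothetical helper: mean FST per scaffold and fixed-width bin.
# Bins are [1, width], [width + 1, 2 * width], ..., matching the
# windows that successiveViews() produces when starting at position 1.
binned_mean <- function(dat, width = 1000) {
  bin <- (dat$POS - 1) %/% width            # 0-based bin index per SNV
  out <- aggregate(FST ~ CHRM + bin, data = cbind(dat, bin = bin), FUN = mean)
  out$start <- out$bin * width + 1          # first base of the bin
  out$end   <- (out$bin + 1) * width        # last base of the bin
  out <- out[order(out$CHRM, out$bin), c("CHRM", "start", "end", "FST")]
  rownames(out) <- NULL
  out
}

# First three rows from the question plus one made-up row on scaffold_1.
dat <- data.frame(
  CHRM = c("scaffold_0", "scaffold_0", "scaffold_0", "scaffold_1"),
  POS  = c(1257, 1302, 2072, 15),
  FST  = c(0.004995005, 0.003846154, 0.019876591, 0.5)
)

res <- binned_mean(dat, width = 1000)
res
```

Unlike the Rle version, this only returns bins that contain at least one SNV, so empty windows are dropped rather than reported as NaN. Whether that is acceptable depends on what the binned values are used for downstream.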