Hello,

I have a set of elements with the following distribution of lengths:

```
summary(width(positivelincrnas))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    470    4164    9872   18940   20790  152600
```

and another dataset with the following distribution:

```
summary(width(positivegeneshg19))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     20    5558   20460   59880   58360 4829000
```

I would like to select elements from the second dataset (genes) so that their length distribution matches that of the first set of elements (lincRNAs). Both are GRanges objects.

Any suggestions?

Thanks a lot,

Dimitris

To match the length distributions, you can compute a density estimate from the first data set and sample from the second data set with probabilities given by that density. Let's assume we have two GRanges objects: gr1 (positivelincrnas) and gr2 (positivegeneshg19). The trick here is to use a weighted sampling scheme where each gr2 element's sampling probability is derived from the length distribution of gr1.

```
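## bin edges must cover every width in gr1, otherwise hist() stops with an error
bins = seq(0, max(width(gr1)) + 1000, by = 1000) ## choose the step according to your dataset
h = hist(width(gr1), breaks = bins, plot = FALSE)
idx = cut(width(gr2), bins, labels = FALSE) ## NA for gr2 widths beyond the last bin
w = h$density[idx]
w[is.na(w)] = 0 ## elements outside the gr1 range are never sampled
gr2matched = sample(gr2, final_size, prob = w) ## set 'final_size'; adjust the 'replace' argument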
```
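
One caveat with weighting by the gr1 density alone: bins where gr2 has many elements contribute more candidates, so the sampled lengths tend to follow the product of the two distributions rather than the gr1 distribution itself. A possible refinement, sketched below, divides each weight by the number of gr2 elements in the same bin. This is a minimal illustration in base R on simulated widths. The vectors `w1` and `w2`, the log-spaced bins, and the sample size of 1000 are all assumptions for the example, not values from the question.

```r
set.seed(1)

## Simulated widths standing in for width(gr1) and width(gr2) (hypothetical data)
w1 <- ceiling(rlnorm(2000, meanlog = 9, sdlog = 1))
w2 <- ceiling(rlnorm(20000, meanlog = 10, sdlog = 1.5))

## Log-spaced bin edges, padded slightly so every width falls inside a bin
bins <- exp(seq(log(min(w1, w2)) - 0.01, log(max(w1, w2)) + 0.01, length.out = 31))
nb <- length(bins) - 1

b1 <- cut(w1, bins, labels = FALSE)   # bin index of each target width
b2 <- cut(w2, bins, labels = FALSE)   # bin index of each candidate width
n1 <- tabulate(b1, nbins = nb)        # target counts per bin
n2 <- tabulate(b2, nbins = nb)        # candidate counts per bin

## Weight = target fraction of the bin, shared among that bin's candidates
wt <- (n1 / sum(n1))[b2] / n2[b2]

picked <- sample(length(w2), 1000, prob = wt)  # indices into w2, without replacement
matched <- w2[picked]

print(summary(w1))
print(summary(matched))
```

With real data you would use `width(gr1)` and `width(gr2)` in place of the simulated vectors and take `gr2[picked]` as the matched set. Bins that contain gr1 elements but no gr2 elements cannot be matched, and sampling without replacement drifts from the target when a bin has few candidates, so it is worth comparing the two summaries afterwards.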

See also https://support.bioconductor.org/p/74583/