Generate random genomic positions with R
1
1
Entering edit mode
7.4 years ago

Hi,

Is there a package/function to generate random genomic position in R ? If not it's not complicated to write something using sample() but I do not want to re-re-re-reinvent the wheel ..

Edit : I wrote that. Some benchmarking with other given solutions in this thread seems to be required !

generateRandomPos <- function(n,chr,chr.sizes,width=1){
    random_chr <- sample(x=chr,size=n,prob=chr.sizes,replace=T)
    random_pos <- sapply(random_chr,function(chrTmp){sample(chr.sizes[chr==chrTmp],1)}) 
    res <- GRanges(random_chr,IRanges(random_pos,random_pos+width))
    return(res)
}

Thanks

R genomic position • 5.1k views
ADD COMMENT
0
Entering edit mode

(Davetang is the author from gist.github.com/davetang/6548010):

ADD REPLY
1
Entering edit mode

In fact it's not really random as my_random_chr[i] <- sample(x=my_chr,size=1) will pick a chromosome unregarding of its size. Anyway thank you I'll will rewrite the code for my usage and post it here ;)

ADD REPLY
0
Entering edit mode

Nice but I don't like the for loop. It would be easy and much more elegant to convert it to a function and apply.

ADD REPLY
0
Entering edit mode

I wrote that : chr= chromosome names and chr.sizes = their associated size

generateRandomPos <- function(n,chr,chr.sizes,width=1){
    random_chr <- sample(x=chr,size=n,prob=chr.sizes,replace=T)
    random_pos <- sapply(random_chr,function(chrTmp){sample(chr.sizes[chr==chrTmp],1)}) 
    res <- GRanges(random_chr,IRanges(random_pos,random_pos+width))
    return(res)
}
ADD REPLY
0
Entering edit mode

You need a random position .may be with sample(1:length sequence,1,true) will gives one sample

ADD REPLY
0
Entering edit mode

what about bedtools [shuffle][1]. You can take a look at this link as well. Ideally they allow to pick up regions from the original master file. I guess you can also use this code to generate one.

ADD REPLY
1
Entering edit mode
7.4 years ago

I don't think there is a function for it already, but I suggest you to get chromosome lengths from the TxDb.Hsapiens.UCSC.hg19.knownGene object from Homo.sapiens.

> library(Homo.sapiens)
> seqlengths(TxDb.Hsapiens.UCSC.hg19.knownGene)[1:24]
     chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022
     chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
141213431 135534747 135006516 133851895 115169878 107349540 102531392  90354753
    chr17     chr18     chr19     chr20     chr21     chr22      chrX      chrY
 81195210  78077248  59128983  63025520  48129895  51304566 155270560  59373566

> rr  = data.frame(chr=sample(seqlevels(TxDb.Hsapiens.UCSC.hg19.knownGene)[1:24], 40, replace=F))
> rr$start = apply(rr, 1, function(x) { 
         round(runif(1, 0, seqlengths(TxDb.Hsapiens.UCSC.hg19.knownGene)[x][[1]]), 0) }
         )
> rr$end = rr$start + runif(1, 1, 1000)  # random intervals of 1-1000 bases
> rr$strand = sample(c("+", "-"))

> rr.gr = makeGRangesFromDataFrame(rr)
> rr.gr
GRanges object with 40 ranges and 0 metadata columns:
       seqnames                 ranges strand
          <Rle>              <IRanges>  <Rle>
   [1]     chr1 [225055397, 225056035]      +
   [2]    chr12 [125243701, 125244339]      -
   [3]    chr22 [ 42744305,  42744943]      +
   [4]     chr1 [ 79241981,  79242619]      -
   [5]     chr7 [151649796, 151650434]      +
   ...      ...                    ...    ...
  [36]    chr21   [33907192, 33907830]      -
  [37]    chr18   [73931535, 73932173]      +
  [38]     chrY   [14300243, 14300881]      -
  [39]    chr17   [26049438, 26050076]      +
  [40]    chr22   [26631730, 26632368]      -
  -------
ADD COMMENT
0
Entering edit mode

Is there a way to make random genomic ranges, but avoiding overlap between the segments? I want to randomly place non-overlapping segments (with a given length[width]) across the genome. Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 2473 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6