Question

Common genomic intervals in R

0

Entering edit mode

10.0 years ago

viniciushs88 ▴ 50

I would like to infer shared genomic interval between different samples.

My input:

sample    chr start end
NE001      1   100  200
NE001      2   100  200
NE002      1   50   150
NE002      2   50   150
NE003      2   250  300

My expected output:

chr start end  freq
1    100  150   2
2    100  150   2

Where the "freq" is the how many samples have contribuited to infer the shared region. In the above example freq = 2 (NE001 and NE002).

Cheers!

R genome overlap • 3.8k views

ADD COMMENT • link 10.0 years ago by viniciushs88 ▴ 50

0

Entering edit mode

What have you tried so far?

ADD REPLY • link 10.0 years ago by Devon Ryan 104k

2

Entering edit mode

Actually, I know how to measure the shared numbers between two regions.

input:

start1  end1    start2  end2    bp_overlapped   
   20    30       25      35          6        
   25    35       20      30          6
  100    190     126     226          65
  126    226     100     190          65

Function:

f <- function(x) length( intersect(seq(x[1],x[2],1), seq(x[3],x[4],1)) )

a <- apply(X,1,f)

And I known how to overlap regions and a "cbind" with overlapped ones too. But when all regions are in a same dataframe, and have a collumn tagging different animals... sincerely I don´t know how can I start the logic...

ADD REPLY • link 10.0 years ago by viniciushs88 ▴ 50

0

Entering edit mode

Cool, we try to ensure that posters have given it a go before providing help. Hopefully my answer (below) will work well for you.

ADD REPLY • link 10.0 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-04-15

This is a bit easier to do if you use the GenomicRanges package from bioconductor:

library(GenomicRanges)
gr <- GRanges(seqname=c(1,2,1,2,2), range=IRanges(start=c(100,100,50,50,250),
    end=c(200,200,150,150,300))) #N.B., You wouldn't do this manually
bins <- disjoin(gr)
mcols(bins)$count <- countOverlaps(bins, gr)

The disjoin() command splits the ranges into no-overlapping segments.

Edit: I left off a ) in an earlier version. Mea culpa!

Ram · Answer 2 · 2014-04-15

1

Entering edit mode

10.0 years ago

Alex Reynolds 35k

If you can use system() within R, or if you don't need to use R, here's a way to do it with BEDOPS (i.e., you'd wrap these calls in system() or use the command-line):

$ sort-bed inputs.unsorted.bed > inputs.bed
$ bedops --intersect inputs.bed | bedmap --echo --count - inputs.bed > answer.bed

ADD COMMENT • link updated 4.3 years ago by Ram 43k • written 10.0 years ago by Alex Reynolds 35k