Question

Find The Numbers Shared Between Two Integer Intervals (R)


10.1 years ago

viniciushs88 ▴ 50

I would like to count the numbers shared between two integer intervals. My input looks like this:

start1  end1    start2  end2    
  20     30      25      35
  25     35      20      30    
 100     190    126      226      
 126     226    100      190

In the first and second rows, the first interval (first two columns) and the second interval (last two columns) share 6 numbers (25, 26, 27, 28, 29 and 30).

My expected output looks like this:

 start1  end1    start2  end2    bp_overlapped   
   20    30       25      35          6        
   25    35       20      30          6
  100    190     126     226          65
  126    226     100     190          65

It is a matrix in R.

Thank you

r overlap • 2.2k views



Please indicate relevance of question to a specific bioinformatics research problem.

Comment by Neilfws, written 10.1 years ago

Ram · Answer 1 · 2014-03-06

This has only the most tenuous connection to bioinformatics if I make a number of assumptions about why you're trying to do this. You should really post this on an R forum. Having said that:

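# Columns are start1, end1, start2, end2 (matrix() fills column by column)
m <- matrix(c(20,25,100,126,30,35,190,226,25,20,126,100,35,30,226,190), ncol=4)
# For each row, expand both intervals and count the integers they share
overlap <- apply(m, 1, function(x) length(intersect(x[1]:x[2], x[3]:x[4])))
cbind(m, overlap)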

Ram · Answer 2 · 2014-03-06

This should work:

# dummy data
df <- read.table(text="start1  end1    start2  end2    
20     30      25      35
25     35      20      30    
100     190    126      226      
126     226    100      190",header=TRUE)

# Count overlap: expand both intervals of each row and count the shared integers
df$bp_overlapped <- 
  sapply(seq_len(nrow(df)), function(x)
  {
    length(
      intersect(df[x,1]:df[x,2],
                df[x,3]:df[x,4]))
  })
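
The intersect-based answers build every integer in both intervals, which can get slow or memory-hungry for long intervals. A possible alternative is to compute the count arithmetically: the shared region runs from the larger start to the smaller end, so its size is min(end1, end2) - max(start1, start2) + 1, floored at zero. This is only a minimal sketch, not part of the original answers. The helper name count_overlap is hypothetical, and it assumes closed integer intervals with start <= end in every row.

```r
# Hypothetical helper (not from the original answers).
# Counts the integers shared by the closed intervals [start1, end1] and
# [start2, end2]. Assumes start <= end for every interval.
count_overlap <- function(start1, end1, start2, end2) {
  # The shared region runs from the larger start to the smaller end, inclusive.
  shared <- pmin(end1, end2) - pmax(start1, start2) + 1
  # A non-positive value means the intervals do not overlap at all.
  pmax(shared, 0)
}

# Same data as in the question: columns are start1, end1, start2, end2.
m <- matrix(c(20, 25, 100, 126,
              30, 35, 190, 226,
              25, 20, 126, 100,
              35, 30, 226, 190), ncol = 4)

# pmin/pmax are vectorized, so all rows are handled in one call.
overlap <- count_overlap(m[, 1], m[, 2], m[, 3], m[, 4])
result <- cbind(m, overlap)
print(result)
```

Because pmin and pmax work element-wise on whole columns, no per-row loop or apply call is needed, and intervals that do not overlap come out as 0.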

Ram · Answer 3 · 2015-02-09

You can use the findOverlaps function from the IRanges package (Bioconductor) in R. A script along these lines should work:

data2=read.table("C:/file_name.txt",sep = "\t",fill = TRUE)
data2=data2[data2[,1]=="Chromosome_name",]
start=data2[,2]
# The end coordinate is taken from the last non-NA field of each row
end=numeric(nrow(data2))
for(i in seq_len(nrow(data2)))
{
  x=ncol(data2)-sum(is.na(data2[i,]))
  end[i]=data2[i,x]
}
chr=data2[,1]
genes=data.frame( chr,start,end)

library(IRanges)
query <- IRanges(start,end)

# Assumption: result.txt has a header row naming the start1 and end1 columns
result=read.table("C:/GC/chromosome_name.txt/result.txt", header = TRUE)

subject <- IRanges(result$start1, result$end1)
# IntervalTree() is defunct in recent IRanges releases;
# findOverlaps() accepts the subject ranges directly
findOverlaps(query, subject, select = "all")