Hi, I ran into a bug when merging the peak-overlap hit index tables and couldn't locate the issue efficiently. I use three BED files to find overlaps, so I implemented a function that returns a separate index table for each file, and the pattern of each index table differs from one to another. I came up with a solution that works for the downstream processing I plan to do. However, when I validate my output against existing software tools, the final output for the first BED file matches perfectly, while the results for the second and third BED files still differ. (Each GRanges object yields multiple outputs through several filtering steps, and the three BED files must be processed in parallel.) How can I efficiently merge the hit index tables as expected? Is there any way to debug the issue?
Here is a reproducible example:
```r
library(IRanges)  # IntegerList(); also attaches S4Vectors, which provides DataFrame()

# Example hit index tables, one per BED file; integer(0) marks a missing hit.
idxTb_1 <- list(
  foo = IntegerList(1,2,3,4,5,6,7,8,9,10),
  bar = IntegerList(1,2,3,4,5,6,integer(0),7,integer(0),8),
  cat = IntegerList(1,2,3,4,5,6,integer(0),7,8,10)
)
idxTb_2 <- list(
  bar = IntegerList(1,2,3,4,5,6,7,8,9,10,11),
  foo = IntegerList(1,2,3,4,5,6,8,10,11,integer(0),13),
  cat = IntegerList(1,2,3,4,5,6,7,10,13,14,integer(0))
)
idxTb_3 <- list(
  cat = IntegerList(1,2,3,4,5,6,7,8,9,10),
  foo = IntegerList(1,2,3,4,5,6,8,9,integer(0),10),
  bar = IntegerList(1,2,3,4,5,6,7,integer(0),integer(0),8)
)
```
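A quick way to start debugging is to check the shape of each index table before merging. The sketch below is a hypothetical base-R helper (`summarise_idx_table` is not from any package). It assumes each table has first been converted to a named list of plain R lists, e.g. `lapply(idxTb_1, as.list)`, and reports per column the number of rows, the number of empty hits and the number of multiple hits. Those are the places where rows can be dropped, shifted or duplicated when list columns are flattened.

```r
# Hypothetical helper (not part of IRanges/S4Vectors): summarise one index table.
# `tb` is a named list of columns; each column is a list whose elements are
# integer(0) (no hit) or integer vectors of hit indices.
summarise_idx_table <- function(tb) {
  n_rows <- lengths(tb)  # number of elements in each column
  if (length(unique(n_rows)) != 1L) {
    stop("columns have different lengths: ",
         paste(names(n_rows), n_rows, sep = "=", collapse = ", "))
  }
  no_hit   <- vapply(tb, function(col) sum(lengths(col) == 0L), integer(1))
  multihit <- vapply(tb, function(col) sum(lengths(col) > 1L), integer(1))
  data.frame(column   = names(tb),
             rows     = unname(n_rows),
             no_hit   = unname(no_hit),
             multihit = unname(multihit),
             stringsAsFactors = FALSE)
}
```

Running it on each of the three converted tables shows at a glance whether every column has the same length and how many empty or multiple hits each column contains.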
So I came up with this solution:
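```r
# Reorder the columns of tables 2 and 3 to match table 1 (foo, bar, cat),
# coerce each table to a matrix, stack them and drop duplicated rows.
t1 <- as.matrix(DataFrame(idxTb_1))
t2 <- as.matrix(DataFrame(idxTb_2[names(idxTb_1)]))
t3 <- as.matrix(DataFrame(idxTb_3[names(idxTb_1)]))
idxTb_final <- unique(DataFrame(rbind(t1, t2, t3)))
```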
My approach is not ideal, but it at least gives a perfectly matching result for the first BED file, so I need to figure out where the issue comes from. I suspect it comes from merging the index tables. Can anyone give me an idea how to address this? What is an efficient way to merge hit index tables?
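One way to avoid relying on `as.matrix()` for list columns is to turn every column into a plain integer vector first, with `NA` standing in for "no hit", so that no row ever loses its position. The following is a minimal base-R sketch, assuming the tables have been converted with `lapply(tb, as.list)` and that each element holds at most one hit (matching the rule of keeping a single overlapping interval). `col_to_int` and `merge_idx_tables` are hypothetical names. Like the `rbind()` approach above, it reorders every table's columns to a common order, stacks the rows and keeps the unique ones.

```r
# Hypothetical helper: flatten one column (a list of integer(0) / length-1
# integer vectors) to an integer vector, using NA for "no hit" so that every
# row keeps its position.
col_to_int <- function(col) {
  if (any(lengths(col) > 1L))
    stop("expected at most one hit per element; resolve multiple hits first")
  out <- rep(NA_integer_, length(col))
  hit <- lengths(col) == 1L
  out[hit] <- as.integer(unlist(col[hit]))
  out
}

# Hypothetical helper: put every table's columns in the same order, flatten
# them with col_to_int(), stack all rows and drop duplicated rows.
merge_idx_tables <- function(tables, key_order = names(tables[[1L]])) {
  flat <- lapply(tables, function(tb) {
    missing_cols <- setdiff(key_order, names(tb))
    if (length(missing_cols) > 0L)
      stop("table lacks columns: ", paste(missing_cols, collapse = ", "))
    if (length(unique(lengths(tb[key_order]))) != 1L)
      stop("columns of one table have different lengths")
    as.data.frame(lapply(tb[key_order], col_to_int))
  })
  merged <- unique(do.call(rbind, unname(flat)))
  rownames(merged) <- NULL
  merged
}
```

With the `IntegerList` objects from the question, the call would be `merge_idx_tables(lapply(list(idxTb_1, idxTb_2, idxTb_3), function(tb) lapply(tb, as.list)))`. `unique()` on a data frame treats two `NA` entries in the same column as equal, so rows that differ only in which hits are missing are kept apart, while identical rows are collapsed.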
Dear Alex:
Thanks for your prompt hint. BEDOPS does a good job of detecting overlapping regions. But here is the thing: if one big genomic interval in the first BED file overlaps several small intervals in the second BED file, my specification keeps only one overlapping interval when multiple intersections happen, and I then need to repeat the process the other way round, checking the small intervals against the big ones. I have done all of this in R. To make the computation easier, I intended to merge the index tables, but this did not work perfectly for me. Would it be possible to call some BEDOPS scripts from R? Could you give me some idea how to solve my issue? Thanks :)

You can call system() in R to run commands. See: https://stat.ethz.ch/R-manual/R-devel/library/base/html/system.html

But perhaps I am not understanding what it is you are trying to do. Can you explain in a different way what you are trying to do?
For example, let us say you have three BED files that define intervals for peaks. Are you trying to get IDs of those peaks, where peaks from all three files overlap? Or are you trying to do something else?
Perhaps sketch or draw out what you are trying to do with some example peaks.
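To make the system() suggestion concrete for calling BEDOPS from R, here is a minimal sketch that wraps `bedops --element-of 1` with `system2()`. It assumes the `bedops` binary is on the PATH and that both inputs were already sorted with `sort-bed`, as BEDOPS expects. `bedops_element_of` and `parse_bed3` are hypothetical names, and only the first three BED columns of the output are parsed.

```r
# Hypothetical parser: keep the chrom/start/end columns of BED lines.
parse_bed3 <- function(lines) {
  if (length(lines) == 0L)
    return(data.frame(chrom = character(0), start = integer(0),
                      end = integer(0), stringsAsFactors = FALSE))
  fields <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(chrom = vapply(fields, `[`, character(1), 1L),
             start = as.integer(vapply(fields, `[`, character(1), 2L)),
             end   = as.integer(vapply(fields, `[`, character(1), 3L)),
             stringsAsFactors = FALSE)
}

# Hypothetical wrapper: elements of `ref_bed` that overlap any element of
# `query_bed` by at least 1 bp (bedops --element-of 1).
# Assumes bedops is installed and both files are sorted with sort-bed.
bedops_element_of <- function(ref_bed, query_bed,
                              bedops = unname(Sys.which("bedops"))) {
  if (!nzchar(bedops))
    stop("bedops not found on the PATH")
  out <- system2(bedops,
                 args = c("--element-of", "1", shQuote(ref_bed), shQuote(query_bed)),
                 stdout = TRUE)
  parse_bed3(out)
}
```

`system2()` quotes the command itself but not its arguments, which is why the file paths are passed through `shQuote()`.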
Dear Alex:
I sketched the issue in the drafts here, which might explain the background problem more easily. Here is the function I implemented; it works well, and I even got correct results when validating it with a set of BED files. I set up several filtering steps to validate weakly enriched peaks and keep those supported by evidence from overlapping peaks.
Apparently, I need a better idea to improve the code. Could you give me a quick solution based on the above scripts and the sketched issue I presented? Thank you very much :)
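The rule described earlier in the thread (when one large interval overlaps several small ones, keep a single overlapping interval, then repeat with the roles swapped) can be sketched in base R as below. Which of several overlapping intervals to keep is not specified, so this sketch simply assumes the first one in file order. `first_hit` and `reciprocal_first_hits` are hypothetical names; intervals are assumed to be data frames with `chrom`, `start` and `end` columns in BED's half-open coordinates, and the brute-force loop is for illustration only. On real data, `findOverlaps()` from IRanges/GenomicRanges with `select = "first"` should give a comparable one-hit-per-query result far more efficiently.

```r
# Hypothetical helper: for every interval in `a`, return the row index of one
# overlapping interval in `b`, choosing the first in row order (an assumption),
# or NA when nothing overlaps. Coordinates are BED-style half-open [start, end).
first_hit <- function(a, b) {
  vapply(seq_len(nrow(a)), function(i) {
    hits <- which(as.character(b$chrom) == as.character(a$chrom[i]) &
                  b$start < a$end[i] &
                  b$end > a$start[i])
    if (length(hits) == 0L) NA_integer_ else hits[1L]
  }, integer(1))
}

# Apply the one-hit rule in both directions (big vs small, then small vs big).
reciprocal_first_hits <- function(a, b) {
  list(a_to_b = first_hit(a, b), b_to_a = first_hit(b, a))
}
```

Because each result has exactly one entry per interval, with `NA` for no overlap, index tables built from these vectors stay aligned row by row.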