Code for looking for overlaps
9.0 years ago
rrsowmya ▴ 20

I have two files, A and B. I want to find the rows that overlap between the two files and write only those rows from file A to a separate file. Could anyone help me with some R code to do this?

I cannot manipulate my data in Excel because it is too large, and I am still new to R.

Thank you in advance for your help.

R • 3.0k views

What do you mean by "overlap"? Could you post a couple of example rows from file A and file B and show what you want to happen?


Okay, here's the question again.

File A:

A    gene1       33
B    gene2       34
C    gene3       89
D    gene1       09
E    gene3       33
F    gene1       86

File B:

A
C
F
T
P
G

I would like rows A, C and F (the ones that overlap between the two files) written to a separate file.

New file:

A    gene1       33
C    gene3       89
F    gene1       86

Hope this makes better sense.


Post this as a comment on your original question and see the answer by Ido Tamir: A: Code for looking for overlaps.


Now it's clear.

The merge function will do this for you.

result <- merge(x = file1, y = file2, by.x = "colname", by.y = "colname")

Here, "colname" is the name of the column on which the two tables are merged.
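As a rough illustration, here is the merge call applied to the example data from the question. The column names (id, gene, value) are made up for this sketch, since the original files have no header, and the third column is kept as text so that "09" is not changed.

```r
# Example data copied from the question; the column names are assumptions.
file1 <- data.frame(
  id    = c("A", "B", "C", "D", "E", "F"),
  gene  = c("gene1", "gene2", "gene3", "gene1", "gene3", "gene1"),
  value = c("33", "34", "89", "09", "33", "86"),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  id = c("A", "C", "F", "T", "P", "G"),
  stringsAsFactors = FALSE
)

# Inner join on the shared key column: only ids present in both tables remain.
result <- merge(x = file1, y = file2, by.x = "id", by.y = "id")
print(result)
```

Note that merge sorts the result by the key column by default, and a key that appears more than once in file2 would produce repeated rows in the result.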

For more help: https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html


Using %in% may be faster than merge.

file1[file1[,1] %in% file2[,1],]
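A minimal end-to-end sketch of the %in% approach, from reading the two files to writing the new one. The temporary file paths and the whitespace-delimited layout are assumptions for illustration; in practice you would point read.table at your own files.

```r
# Write the example data from the question to temporary files so that the
# sketch is self-contained; these paths are placeholders for your own files.
path_a <- tempfile(fileext = ".txt")
path_b <- tempfile(fileext = ".txt")
writeLines(c("A gene1 33", "B gene2 34", "C gene3 89",
             "D gene1 09", "E gene3 33", "F gene1 86"), path_a)
writeLines(c("A", "C", "F", "T", "P", "G"), path_b)

# colClasses = "character" keeps values such as "09" exactly as written.
file1 <- read.table(path_a, header = FALSE, colClasses = "character")
file2 <- read.table(path_b, header = FALSE, colClasses = "character")

# Keep the rows of file1 whose first column also appears in file2's first column.
keep <- file1[file1[, 1] %in% file2[, 1], ]

# Write the matching rows to a new tab-separated file.
path_out <- tempfile(fileext = ".txt")
write.table(keep, path_out, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Unlike merge, this keeps the rows of file1 in their original order and never duplicates a row, even if an identifier is repeated in file2.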

If you are really matching on the letters and they are unique, you could do it from the command line:

grep -f fileB fileA

Just be aware that this will not work in most (more complex) cases. For example, if 'A' is in fileB, you will also get the rows for genes 'AA', 'AB', 'CAG' and anything else containing an A. Adding more regex information to each pattern in fileB helps. For example:

A
B
C

Should be:

^A\t
^B\t
^C\t

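The first pattern will return only gene 'A', since it specifies that the letter 'A' must appear right at the beginning of the line (^) and must be immediately followed by a tab (\t). This assumes the file is tab-delimited. Note also that GNU grep does not interpret \t as a tab in basic or extended regular expressions, so the pattern file may need to contain a literal tab character instead.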
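The same over-matching problem can be shown in R, the language the question asked about. This is only a sketch with made-up identifiers; it contrasts an unanchored pattern with one anchored to the start of the line and followed by a tab.

```r
# Hypothetical tab-delimited lines chosen to show the over-matching problem.
ids <- c("A\tgene1", "AA\tgene2", "CAG\tgene3", "B\tgene4")

# Unanchored pattern: matches every line that contains an "A" anywhere.
loose <- grepl("A", ids)

# Anchored pattern: "A" at the start of the line, immediately followed by a tab.
# Inside an R string, "\t" is already a literal tab character.
strict <- grepl("^A\t", ids)

print(ids[loose])
print(ids[strict])
```

Here the loose pattern selects the first three lines, while the strict pattern selects only the line whose identifier is exactly A.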

9.0 years ago
Deepak Tanwar ★ 4.2k

As far as I know, the merge and intersect functions operate on columns.

Suppose you have two objects, obj1 and obj2. If you only want to extract the rows they have in common, you could use the following function:

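rows_in_common <- function(x, y)
{
  # Collapse each row into a single string so that whole rows can be compared.
  # A tab separator avoids false matches such as ("ab", "c") vs ("a", "bc").
  a <- apply(x, 1, paste, collapse = "\t")
  b <- apply(y, 1, paste, collapse = "\t")
  # Keep the rows of x that also occur in y.
  common <- x[a %in% b, , drop = FALSE]
  return(common)
}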

You can then obtain the result with:

result <- rows_in_common(x = obj1, y = obj2)
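To make the idea of whole-row matching concrete, here is a small self-contained sketch. The helper name common_rows and the two example data frames are hypothetical; the helper keeps a row of x only when every column matches some row of y.

```r
# Hypothetical helper: return the rows of x that also occur, in full, in y.
common_rows <- function(x, y) {
  # Build one key per row; the tab separator keeps c("ab", "c") and
  # c("a", "bc") from collapsing to the same key.
  key_x <- do.call(paste, c(unname(as.list(x)), sep = "\t"))
  key_y <- do.call(paste, c(unname(as.list(y)), sep = "\t"))
  x[key_x %in% key_y, , drop = FALSE]
}

# Made-up example objects with the same two columns.
obj1 <- data.frame(id = c("A", "B", "C"),
                   gene = c("gene1", "gene2", "gene3"),
                   stringsAsFactors = FALSE)
obj2 <- data.frame(id = c("C", "A", "X"),
                   gene = c("gene3", "gene9", "gene1"),
                   stringsAsFactors = FALSE)

result <- common_rows(obj1, obj2)
print(result)
```

Row A is not returned here because its gene differs between the two objects, which is the main difference from matching on the first column alone.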

One more thing: this is a general programming question rather than a bioinformatics one. In future, please post programming questions to http://stackoverflow.com/

9.0 years ago
PoGibas 5.1k

If you are planning to analyze genomic ranges in R (great choice!), GenomicRanges from Bioconductor is all you need, and this introduction by Dave Tang is a very good start.

# First, you'll need to install it.
# (On current Bioconductor releases, biocLite has been replaced by BiocManager::install.)
source("http://bioconductor.org/biocLite.R")
biocLite("GenomicRanges")
library("GenomicRanges")

# Read in your data
data <- read.table("test.bed", header = FALSE)
# Format it

# Read in and format second file

# And intersect
intersect(bed, bed2)

There is more documentation here.

9.0 years ago
Ido Tamir 5.2k

This is called a join, and in R the function for it is called merge.

merge two files

I would suggest you start by a) learning to define your questions more precisely and b) looking at e.g. http://swirlstats.com/ or other online resources for R.


Ohhh, now I get it ... This was about merge and not intersect :-|


Maybe it was, I don't know. That's why I wrote that they should be more precise in their question.
