Merge two CSV files in R (gene to protein pathways)
9.4 years ago
pnixsweet • 0

How can I read in two tab-delimited files and map them together by their one common column (Protein)?

protein_pathway.txt

Pathway                                                        Protein
Binding and Uptake of Ligands by Scavenger Receptors           P69905
Erythrocytes take up carbon dioxide and release oxygen         P69905
Metabolism                                                     P69905
Amyloids                                                       P02647
Metabolism                                                     P02647
Hemostasis                                                     P68871

protein_gene.txt

Gene      Protein
Fabp3     P11404
HBA1      P69905
APOA1     P02647
Hbb-b1    P02088
HBB       P68871
Hba       P01942
What I tried so far:

datafile1 <- read.csv("c:/gene.csv", header=T, sep=",")
datafile2 <- read.csv("c:/pathway.csv", header=T, sep=",")

dim(datafile1)
dim(datafile2)

datafile <- rbind(datafile1,datafile2)
dim(datafile)

write.csv(datafile,"c:/datafile.csv")

This only gives me the appended result (the rows of one file stacked under the other). How can I map the two files by the common Protein column instead?

R • 4.8k views
9.4 years ago

help(match)
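
A minimal sketch of the match() approach, assuming the two tables from the question are already in memory as data frames (the names `pathway` and `gene` are my own, and the data is typed in by hand here so the example runs on its own). match() returns, for each protein in the pathway table, the row position of that protein in the gene table, and those positions can then be used to pull in the gene names:

```r
# Hand-typed copies of the two tables from the question (illustrative names)
pathway <- data.frame(
  Pathway = c("Binding and Uptake of Ligands by Scavenger Receptors",
              "Erythrocytes take up carbon dioxide and release oxygen",
              "Metabolism", "Amyloids", "Metabolism", "Hemostasis"),
  Protein = c("P69905", "P69905", "P69905", "P02647", "P02647", "P68871"),
  stringsAsFactors = FALSE
)
gene <- data.frame(
  Gene    = c("Fabp3", "HBA1", "APOA1", "Hbb-b1", "HBB", "Hba"),
  Protein = c("P11404", "P69905", "P02647", "P02088", "P68871", "P01942"),
  stringsAsFactors = FALSE
)

# For every pathway row, find the position of its protein in the gene table
idx <- match(pathway$Protein, gene$Protein)

# Use those positions to look up the gene; unmatched proteins would give NA
pathway$Gene <- gene$Gene[idx]
print(pathway)
```

Note that match() only returns the first hit, so this works best when each protein appears once in the lookup table, as it does in protein_gene.txt.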


and help(merge) if you want to do it like a database join
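
To illustrate the database-join idea, here is a small sketch using a cut-down version of the tables from the question (the data frame names `pathway` and `gene` are assumptions for the example). The `all`, `all.x` and `all.y` arguments of merge() decide which unmatched rows are kept, roughly like inner, left, right and full outer joins in SQL:

```r
# Cut-down copies of the two tables (illustrative names and subset)
pathway <- data.frame(
  Pathway = c("Metabolism", "Amyloids", "Metabolism", "Hemostasis"),
  Protein = c("P69905", "P02647", "P02647", "P68871"),
  stringsAsFactors = FALSE
)
gene <- data.frame(
  Gene    = c("Fabp3", "HBA1", "APOA1", "HBB"),
  Protein = c("P11404", "P69905", "P02647", "P68871"),
  stringsAsFactors = FALSE
)

# Inner join: only proteins present in both tables
inner <- merge(pathway, gene, by = "Protein")

# Left join: keep every pathway row, Gene is NA where there is no match
left <- merge(pathway, gene, by = "Protein", all.x = TRUE)

# Full outer join: keep unmatched rows from both sides
full <- merge(pathway, gene, by = "Protein", all = TRUE)

print(inner)
print(full)
```

In this subset Fabp3 (P11404) has no pathway, so it drops out of the inner join and shows up with an NA Pathway in the full join.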


And when merge() gets slow due to absolutely huge datasets:

library(dplyr)
help(left_join)
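
A short sketch of the dplyr version, assuming the dplyr package is installed and that the two tables are held in data frames called `pathway` and `gene` (names chosen for the example, with a hand-typed subset of the data). left_join() keeps every row of the first table, in its original order, and adds the matching columns from the second:

```r
library(dplyr)

# Cut-down copies of the two tables (illustrative names and subset)
pathway <- data.frame(
  Pathway = c("Metabolism", "Amyloids", "Metabolism", "Hemostasis"),
  Protein = c("P69905", "P02647", "P02647", "P68871"),
  stringsAsFactors = FALSE
)
gene <- data.frame(
  Gene    = c("Fabp3", "HBA1", "APOA1", "HBB"),
  Protein = c("P11404", "P69905", "P02647", "P68871"),
  stringsAsFactors = FALSE
)

# Keep all pathway rows and attach the gene for each protein
mapped <- left_join(pathway, gene, by = "Protein")
print(mapped)
```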
7.3 years ago
tarmowow ▴ 10

Merge them by using http://merge-csv.com. You can also remove duplicate headers.

9.4 years ago
zx8754 11k

Use merge:

datafile <- merge(datafile1, datafile2)

http://www.statmethods.net/management/merging.html
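
For a fuller picture, here is a hedged end-to-end sketch. Since the question says the files are tab-delimited, it reads them with read.delim() rather than read.csv(..., sep=","). The file contents are a small hand-written subset and are written to a temporary directory only so the example runs anywhere; the paths and variable names are assumptions, not the asker's real setup. With no `by` argument, merge() joins on the column names the two data frames share, which here is just Protein:

```r
# Write two small tab-delimited files to a temp directory (illustrative data)
dir <- tempdir()
pathway_file <- file.path(dir, "protein_pathway.txt")
gene_file    <- file.path(dir, "protein_gene.txt")
writeLines(c("Pathway\tProtein",
             "Metabolism\tP69905",
             "Amyloids\tP02647",
             "Hemostasis\tP68871"), pathway_file)
writeLines(c("Gene\tProtein",
             "HBA1\tP69905",
             "APOA1\tP02647",
             "HBB\tP68871",
             "Fabp3\tP11404"), gene_file)

# read.delim() defaults to sep = "\t" and header = TRUE
datafile1 <- read.delim(gene_file, stringsAsFactors = FALSE)
datafile2 <- read.delim(pathway_file, stringsAsFactors = FALSE)

# Join on the shared column name (Protein); unmatched rows are dropped
datafile <- merge(datafile1, datafile2)
print(datafile)

# Save the joined table without the row-number column
out_file <- file.path(dir, "datafile.csv")
write.csv(datafile, out_file, row.names = FALSE)
```

Passing by = "Protein" explicitly does the same thing here and is safer if the files ever gain another column with a shared name.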
