How to alternate and merge columns from different data frames?
5.9 years ago
Spacebio ▴ 200

Hello,

I have two data frames that look like the examples below. df1 holds the names of a group of pathways, and df2 holds the category of each pathway, in the same order as the pathways appear in df1.

> df1:

Path_1                                           Path_2
Amphoterin signaling                             Antigen presentation
Antigen presentation                             Death Domain receptors & caspases in apoptosis
Regulation of angiogenesis                       Apoptosis stimulation by external signals
Blood vessel morphogenesis                       Regulation of angiogenesis
Cartilage development                            Blood vessel morphogenesis
Apoptosis stimulation by external signals        Cartilage development
Death Domain receptors & caspases in apoptosis   Amphoterin signaling


> df2:

Type_1                     Type_2
Inflammation               Immune response
Immune response            Signal transduction
Development                Apoptosis and survival
Development                Development
Development                Development
Apoptosis and survival     Development
Signal transduction        Inflammation

I'd like to obtain a single data frame that joins each pathway name to its category, like this:

> df_all:

df_all_1
Amphoterin signaling_Inflammation
Antigen presentation_Immune response
Regulation of angiogenesis_Development
Blood vessel morphogenesis_Development
Cartilage development_Development
Apoptosis stimulation by external signals_Apoptosis and survival
Death Domain receptors & caspases in apoptosis_Signal transduction

df_all_2
Antigen presentation_Immune response
Death Domain receptors & caspases in apoptosis_Signal transduction
Apoptosis stimulation by external signals_Apoptosis and survival
Regulation of angiogenesis_Development
Blood vessel morphogenesis_Development
Cartilage development_Development
Amphoterin signaling_Inflammation

I tried with this code:

df_all <- merge(data.frame(df1, row.names = NULL), data.frame(df2, row.names = NULL), by = 0, all = T)[-1]

but this just places all four columns side by side instead of pairing each Path column with its Type column. Any suggestions? Preferably in base R.

R dataframe • 4.1k views

The output is stored in a third data frame (df3), in which each column of df1 is concatenated with the corresponding column of df2. This is a blind concatenation: it assumes that the rows of column 1 in df1 line up exactly with the rows of column 1 in df2, and likewise for every other column. df3 has the same number of rows and columns as df1 and df2.

setwd("~/Desktop/")
df1 = read.csv("df1.txt", sep = "\t", strip.white = TRUE, stringsAsFactors = FALSE)
df2 = read.csv("df2.txt", sep = "\t", strip.white = TRUE, stringsAsFactors = FALSE)

# Empty data frame with the same shape as df1
df3 = data.frame(matrix(NA, ncol = ncol(df1), nrow = nrow(df1)))

# Paste each column of df1 to the matching column of df2
for (i in seq_len(ncol(df1))) {
    df3[, i] = paste(df1[, i], df2[, i], sep = "_")
}

or

df3 = data.frame(sapply(seq_len(ncol(df1)), function(x) paste(df1[, x], df2[, x], sep = "_")), stringsAsFactors = FALSE)

output:

"X1" "X2"
"Amphoterin signaling_Inflammation" "Antigen presentation_Immune response"
"Antigen presentation_Immune response" "Death Domain receptors & caspases in apoptosis_Signal transduction"
"Regulation of angiogenesis_Development" "Apoptosis stimulation by external signals_Apoptosis and survival"
"Blood vessel morphogenesis_Development" "Regulation of angiogenesis_Development"
"Cartilage development_Development" "Blood vessel morphogenesis_Development"
"Apoptosis stimulation by external signals_Apoptosis and survival" "Cartilage development_Development"
"Death Domain receptors & caspases in apoptosis_Signal transduction" "Amphoterin signaling_Inflammation"
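
Because the concatenation is blind, it can help to check that the two data frames have the same shape before pasting. The following is a minimal base R sketch of that idea, not a replacement for the code above. It uses Map() instead of an explicit loop, and the toy df1 and df2 are stand-ins built from the first three rows of the question's example rather than read from files.

```r
# Toy stand-ins for the question's data frames (first three rows only)
df1 <- data.frame(
  Path_1 = c("Amphoterin signaling", "Antigen presentation",
             "Regulation of angiogenesis"),
  Path_2 = c("Antigen presentation",
             "Death Domain receptors & caspases in apoptosis",
             "Apoptosis stimulation by external signals"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Type_1 = c("Inflammation", "Immune response", "Development"),
  Type_2 = c("Immune response", "Signal transduction",
             "Apoptosis and survival"),
  stringsAsFactors = FALSE
)

# Stop early if the two data frames do not have the same shape
stopifnot(identical(dim(df1), dim(df2)))

# Map() walks both data frames column by column and pastes each pair
pasted <- Map(function(path, type) paste(path, type, sep = "_"), df1, df2)

# Collect the pasted columns and give them the names asked for in the question
df3 <- as.data.frame(pasted, stringsAsFactors = FALSE)
names(df3) <- paste0("df_all_", seq_along(df3))
```

The shape check only catches mismatched dimensions. It cannot tell whether row 1 of df1 really belongs with row 1 of df2, so the row order still has to be trusted.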

The loop works really fast, thank you so much!!


To name the columns df_all_1, df_all_2, and so on, use the following code:

for (i in seq_len(ncol(df1))) {
    # Paste the i-th columns together, then name the result df_all_<i>
    df3[, i] = paste(df1[, i], df2[, i], sep = "_")
    colnames(df3)[i] = paste0("df_all_", i)
}

> df3
                                                            df_all_1
1                                  Amphoterin signaling_Inflammation
2                               Antigen presentation_Immune response
3                             Regulation of angiogenesis_Development
4                             Blood vessel morphogenesis_Development
5                                  Cartilage development_Development
6   Apoptosis stimulation by external signals_Apoptosis and survival
7 Death Domain receptors & caspases in apoptosis_Signal transduction
                                                            df_all_2
1                               Antigen presentation_Immune response
2 Death Domain receptors & caspases in apoptosis_Signal transduction
3   Apoptosis stimulation by external signals_Apoptosis and survival
4                             Regulation of angiogenesis_Development
5                             Blood vessel morphogenesis_Development
6                                  Cartilage development_Development
7                                  Amphoterin signaling_Inflammation

Alternatively, as a one-liner with the column names written out explicitly:

df_all = data.frame(df_all_1 = paste(df1$Path_1, df2$Type_1, sep = "_"), df_all_2 = paste(df1$Path_2, df2$Type_2, sep = "_"))
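
Writing the columns out by name works for two columns but gets tedious for more. As a hedged sketch of one way to generalize it in base R, paste() can be applied to both data frames at once after converting them to matrices. The toy df1 and df2 below are stand-ins with shortened values, and the approach assumes every column is character (or can safely be treated as character).

```r
# Toy stand-ins with the same column layout as in the question
df1 <- data.frame(Path_1 = c("Amphoterin signaling", "Antigen presentation"),
                  Path_2 = c("Antigen presentation", "Amphoterin signaling"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Type_1 = c("Inflammation", "Immune response"),
                  Type_2 = c("Immune response", "Inflammation"),
                  stringsAsFactors = FALSE)

# paste() is vectorized: it pairs the cells of both matrices in
# column-major order and returns a plain character vector
m <- paste(as.matrix(df1), as.matrix(df2), sep = "_")

# Restore the original rows-by-columns shape
dim(m) <- dim(df1)

# Convert back to a data frame and name the columns df_all_1, df_all_2, ...
df_all <- as.data.frame(m, stringsAsFactors = FALSE)
names(df_all) <- paste0("df_all_", seq_len(ncol(df_all)))
```

This avoids both the loop and the hard-coded column names, at the cost of coercing every column to character through as.matrix().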