Comparing 2 Columns at once
2.4 years ago
mail2steff • 50
Potsdam, Germany

I am new to R programming. I have a data frame with 120 columns and 518 rows, and I need to compare its columns two at a time. If the two values in a pair of successive columns are the same, 0 should be added to a new data frame; if they differ, 1 should be added.

>data
V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C

The output should look like

>new_data_frame
V12 V34 V56
0   0   0
1   0   1
1   1   1

Can anyone help me with this? Thank you in advance

R seq • 444 views

You're skipping a couple of columns in your output example. Did you try any code in R? If so, show it along with any errors. If not, try something and come back with it.


I tried the combn function in R:

compare = t(combn(ncol(file8), 2, FUN = function(x) file8[, x[1]] == file8[, x[2]]))

But I got the following output:

V1  V2  V3  V4  V5  V6
1   1   1   1   1   1
0   0   0   0   0   0
0   0   0   0   0   0
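
One thing that may contribute to the unexpected result: `combn(n, 2)` enumerates every pair of columns, not only neighbouring ones. The small Python sketch below illustrates the difference in pair counts. It is only an illustration; `n_cols`, `all_pairs`, and `wanted_pairs` are made-up names, and positions are 0-based.

```python
from itertools import combinations

n_cols = 6  # number of columns in the sample data

# Every unordered pair of column positions, as an all-pairs enumeration would produce.
all_pairs = list(combinations(range(n_cols), 2))

# Only the disjoint neighbouring pairs from the question: (0, 1), (2, 3), (4, 5).
wanted_pairs = list(zip(range(0, n_cols, 2), range(1, n_cols, 2)))

print(len(all_pairs))  # 15
print(wanted_pairs)    # [(0, 1), (2, 3), (4, 5)]
```

For the full data frame with 120 columns, the all-pairs count would be 7140, while only 60 disjoint pairs are wanted.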

9 months ago
zx8754 7.5k
London

Taking advantage of logical index recycling in R (the two-element logical vectors are repeated across all columns), we can do it as follows:

# data
df1 <- read.table(text = "V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C", header = TRUE, stringsAsFactors = FALSE)

# compare odd columns with even using recycling, then convert to number 0,1.
(!df1[, c(TRUE, FALSE)] == df1[, c(FALSE, TRUE)]) * 1
#      V1 V3 V5
# [1,]  0  0  0
# [2,]  1  0  1
# [3,]  1  1  1
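
For readers working in Python, the same odd-versus-even comparison can be sketched with pandas. This is a rough equivalent, not a translation of the R internals: it uses stride slicing instead of recycling, assumes an even number of columns, and the `V1_V2` style names are a made-up naming scheme.

```python
import io
import pandas as pd

# Sample data from the question, whitespace-separated.
text = """V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# 1st, 3rd, 5th ... columns and 2nd, 4th, 6th ... columns.
left = df.iloc[:, 0::2]
right = df.iloc[:, 1::2]

# Compare the underlying arrays position by position:
# 1 where the pair differs, 0 where it matches.
diff = (left.to_numpy() != right.to_numpy()).astype(int)

# Hypothetical naming scheme: join the two column names of each pair.
names = [f"{a}_{b}" for a, b in zip(left.columns, right.columns)]
result = pd.DataFrame(diff, columns=names, index=df.index)
print(result)
```

Comparing the NumPy arrays rather than the two data frames avoids pandas trying to align the differently named columns.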

Thank you so much. It worked perfectly.

20 months ago
shoujun.gu • 370
Rockville/MD

Here is the Python code; replace the placeholder file names in the first two lines with your real ones:

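input_file = 'your_input_file'
output_file = 'your_output_file'

import pandas as pd

# index_col=0 treats the first column as row labels, not as data
df = pd.read_csv(input_file, index_col=0)
col = df.columns
col_t = col[:-1]

# name each result after its column pair, e.g. 'V1' + '2' -> 'V12'
new_col = [col_t[i] + str(i + 2) for i in range(len(col_t))]

# compare each column with its right-hand neighbour: 0 if equal, 1 if different
for i in range(len(col_t)):
    df[new_col[i]] = (df[col[i]] != df[col[i + 1]]).astype(int)

df = df.loc[:, new_col]
df.to_csv(output_file)  # write the result to the path stored in output_file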
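
A quick way to check the adjacent-column approach is to run it on the sample data from the question. The sketch below makes a few assumptions: it uses `!=` so that equal values give 0, as the question asks; it reads whitespace-separated text with no index column; and it builds names like `V12` from the `V<number>` column names. The variables `adjacent` and `disjoint` are illustrative.

```python
import io
import pandas as pd

text = """V1 V2 V3 V4 V5 V6
A  A  C  C  G  G
A  G  T  T  C  G
G  C  T  A  A  C"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")  # no index column in this sample

cols = list(df.columns)

# One result column per neighbouring pair: V1/V2, V2/V3, ..., V5/V6.
# The name drops the leading "V" of the second column, e.g. "V1" + "2" -> "V12".
adjacent = pd.DataFrame(
    {a + b[1:]: (df[a] != df[b]).astype(int) for a, b in zip(cols[:-1], cols[1:])}
)
print(adjacent)

# Keeping every second result column leaves the disjoint pairs V1/V2, V3/V4, V5/V6.
disjoint = adjacent.iloc[:, ::2]
print(disjoint)
```

Comparing every neighbouring pair yields five columns for this sample, so the last step is what reduces it to the three columns shown in the question.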

Thank you for the reply. I'll try this as well.
