Counting the frequency of genotypes per row based on the calls of the first column in a data frame in R
5.4 years ago
Famf ▴ 30

I have a genotype data frame in R similar to this:

ID  P1  P2  in1 in2 in3 in4
M01 CC  GG  CC  GG  CC  GG
M02 TT  CC  TT  TT  CC  TT
M03 AA  GG  AA  GG  GG  GG
M04 CC  GG  CC  GG  CC  GG
M05 GG  AA  AA  GG  AA  AA
M06 CC  GG  CC  GG  CC  CC

I want to add a column giving, for each row, how many times the genotype in column P1 occurs in the columns from in1 onward, as in the table below:

ID  P1  P2  in1 in2 in3 in4 frqP1
M01 CC  GG  CC  GG  CC  GG  2
M02 TT  CC  TT  TT  CC  TT  3
M03 AA  GG  AA  GG  GG  GG  1
M04 CC  GG  CC  GG  CC  GG  2
M05 GG  AA  AA  GG  AA  AA  1
M06 CC  GG  CC  GG  CC  CC  3

I tried the following code, but it doesn't work:

df$frqP1 <- rowSums(df[-1] == df$P1)

Any idea?

R genotype • 1.7k views

it doesn't work

Does it throw an error (if so, please add the error/warning message), or does it give the wrong output?

5.4 years ago
ATpoint 81k
df$frqP1 <- rowSums(df[-c(1:3)] == as.character(df$P1))

You were almost right. Just convert the query (df$P1) from factor to character, and make sure the subject contains only the in-columns by dropping columns 1 to 3 (ID, P1 and P2).
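For anyone who wants to try this on the data from the question, here is a minimal sketch. It assumes the genotype columns were read in as factors (hence stringsAsFactors = TRUE) and that the individual columns all start with "in"; the in_cols and geno names are just illustrative helpers, not part of the original answer. Comparing on a character matrix is one way to sidestep factor handling altogether.

```r
# Toy data copied from the question. stringsAsFactors = TRUE is an assumption
# that mimics how R < 4.0 reads text columns by default.
df <- data.frame(
  ID  = c("M01", "M02", "M03", "M04", "M05", "M06"),
  P1  = c("CC", "TT", "AA", "CC", "GG", "CC"),
  P2  = c("GG", "CC", "GG", "GG", "AA", "GG"),
  in1 = c("CC", "TT", "AA", "CC", "AA", "CC"),
  in2 = c("GG", "TT", "GG", "GG", "GG", "GG"),
  in3 = c("CC", "CC", "GG", "CC", "AA", "CC"),
  in4 = c("GG", "TT", "GG", "GG", "AA", "CC"),
  stringsAsFactors = TRUE
)

# Pick the individual columns by name instead of by position (assumes an "in" prefix).
in_cols <- grep("^in", names(df))

# as.matrix() turns the factor columns into one character matrix.
geno <- as.matrix(df[in_cols])

# The P1 vector is recycled down the columns, so each row is compared
# against its own P1 genotype; rowSums() then counts the TRUE values per row.
df$frqP1 <- rowSums(geno == as.character(df$P1))

print(df)
```

On this toy data, frqP1 should match the column shown in the question (2, 3, 1, 2, 1, 3).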


Indeed, that works! But I realized it returns NA instead of a count in the frqP1 column for rows that have at least one missing value (NA). Is there any way to avoid that?


Use na.rm=TRUE in rowSums() to ignore NAs. Read the manuals (?rowSums).
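A small sketch of what that looks like, using a made-up two-row data frame with one missing call (the data and the hits name are hypothetical, only there to show the effect):

```r
# Hypothetical toy data: the first row has a missing call (NA) in in2.
df <- data.frame(
  ID  = c("M01", "M02"),
  P1  = c("CC", "TT"),
  P2  = c("GG", "CC"),
  in1 = c("CC", "TT"),
  in2 = c(NA,   "TT"),
  in3 = c("CC", "CC"),
  stringsAsFactors = FALSE
)

# Logical matrix of matches; comparing NA with anything gives NA.
hits <- as.matrix(df[-c(1:3)]) == as.character(df$P1)

# Without na.rm the whole row sum becomes NA as soon as one cell is NA.
no_rm <- rowSums(hits)

# With na.rm = TRUE the NA cells are skipped and only real matches are counted.
df$frqP1 <- rowSums(hits, na.rm = TRUE)

print(no_rm)
print(df)
```

Note that with na.rm = TRUE a missing call is simply not counted, so a row with NAs is scored over fewer individuals than a complete row.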
