Dealing with insertions in a data set
2.5 years ago
Lila M ★ 1.2k

Hello, I don't know if this is the best place to ask this question, but I will try. I have a data set of single-nucleotide mutations, but some rows contain insertions, something like this:

chr1  T  C     A
chr1  T  C+G   A 
chr1  T  C+T   T

I would like to store the mutations without insertions in one variable and the mutations with insertions in another. What I've done so far is:

xx <- x %>% filter(across(everything(), ~ !str_detect( "+")))

but it did not work. Any clues? Thank you!

2.5 years ago
Lila M ★ 1.2k

Thank you for the feedback. What worked for me was:

insertions <- x %>% filter_all(any_vars(str_detect(.,  "\\+")))
2.5 years ago

In base R:

# creating example dataframe
chr = (rep("Chr1",3))
refAllele = (rep("T",3))
altAllele = c("C", "C+G", "C+T")
otherAllele = c("A", "A", "T")
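# stringsAsFactors = FALSE keeps the alleles as character vectors, which nchar() requires (the default differs before R 4.0)
df = data.frame(chr = chr, refAllele = refAllele, altAllele = altAllele, otherAllele = otherAllele, stringsAsFactors = FALSE)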
# subsetting dataframe based on the length of the altAllele: Insertion variants have nchar more than 1 
noInsertionVar = df[nchar(df$altAllele) == 1,]
noInsertionVar
#chr refAllele altAllele otherAllele
#1 Chr1         T         C           A
insertionVar = df[nchar(df$altAllele) != 1,]
insertionVar
#chr refAllele altAllele otherAllele
#2 Chr1         T       C+G           A
#3 Chr1         T       C+T           T
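
The length check above assumes that every non-insertion allele is exactly one character long. As an alternative, here is a minimal base R sketch that tests for the literal "+" instead. The data frame simply mirrors the example rows, and the column names are illustrative, not taken from the real data set:

```r
# Example data mirroring the rows in the question (column names are illustrative)
df <- data.frame(
  chr = rep("Chr1", 3),
  refAllele = rep("T", 3),
  altAllele = c("C", "C+G", "C+T"),
  otherAllele = c("A", "A", "T"),
  stringsAsFactors = FALSE
)

# fixed = TRUE makes grepl() treat "+" as a literal character,
# not as a regular-expression quantifier
has_plus <- grepl("+", df$altAllele, fixed = TRUE)

insertionVar   <- df[has_plus, ]   # rows whose altAllele contains "+"
noInsertionVar <- df[!has_plus, ]  # all remaining rows
```

Using fixed = TRUE avoids having to escape the plus sign as "\\+".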

Update with tidyverse functions, per the request in the comment below:

library(dplyr)
library(stringr)
# When testing one named column, across(everything(), ...) is unnecessary;
# pass the condition on altAllele directly to filter()
insertionVar <- df %>% dplyr::filter(str_detect(altAllele, "\\+"))
insertionVar
#    chr refAllele altAllele otherAllele
# 1 Chr1         T       C+G           A
# 2 Chr1         T       C+T           T

noInsertionVar <- df %>% dplyr::filter(!str_detect(altAllele, "\\+"))
noInsertionVar
#    chr refAllele altAllele otherAllele
# 1 Chr1         T         C           A

I would like a dplyr approach, if possible. What worked for me is

non_insertion_var  <- df %>% filter(across(everything(), ~ !str_detect(., "\\+")))

However, the equivalent for insertions doesn't work:

insertion_var  <- df %>% filter(across(everything(), ~ str_detect(., "\\+")))

Any ideas what I'm doing wrong? Thanks!


I have updated my answer. If you specify which column to search for the "+", your code works well.
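
As for why the all-columns version fails for insertions: inside filter(), across() combines the per-column results with AND, so a row is kept only if every column contains "+", and no row satisfies that. If you do want to search every column, here is a minimal sketch using if_any() and if_all(), assuming dplyr 1.0.4 or later; the data frame and column names are just the illustrative example from this thread:

```r
library(dplyr)
library(stringr)

# Example data mirroring the rows in the question (column names are illustrative)
df <- data.frame(
  chr = rep("Chr1", 3),
  refAllele = rep("T", 3),
  altAllele = c("C", "C+G", "C+T"),
  otherAllele = c("A", "A", "T"),
  stringsAsFactors = FALSE
)

# if_any(): keep a row when at least one column contains a literal "+"
insertion_var <- df %>%
  filter(if_any(everything(), ~ str_detect(.x, fixed("+"))))

# if_all(): keep a row only when no column contains a literal "+"
non_insertion_var <- df %>%
  filter(if_all(everything(), ~ !str_detect(.x, fixed("+"))))
```

Here fixed("+") matches the plus sign literally, so no regex escaping is needed.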
