How to match location of string
0
11 months ago

Hi,

I am trying to match the characters at specific positions in a list of strings, like this:

library(seqinr)

at <- ("ATATATAT")
s1 <-ifelse(at[8]=="T"||"A" && at[7]=="A"||"T" &&
            at[6]=="T"||"A",5,
            ifelse(at[2]=="T"||"A" && at[4]=="A"||"T" &&
                     at[1]=="T"||"A",'1','0'
            ))
s1

It works fine for only one sequence. When I tried it in a for loop, I got an error like this:

invalid 'x' type in 'x && y'

Any help is much appreciated. Thanks.

R seqinr • 238 views
ADD COMMENTlink
0

This is a Question, not a Page; please be careful when selecting the post type.

What is s2c()? From which package?

If the code above works but the loop doesn't, you should show the loop as well, and provide an example dataset to replicate the failure.

ADD REPLYlink
1

This looks like code directly translated from Excel functions. Surely there must be better, more efficient ways to achieve OP's goals.

ADD REPLYlink
1

I think this is the s2c function OP is using. Also, how is a[8] == "T"||"A" even proper R syntax? The "T" || "A" part will throw an error. Pretty sure OP's code doesn't work as-is at the moment.

ADD REPLYlink
0

Can you describe what you're trying to achieve and what "a" actually looks like (i.e., the result of s2c(at))?

EDIT: and what the final for-loop is supposed to achieve.

I promise, if you describe your question properly (i.e. what exactly should be the end result?) there's going to be a more robust way of doing that in R.

ADD REPLYlink
0

I was using that "a" for the next coding step; it is not part of this analysis.

ADD REPLYlink
0

Thanks, everyone, for the replies.

Let me correct my question to make it easier to understand.

I have a list of sequences like this in the 2nd column of a CSV file.

        Seq
>1_seq     ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACAG
>2_seq     ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACCC
>3_seql    ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACTT
>4_seql    ACGTATTGATGCCACAGACGTATTGATGCCACAGACGTATTGATGCCACAG

I want to compare the same positions across all the sequences. For example, if A or T is present at the 11th or 17th position of each sequence, then return 1, else 0.

Thanks in advance

ADD REPLYlink
1

That seems to come, at least partially, from a multiple sequence alignment. Either way, you might benefit from creating a 2D matrix with each column a base position and each row a sequence; that would be a lot easier to filter using indexes.
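
A minimal sketch of that matrix idea in base R, assuming all sequences have the same length. The sequences, the names `seqs`, `m`, `pos` and `flag`, and the choice of positions are made up for illustration and are not the OP's data:

```r
# Hypothetical example sequences of equal length (not the OP's data).
seqs <- c(s1 = "ACGTA", s2 = "ATGCA", s3 = "AGGCC")

# Split each string into single characters and stack them as rows:
# one row per sequence, one column per base position.
m <- do.call(rbind, strsplit(seqs, ""))

# Positions to test (hypothetical choice).
pos <- c(2, 4)

# For each sequence: 1 if A or T occurs at any tested position, else 0.
sub <- m[, pos, drop = FALSE]
hit <- sub == "A" | sub == "T"
flag <- as.integer(rowSums(hit) > 0)
names(flag) <- names(seqs)
flag
# s1 s2 s3
#  1  1  0
```

Once the data is in this shape, any position is just a column index, so changing the positions of interest only means changing `pos`.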

ADD REPLYlink
0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to the original question.

Ideally edit your original question and add this information there.

ADD REPLYlink
0

For example, if A or T is present in "11th or 17" location of each sequence then return 1 else 0.

That doesn't make sense to me.

ADD REPLYlink
0

Do you only care about the presence or absence in certain positions? How many positions are you interested in?

ADD REPLYlink
3
3 months ago
zx8754 7.5k
London

To simplify your example, take this condition: if every sequence has A or T at its 2nd or 4th position, then TRUE.

# example data
x <- c("AAGTA", 
       "AAGTA", 
       "AAGTA", 
       "ACGAA")

# in this example all TRUE
all(substr(x, 2, 2) %in% c("A", "T") | substr(x, 4, 4) %in% c("A", "T"))
# [1] TRUE

If this is not the solution you are looking for, then please provide example input and expected output, clearly.
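
If a 1/0 value per sequence is wanted rather than a single TRUE/FALSE for the whole set, the same `substr()` test can be wrapped in a small helper. This is only a sketch; `has_base_at` and its arguments are hypothetical names, not part of any package:

```r
# Hypothetical helper: for each sequence, 1 if any of the given positions
# holds one of `bases`, else 0.
has_base_at <- function(x, pos, bases = c("A", "T")) {
  # One logical vector per position, each as long as x.
  hits <- lapply(pos, function(p) substr(x, p, p) %in% bases)
  # Combine the positions with an element-wise OR.
  as.integer(Reduce(`|`, hits))
}

# Example data from the answer above.
x <- c("AAGTA", "AAGTA", "AAGTA", "ACGAA")

per_seq <- has_base_at(x, pos = c(2, 4))
per_seq
# [1] 1 1 1 1

# Collapse to a single answer for the whole set, as in the one-liner.
all(per_seq == 1)
# [1] TRUE
```

For the sequences in the question, the call would presumably be `has_base_at(seqs, pos = c(11, 17))`, where `seqs` is the sequence column read from the CSV file.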

ADD COMMENTlink
0

Excellent!

Thanks a lot. It's working.

ADD REPLYlink
0

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

ADD REPLYlink
2
12 months ago
United States

To address your error message:

Assuming this is R code, I don't think the command works even as a single instance outside a for-loop:

> "A" == "T"||"A" && "A" == "A"||"T"
Error in "A" && "A" == "A" : invalid 'x' type in 'x && y'

The syntax would have to be:

> "A" %in% c("T","A") && "A" %in% c("T","A")
[1] TRUE

That being said, as the numerous comments above indicate, there's most definitely a more straightforward way of doing whatever it is you're trying to do.
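
For context on why the original expression fails: in R, `==` binds more tightly than `&&`, which in turn binds more tightly than `||`, so a bare string such as "A" ends up as an operand of a logical operator. When the test has to run over many sequences at once, the element-wise `&` is also needed instead of `&&`. A small sketch, with made-up vectors `b7` and `b8` standing in for the bases at two positions:

```r
# A bare string as an operand of || is an error, whatever comes before it.
res <- tryCatch("A" == "T" || "A", error = function(e) "error")
res
# [1] "error"

# Hypothetical bases at positions 8 and 7 across three sequences.
b8 <- c("T", "A", "G")
b7 <- c("A", "C", "T")

# Element-wise test: use & (vectorised) rather than &&, which is meant
# for a single TRUE/FALSE.
ok <- b8 %in% c("A", "T") & b7 %in% c("A", "T")
ok
# [1]  TRUE FALSE FALSE
```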

ADD COMMENTlink
0

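The following regex would test the same things:

# ^ anchors the match to the start, so the counts refer to fixed positions;
# [AT] means "A or T" (a | inside brackets would be matched literally)
ifelse(grepl("^.{5}[AT]{3}", at),          # positions 6-8 are all A or T
       5,
       ifelse(grepl("^[AT]{2}.[AT]", at),  # positions 1, 2 and 4 are A or T
              1,
              NA))

Note how you're also missing the indication for what should happen if the second ifelse iteration returns a FALSE (I've used NA here).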
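
Two details are worth checking in any regex-based position test like this: inside a character class, `|` is a literal character rather than "or", and a pattern without `^` can match at any offset. A short illustration, using a made-up test string `s`:

```r
# Inside [...] the pipe is a literal character, so [A|T] also matches "|".
pipe_in_class <- grepl("[A|T]", "|")
pipe_in_class
# [1] TRUE
plain_class <- grepl("[AT]", "|")
plain_class
# [1] FALSE

# Without ^, the pattern can match at any offset, not only positions 6-8.
s <- "GGGGGGATA"  # hypothetical string: position 6 is G
unanchored <- grepl(".{5}[AT]{3}", s)   # matches one character later
unanchored
# [1] TRUE
anchored <- grepl("^.{5}[AT]{3}", s)    # positions 6-8 are "GAT"
anchored
# [1] FALSE
```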

ADD REPLYlink
0

Thanks for the quick reply. But still, I am getting the same error for big files.

ADD REPLYlink
