Split a FASTA header string into variables using R
6.1 years ago
kristin • 0

Hi All,

How can I split a FASTA header string into separate variables in R?

For example:

Input fasta string:

">VFG000871(gb|NP_757239) (fimB) Type 1 fimbriae Regulatory protein fimB [Type 1 fimbriae (VF0221)] [Escherichia coli CFT073]"

Goal to get this:

[1] ">VFG000871"   "(gb|NP_757239)"   "(fimB)"   "[Type 1 fimbriae (VF0221)]"   "[Escherichia coli CFT073]"

I'm struggling to split the string with sub() because of the '(' characters: R reports an error that ')' is missing. So far I have only managed to extract ">VFG000871" as a variable. Do you have any better suggestions?

R fasta • 1.0k views
6.1 years ago
ATpoint 81k

The problem is that there is no common delimiter, so each delimiter has to be handled individually:

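Split_String <- function(my.string){
  # Split once per delimiter type and reuse the pieces
  by.paren   <- strsplit(my.string, split="\\(")[[1]]
  by.space   <- strsplit(my.string, split=" ")[[1]]
  by.bracket <- strsplit(my.string, split="\\[")[[1]]

  first  <- by.paren[1]                                          # text before the first "("
  second <- paste("(", trimws(by.paren[2], "right"), sep="")     # drop the trailing space
  third  <- by.space[2]                                          # second space-separated token
  fourth <- paste("[", trimws(by.bracket[2], "right"), sep="")   # first bracketed field
  fifth  <- paste("[", by.bracket[3], sep="")                    # second bracketed field

  return(c(first, second, third, fourth, fifth))
}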

If you have many strings like this, you can apply the function to each one, for example with lapply() over a character vector or a data frame column.
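As an alternative, here is a minimal sketch that parses the whole header with one regular expression and applies it to several headers at once. It assumes every header follows exactly the layout of the example in the question. The function name parse_header and the field labels (id, accession, gene, description, vf, organism) are made up for illustration, as is the second, malformed header. Note that a literal "(" has to be escaped as "\\(" in an R pattern. An unescaped "(" opens a capture group, which is the likely cause of the "missing ')'" error from sub().

```r
# Hypothetical helper, not part of the original answer.
# Assumes the layout: >ID(accession) (gene) description [field] [organism]
parse_header <- function(header) {
  # Field labels are illustrative only
  fields <- c("id", "accession", "gene", "description", "vf", "organism")
  pattern <- paste0(
    "^(>[^(]+)",         # id: everything before the first "("
    "(\\([^)]*\\)) ",    # accession, including its parentheses
    "(\\([^)]*\\)) ",    # gene name, including its parentheses
    "([^\\[]*) ",        # free-text description up to the first "["
    "(\\[[^\\]]*\\]) ",  # first bracketed field
    "(\\[[^\\]]*\\])$"   # second bracketed field
  )
  m <- regmatches(header, regexec(pattern, header, perl = TRUE))[[1]]
  if (length(m) == 0) {
    # Header does not follow the assumed layout: return NAs
    return(setNames(rep(NA_character_, length(fields)), fields))
  }
  # m[1] is the full match; the capture groups follow
  setNames(m[-1], fields)
}

headers <- c(
  ">VFG000871(gb|NP_757239) (fimB) Type 1 fimbriae Regulatory protein fimB [Type 1 fimbriae (VF0221)] [Escherichia coli CFT073]",
  "not a matching header"  # made-up example of a header that fails to parse
)

# One row per header, one column per field
parsed <- as.data.frame(do.call(rbind, lapply(headers, parse_header)),
                        stringsAsFactors = FALSE)
print(parsed)
```

Here parsed$id[1] is ">VFG000871", and the second row is all NA because that string does not match the assumed layout. Unlike the five-element goal in the question, this sketch also keeps the free-text description as its own column. Drop that column if you do not need it.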

Thank you for your input
