Split DNA sequence into dimers
5.1 years ago
erickfqqa ▴ 20

I'm trying to split a list of DNA k-mers into dimers. To split them into single nucleotides, I've been using this tidyverse (stringr) function:

library(stringr)  # provides str_split_fixed()
kmer <- "ATTCCCGG"
ntd_kmr <- str_split_fixed(kmer, "", 8)

and the output is:

A,T,T,C,C,C,G,G

I would like to split the k-mer into overlapping dimers, so that the output looks like this:

AT,TT,TC,CC,CC,CG,GG

I know that the seqinr package has a function that does this, but I don't know how to do it with overlapping dimers.

sequence R • 2.5k views

You can do it on the command line:

$ cat test.txt
atgccg

$ sed -e 's/../&, /g' test.txt | sed -r 's/,\s$//'
at, gc, cg

Almost... expected output for your example input should be at, tg, gc, cc, cg.
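
For comparison in R (the language of the original question), the difference between the two outputs comes down to which window start positions are used. A minimal sketch in base R; the variable names are illustrative and not from the thread:

```r
seq_chars <- "atgccg"
n <- nchar(seq_chars)

# Non-overlapping dimers: windows start at positions 1, 3, 5, ...
starts_disjoint <- seq(1, n - 1, by = 2)
disjoint <- substring(seq_chars, starts_disjoint, starts_disjoint + 1)

# Overlapping dimers: windows start at every position 1 .. n - 1
starts_overlap <- seq_len(n - 1)
overlapping <- substring(seq_chars, starts_overlap, starts_overlap + 1)

print(disjoint)
# [1] "at" "gc" "cg"
print(overlapping)
# [1] "at" "tg" "gc" "cc" "cg"
```

A sequence of length n yields n - 1 overlapping dimers, but only floor(n / 2) non-overlapping ones.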


With awk, you can get it:

$ cat test.txt
ATTCCCGG

$ awk '{for (i=1; i<length($0); i++) printf "%s%s", substr($0,i,2), (i<length($0)-1 ? "," : "\n")}' test.txt
AT,TT,TC,CC,CC,CG,GG
5.1 years ago
JC 13k

I think this is R, so you can simply do:

> kmer <- "ATTCCCGG" 
> x <- vector()
> for(n in seq(1, nchar(kmer) - 1, 1)) x <- c(x, substr(kmer, n, n+1) )
> x
[1] "AT" "TT" "TC" "CC" "CC" "CG" "GG"

Avoid growing objects in a loop; use the length argument of vector() to preallocate the result:

x <- vector(mode = "character", length = 7L)
# Or use character() which is just a wrapper for internal vector function
# x <- character(length = 7L)

for(n in seq(nchar(kmer) - 1)) x[ n ] <- substr(kmer, n, n + 1)
x
# [1] "AT" "TT" "TC" "CC" "CC" "CG" "GG"

Yes, it is more efficient, but you need to know the string size in advance to create the preallocated vector.


We already know it: nchar(kmer) - 1.
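
To make that concrete, the result length can be computed from nchar() instead of being hard-coded. A minimal sketch, assuming a single input string; the function name overlapping_kmers and its k argument are hypothetical additions, not from the thread:

```r
# Hypothetical helper: overlapping k-mers with a preallocated result vector.
overlapping_kmers <- function(sequence, k = 2L) {
  # Number of windows of width k that fit in the sequence
  n_windows <- nchar(sequence) - k + 1L
  # Sequences shorter than k have no windows
  if (n_windows < 1L) return(character(0))
  # Preallocate once, then fill in place
  out <- character(n_windows)
  for (i in seq_len(n_windows)) {
    out[i] <- substr(sequence, i, i + k - 1L)
  }
  out
}

kmer <- "ATTCCCGG"
dimers <- overlapping_kmers(kmer)
print(dimers)
# [1] "AT" "TT" "TC" "CC" "CC" "CG" "GG"
```

Using seq_len() rather than seq() avoids the loop running backwards (1, 0) when the sequence has fewer than k characters.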

5.1 years ago
zx8754 11k

We can use substring(), which takes vectors of start and end positions.

substring(kmer, first = 1:(nchar(kmer) - 1), last = 2:nchar(kmer))
# [1] "AT" "TT" "TC" "CC" "CC" "CG" "GG"
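
Since the question mentions a list of k-mers, the same substring() call could be applied to each sequence in turn. A rough sketch in base R; the helper name split_dimers and the second example sequence are made up for illustration:

```r
# Hypothetical helper: wraps the substring() call for one sequence.
split_dimers <- function(s) {
  n <- nchar(s)
  # A sequence shorter than 2 characters has no dimers
  if (n < 2L) return(character(0))
  starts <- seq_len(n - 1L)
  substring(s, first = starts, last = starts + 1L)
}

kmers <- c("ATTCCCGG", "GATTACA")  # example input, not from the thread

# One character vector of dimers per input k-mer
dimer_list <- lapply(kmers, split_dimers)
names(dimer_list) <- kmers

# Comma-separated form, as written in the question
dimer_strings <- vapply(dimer_list, paste, character(1), collapse = ",")
print(dimer_strings[["ATTCCCGG"]])
# [1] "AT,TT,TC,CC,CC,CG,GG"
```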