Question

Expand a DNA sequence with IUPAC codes into multiple sequences in R

6.9 years ago

english.server ▴ 290

Hi!

I could not come up with code that expands a sequence such as seq = 'ARCS' into every sequence it can represent. What would be the algorithm or (pseudo)code for this expansion in R?

Thanks in advance.

p.s.

Plenty of useful Python code has been generously posted, but I don't know Python!

R sequence • 3.4k views

6.9 years ago

jmzeng1314 ▴ 140

What a coincidence!

I have code for exactly this question (in Perl), and there is a brief description of it at the link below, if you can read Chinese.

http://www.biotrainee.com/thread-668-1-1.html

use strict;
use warnings;

# Load the degenerate-base table from the __DATA__ section below:
# %hash maps each IUPAC code to the bases it stands for.
my %hash;
while (<DATA>) {
    chomp;
    my @F = split /:/;
    $hash{$F[0]} = uc $F[1];
}

# Print every unambiguous sequence encoded by a degenerate primer.
sub primer2multiple {
    my $primer = $_[0];
    my (@pos_list, @char_list);
    foreach my $i (0 .. length($primer) - 1) {
        my $char = substr($primer, $i, 1);
        # First record every position that is not A, T, C or G,
        # together with the bases that code can stand for.
        if ($char !~ /[ATCG]/) {
            push @pos_list,  $i;
            push @char_list, $hash{$char};
        }
    }
    my @out_list = ($primer);
    # Process the ambiguous positions one by one. new_out_list does the real
    # work: each call multiplies the list until it reaches its final size.
    foreach my $i (0 .. $#pos_list) {
        @out_list = new_out_list(\@out_list, $pos_list[$i], $char_list[$i]);
    }
    print join("\n", @out_list), "\n";
}

# For each sequence in the list, emit one copy per allowed base at $pos.
sub new_out_list {
    my @array = @{ $_[0] };
    my $pos   = $_[1];
    my $char  = $_[2];
    my @new_array = ();
    foreach my $i (@array) {
        foreach my $j (0 .. length($char) - 1) {
            substr($i, $pos, 1, substr($char, $j, 1));
            push @new_array, $i;
        }
    }
    return @new_array;
}

primer2multiple('ATGCVCGCDCTNCCTGAB');
__DATA__
R:ag
Y:CT
M:AC
K:GT
S:gc
W:AT
H:atc
B:gtc
V:gac
D:GAT
N:ATgc
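
For readers who don't use Perl, here is a rough Python sketch of the same position-by-position expansion. The names `IUPAC` and `expand_iteratively` are made up for illustration and are not part of the script above; the table is simply the `__DATA__` section rewritten as a dict.

```python
# Assumption: the same ambiguity table as the __DATA__ section, upper-cased.
IUPAC = {
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ATC", "B": "GTC", "V": "GAC", "D": "GAT", "N": "ATGC",
}


def expand_iteratively(primer):
    """Expand one ambiguous position at a time, as new_out_list does."""
    out = [primer]
    for pos, char in enumerate(primer):
        if char in "ATCG":
            continue  # unambiguous base, nothing to expand
        # Replace position `pos` in every partial result with each allowed base.
        out = [s[:pos] + base + s[pos + 1:] for s in out for base in IUPAC[char]]
    return out


results = expand_iteratively("ATGCVCGCDCTNCCTGAB")
print(len(results))                # V, D, N, B give 3 * 3 * 4 * 3 = 108
print(expand_iteratively("ARCS"))  # ['AACG', 'AACC', 'AGCG', 'AGCC']
```

The list grows by a factor of 2, 3 or 4 at each ambiguous position, so the final size is the product of those factors. A character outside the table raises a `KeyError` in this sketch.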

score 2 · Accepted Answer · 2017-07-03

6.9 years ago

5heikki 11k

This is super easy with Python, see e.g. here.

One option, copy-pasted below:

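from itertools import product

from Bio.Data import IUPACData

def extend_ambiguous_dna(seq):
    """Return a list of all possible sequences given an ambiguous DNA input."""
    # Maps each IUPAC letter to the bases it can stand for, e.g. "R" -> "AG".
    d = IUPACData.ambiguous_dna_values
    # Look up the choices at each position, take their Cartesian product,
    # and join each combination back into a string.
    return list(map("".join, product(*map(d.get, seq))))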

It really is easy with Python. How would you explain what product and seq do?

6.9 years ago by jmzeng1314 ▴ 140

All of this is well documented: see the docs for Seq and itertools.
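
For what it's worth, a minimal sketch of what the two calls do, broken into steps on the sequence from the question. The small `IUPAC` dict here is a hand-written stand-in (an assumption, so that the snippet runs without Biopython) for the lookup table the function uses, and it only covers the letters needed for this example.

```python
from itertools import product

# Assumption: a hand-written subset of the IUPAC table, just enough for "ARCS".
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "S": "CG"}

seq = "ARCS"

# Step 1: map(d.get, seq) looks up every letter, giving the allowed bases
# at each position.
choices = list(map(IUPAC.get, seq))
print(choices)   # ['A', 'AG', 'C', 'CG']

# Step 2: product(*choices) yields every way of picking one base per position
# (the Cartesian product), as tuples of single characters.
combos = list(product(*choices))
print(combos[0])  # ('A', 'A', 'C', 'C')

# Step 3: "".join turns each tuple back into a sequence string.
expanded = ["".join(c) for c in combos]
print(expanded)  # ['AACC', 'AACG', 'AGCC', 'AGCG']
```

So `seq` is just the input string being iterated letter by letter, and `product` does the combinatorial work; the rightmost position varies fastest in the output.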

6.9 years ago by 5heikki 11k

Great, but both you and I made a mistake: he needs an R solution.

6.9 years ago by jmzeng1314 ▴ 140