Expand a DNA sequence with IUPAC codes into multiple sequence in R
2
0
Entering edit mode
6.9 years ago

Hi!

I could not come with a code that can expand a sequence like seq= 'ARCS'. What would be the algorithm or (pseudo)code for the expansion in R?

Thanks in advance.

p.s.

Many useful python codes have been generously supplied, but I dont know python!

R sequence • 3.4k views
ADD COMMENT
2
Entering edit mode
6.9 years ago
5heikki 11k

This is super easy with Python, see e.g. here.

Copy pasted one option below:

from Bio import Seq
from itertools import product

def extend_ambiguous_dna(seq):
   """return list of all possible sequences given an ambiguous DNA input"""
   d = Seq.IUPAC.IUPACData.ambiguous_dna_values
   return [ list(map("".join, product(*map(d.get, seq)))) ]
ADD COMMENT
0
Entering edit mode

It's really easy by using python, how do you explain the functions product and seq

ADD REPLY
0
Entering edit mode

All this stuff is well documented, seq & itertools.

ADD REPLY
0
Entering edit mode

great, but both you and I made a mistake, he need the R solution

ADD REPLY
1
Entering edit mode
6.9 years ago
jmzeng1314 ▴ 140

what a coincidence!

I've the code for this question,also there's a brief description for it if you can understand our Chinese.

http://www.biotrainee.com/thread-668-1-1.html

while(<DATA>){
chomp;
@F=split/:/;
$hash{$F[0]}=uc $F[1];
} ##这里记录简并碱基的对应关系
## %hash stored the tables;
sub primer2multiple{
$primer=$_[0];
$prod=1;
$primer_len=length $primer ;
foreach $i (0..$primer_len-1){
$char=substr($primer,$i,1);
#$prod*=length $hash{$char} if ($char !~/[ATCG]/) ;
if ($char !~/[ATCG]/) {
push @pos_list,$i;
push @char_list,$hash{$char};
##首先找出所有的不是ATCG的碱基位置以及它对应的碱基
## record all of the positions which are not ATCG;
}
}
@out_list=($primer);
##循环处理每个不是ATCG的碱基位置,让它们根据对应关系扩展
foreach my $i (0..scalar(@pos_list)-1){
@out_list=&new_out_list(\@out_list,$pos_list[$i],$char_list[$i]);
} ##&new_out_list 这个函数非常重要,会把数组不停的扩展,最终达到应该有的个数!
print join"\n",@out_list;
print "\n";
}
sub new_out_list{
my @array = @{$_[0]};
my $pos = $_[1];
my $char = $_[2];
my @new_array=();
foreach my $i (@array){
foreach my $j (0..length($char)-1){
substr($i,$pos,1,substr($char,$j,1));
push @new_array,$i;
}
}
return(@new_array);
}
primer2multiple('ATGCVCGCDCTNCCTGAB');
__DATA__
R:ag
Y:CT
M:AC
K:GT
S:gc
W:AT
H:atc
B:gtc
V:gac
D:GAT
N:ATgc
ADD COMMENT

Login before adding your answer.

Traffic: 1231 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6