Biostar Beta. Not for public use.
Shell script to reverse complement DNA, but with a twist (certain bases with a long name)
0
Entering edit mode
15 months ago
johan • 50
France

I use this shell script (I'm not the author) to reverse complement DNA:

alias revcomp="echo 'import sys; print \"\".join([dict(zip(\"ACGTacgt\",\"TGCAtgca\"))[c] for c in sys.argv[1][::-1]])' | python -"

In the terminal it works perfect just to write revcomp ATGC which outputs GCAT. And to just complement, I changed it to

TGCAtgc\",\"TGCAtgc.

I'm working with some files, where for reasons I'm not in charge for, some bases written as a word. So let's say that one base is called (askJoeaboutthisone), how would I complement this sequence? I.e. so that AT(askJoeaboutthisone)GC returns CG(askJoeaboutthisone)TA.

In my script each letter is converted, but I wish that (askJoeaboutthisone) is treated as one letter.

Thanks!

ADD COMMENTlink
0
Entering edit mode

What you're really doing is transliterating (and then reversing the sequence). The problem is, you have to iterate the string character by character.

Is it possible for your example to use mixed case? i.e. your long strings are all lower case, but your normal bases are all upper? It'll be tricky to delineate whats a string from within a string. Are the parenthesis actually part of your 'data' or is that just for the benefit of the post/explanation?

ADD REPLYlink
2
Entering edit mode
12 months ago
5heikki 8.4k
Finland
echo "AT(askJoeaboutthisone)GC" \
    | sed 's/(askJoeaboutthisone)/#/g' \
    | tr "[ATGCYRSWKMBDHVatgcyrswkmbdhv]" "[TACGRYSWMKVHDBtacgryswmkvhdb]" \
    | rev \
    | sed 's/#/(askJoeaboutthisone)/g'
GC(askJoeaboutthisone)AT

So, before the complementation (the tr line), we first replace (askJoeaboutthisone) with #, then after complementation and reversing (the rev line), we replace # with (askJoeaboutthisone). We do the temporary replacement so that the letters in the joestring don't get complemented by tr nor reversed by rev

Edit. You can save this as a script:

#!/bin/bash
if [ ! -z "$1" ]; then
    echo "$1" \
        | sed 's/(askJoeaboutthisone)/#/g' \
        | tr "[ATGCYRSWKMBDHVatgcyrswkmbdhv]" "[TACGRYSWMKVHDBtacgryswmkvhdb]" \
        | rev
        | 's/#/(askJoeaboutthisone)/g'
else
    echo ""
    echo "usage: rev_comp_seq_joe_twist DNASEQUENCE"
    echo ""
fi
ADD COMMENTlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.1