Question

interpretation of RNA secondary structure

2

Entering edit mode

9.1 years ago

fareehakanwal90 ▴ 30

Hello everyone.

I need a code that can give me chunks of secondary structure elements from a given RNA structure and its sequence. e.g. if I have a sequence and its structure like this:

sequence: UGCCCUAGUACGAGAGGACCGGAGUG

Structure: ...(((........))).........

I should be able to get the output like:

Loop1: UGC
Loop2: AGUACGAG
Loop3:GGAGUG
Stem1: CCUAGG

The position of every dot corresponds to the position of it respective nucleotide. I have written this code but it gives me all the nucleotides which form loop as one string and all those forming stems as one string. I want fragments.

RNA-Seq • 4.2k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by fareehakanwal90 ▴ 30

0

Entering edit mode

So let me clarify:

UGC    CCU       AGUACGAG       AGG       ACCGGAGUG
...    (((       ........       )))       .........

Considering your example, you are saying that

each . = nucleotide on loop

( and ) = nucleotide on stem

Right?

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Sam ★ 4.7k

0

Entering edit mode

It's the output of RNAfold.

The sequence is aligned with structure. for example, for the dot at position one you have to check the character at corresponding position in string. I am saving them both as strings and their length is same.

Here is my code:

#str1 contains sequence and str2 contains structure in dot bracket format.

for k in range(len(str1)):
            for character in str2:
                if character=='.':
                    loop+=str1[k]
            elif str2[k]=='(' or str2[k]==')':
             stem+=str1[k]
        print "loop", loop
        print "stem", stem

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by fareehakanwal90 ▴ 30

0

Entering edit mode

yes this is exactly what i am saying

ADD REPLY • link 9.1 years ago by fareehakanwal90 ▴ 30

0

Entering edit mode

9.1 years ago

fareehakanwal90 ▴ 30

can someone please tell me how can I pass variables to the from_fasta() function?

ADD COMMENT • link 9.1 years ago by fareehakanwal90 ▴ 30

0

Entering edit mode

I never used that programme. But you can take a look here

So assuming you have a file called INPUT with the following format

> id ACCGGGG ((...))

You will do something like from_fasta(INPUT)

With multiple input, I guess the file should look like

> id1 ACCGGGG ((...))
> id2 UGCCCUAGUACGAGAGGACCGGAGUG ...(((........))).........

So according to Ryan's example, maybe you can do

import forgi.graph.bulge_graph as fgb
bg = fgb.BulgeGraph()
bg.from_fasta(INPUT)
for d in bg.defines:
    print d, bg.get_define_seq_str(d)

But I have never used this programme, so I am not sure

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by Sam ★ 4.7k

0

Entering edit mode

It doesn't work, gives me the error:

bg.from_fasta(text, dissolve_length_one_stems=False)
  File "/usr/local/lib/python2.7/dist-packages/forgi/graph/bulge_graph.py", line 1272, in from_fasta
    self.from_dotbracket(lines[2].strip(), dissolve_length_one_stems)
IndexError: list index out of range

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by fareehakanwal90 ▴ 30

Ram · Accepted Answer · 2015-03-06

So, just for my own perl coding practice, I wrote the following code that will do what you want (without explicit debugging though...). This programme will take in one file, where the first line is the sequence and the second line is the structure, and generate the required output

#!/usr/bin/perl
use warnings;
use strict;
open FILE, "<", $ARGV[0] or die $!;
my $count = 0;
my $seq;
my  $structString;
while (<FILE>) {
    if($count == 0){
        $seq = ($_);
        $count = 1;
    }elsif($count==1){
        $structString = ($_);
        last;
    }
}
close FILE;
my @seqCharacter = split //, $seq;
my @structChar = split //, $structString;
my $loopCount=1;
my $stemCount=1;
my $currentLoop="";
my @stemStore;
my @stemName;
my $prevChar="";

for(my $i = 0; $i < @structChar; $i=$i+1){
    if($prevChar eq ""){ #The first character
        if($structChar[$i] eq "."){
            $currentLoop = $seqCharacter[$i];
            $prevChar = ".";
        }elsif($structChar[$i] eq "("){
            $stemStore[0]=$seqCharacter[$i];
            $stemName[0] = "stem".$stemCount;
            $stemCount=$stemCount+1;
            $prevChar="(";
        }else{
            print "Undefined input, structure must start with either ( or .\n";
            die;
        }
    }elsif($prevChar eq "." && $structChar[$i] eq "."){ #Continuation
        $currentLoop=$currentLoop.$seqCharacter[$i];
    }elsif($prevChar eq "." && $structChar[$i] eq "("){ #Encounter stem
        print "loop".$loopCount.": ".$currentLoop."\n";
        $loopCount = $loopCount+1;
        $currentLoop = "";
        $prevChar = "(";
        my $size = @stemStore;
        $stemStore[$size]=$seqCharacter[$i]; #we have a new stem
        $stemName[$size] = "stem".$stemCount;
        $stemCount = $stemCount+1;
    }elsif($prevChar eq "." && $structChar[$i] eq ")"){
        if(@stemStore == 0){
            print "Unmatched ) please check your input\n";
            die;
        }else{
            print "loop".$loopCount.": ".$currentLoop."\n";
            $loopCount = $loopCount+1;
            $currentLoop = "";

            $stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i];    #Add this to the last loop so we can close it
            $prevChar = ")";
        }
    }elsif($prevChar eq "(" && $structChar[$i] eq "("){
        $stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i]; #Continue
    }elsif($prevChar eq "(" && $structChar[$i] eq "."){
        $prevChar = ".";
        $currentLoop = $seqCharacter[$i];

    }elsif($prevChar eq "(" && $structChar[$i] eq ")"){
        $prevChar = ")";
        $stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i];
    }elsif($prevChar eq ")" && $structChar[$i] eq ")"){
        $stemStore[@stemStore-1]=$stemStore[@stemStore-1].$seqCharacter[$i]; #Add to the last loop
    }elsif($prevChar eq ")" && $structChar[$i] eq "."){
        $prevChar = ".";
        print $stemName[@stemName-1].": ".$stemStore[@stemStore-1]."\n";
        pop(@stemName);
        pop(@stemStore);
    }elsif($prevChar eq ")" && $structChar[$i] eq "("){
        $prevChar = ")";
        print $stemName[@stemName-1].": ".$stemStore[@stemStore-1]."\n";
        pop(@stemName);
        pop(@stemStore);
    }
}
#output remaining
if($currentLoop ne ""){
    print "loop".$loopCount.": ".$currentLoop."\n";
}elsif(@stemName != 0 && $prevChar eq ")"){
    print $stemName[@stemName-1].": ".$stemStore[@stemStore-1]."\n";
    pop(@stemName);
    pop(@stemStore);
}else{
    print "There are unmatched stem remaining\n";
}

Ram · Accepted Answer · 2015-03-06

2

Entering edit mode

9.1 years ago

Ryan Dale 5.0k

forgi (github page) is a Python package for working with RNA secondary structures. I've used it in the past for tasks like this.

Here's an example. By the way, might want to check your expected output against the example sequence you provided.

import forgi.graph.bulge_graph as fgb
bg = fgb.BulgeGraph()
bg.from_fasta(
"""\
>demo
UGCCCUAGUACGAGAGGACCGGAGUG
...(((........))).........
""")
for d in bg.defines:
    print d, bg.get_define_seq_str(d)

Here's the output. Notice it keeps the two sides of the stem separate, and the 5' and 3' "stems" are listed separately.

f1 ['UGC']
h0 ['AGUACGAG']
s0 ['CCU', 'AGG']
t1 ['ACCGGAGUG']

Edit: forna by the same author is useful for visualizing structures and is helpful for debugging and troubleshooting this sort of analysis.

Edit 2: In response to the comment asking about providing a string variable to from_fasta:

It's not documented, but after playing around it looks like this method requires 1) the record must start with ">" -- not even a newline is allowed, and 2) a trailing newline at the end of the structure line. So if you have your strings sequence and structure, you can build a string that from_fasta will accept like this:

text = '\n'.join(['> ', sequence, structure, '\n'])
bg.from_fasta(text)

or if you want to include a sequence ID:

text = '\n'.join(['\n', '> ' + seqid, sequence, structure, '\n'])
bg.from_fasta(text)

ADD COMMENT • link updated 23 months ago by Ram 43k • written 9.1 years ago by Ryan Dale 5.0k

0

Entering edit mode

Thanks this worked for me :)

ADD REPLY • link updated 23 months ago by Ram 43k • written 9.1 years ago by fareehakanwal90 ▴ 30

0

Entering edit mode

how can I pass string variables to the from_fasta function?

ADD REPLY • link 9.1 years ago by fareehakanwal90 ▴ 30

0

Entering edit mode

I edited the answer to show this.

ADD REPLY • link 9.1 years ago by Ryan Dale 5.0k

0

Entering edit mode

thankyou very much

ADD REPLY • link 9.1 years ago by fareehakanwal90 ▴ 30