
Table With snoRNABase Information?


11.3 years ago

Leandro Lima ▴ 970

Hello!

I have a list of snoRNABase IDs (https://www-snorna.biotoul.fr/browse.php?sno=CDBox), and I need to get the information shown on pages such as this one:

https://www-snorna.biotoul.fr/plus.php?id=SNORD125

I'm writing a program to get it, but I was wondering whether someone here has already done this.
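Since every detail page follows the pattern of the example URL, a list of IDs can be turned into page URLs and fetched in a loop. A minimal Python 3 sketch using only the standard library; the names detail_url and fetch_detail_pages, and the one-second pause between requests, are illustrative assumptions rather than anything snoRNABase defines:

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Detail-page pattern taken from the example URL in the question
DETAIL_BASE = 'https://www-snorna.biotoul.fr/plus.php'


def detail_url(sno_name):
    """Build the plus.php URL for one snoRNABase entry name."""
    return DETAIL_BASE + '?' + urlencode({'id': sno_name})


def fetch_detail_pages(names, delay=1.0):
    """Yield (name, raw HTML bytes) for each name, pausing between requests.

    The default delay is a hypothetical courtesy value, not a documented
    requirement of the site.
    """
    for name in names:
        with urlopen(detail_url(name)) as response:
            yield name, response.read()
        time.sleep(delay)


print(detail_url('SNORD125'))  # https://www-snorna.biotoul.fr/plus.php?id=SNORD125
```

The generator is lazy, so nothing is downloaded until you iterate over fetch_detail_pages(your_id_list).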

Tag: annotation


Thanks for following up and posting the solution.

Comment by Istvan Albert, 11.3 years ago

score 2 · Answer 1 · 2013-01-17

The code, in Python.

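# download_snoRNABase_info.py
# Created in: Jan 16, 2013
# Last modified in: Jan 16, 2013
# Leandro Lima <llima@cipe.accamargo.org.br>
#
# Python 3 version. The original (Python 2) fetched each page with twill's
# go()/show(); urllib.request is used here instead. Requires lxml and cssselect.

from urllib.request import urlopen

from lxml.html import parse, document_fromstring

BASE_URL = 'https://www-snorna.biotoul.fr/plus.php?id='

# Locally saved copy of https://www-snorna.biotoul.fr/browse.php?sno=CDBox
browse_page = 'snoRNABase_CDBox.html'

with open('snoRNA_info.csv', 'w') as output:
    # Despite the .csv extension, the output is tab-separated
    output.write('name\tsno_id\tdescription\tbiotype\n')
    page = parse(browse_page).getroot()
    rows = page.find_class('traitbleu')
    for td in rows[0]:
        # Reading ids: each link in the row holds one snoRNA name
        for a in td.cssselect('a'):
            name = a.text
            # Pass raw bytes so lxml handles the page's declared encoding
            html = urlopen(BASE_URL + name).read()
            page2 = document_fromstring(html)
            # Assumed page layout: the second 'tablecadre' table, third row,
            # holds the snoRNABase id (first cell) and description (second cell)
            table = page2.find_class('tablecadre')[1]
            tr2 = table[2]
            sno_id = tr2[0][0].text
            description = (tr2[1].text or '').strip()
            print(name, sno_id, description)
            output.write('%s\t%s\t%s\t%s\n' % (name, sno_id, description, 'snoRNA'))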
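If you would rather not depend on lxml and cssselect just to collect the IDs, the saved browse page can also be scanned with the standard library's html.parser. This sketch assumes, without having verified it, that each entry on the browse page links to plus.php?id=<name>; the class PlusLinkParser and the function extract_ids are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class PlusLinkParser(HTMLParser):
    """Collect the id= values of links pointing to plus.php (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        parsed = urlparse(href)
        if parsed.path.endswith('plus.php'):
            # parse_qs maps each key to a list of values
            self.ids.extend(parse_qs(parsed.query).get('id', []))


def extract_ids(html_text):
    """Return the snoRNA names linked from an HTML page, in document order."""
    parser = PlusLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.ids


sample = ('<table><tr class="traitbleu"><td>'
          '<a href="plus.php?id=SNORD125">SNORD125</a></td></tr></table>')
print(extract_ids(sample))  # ['SNORD125']
```

Reading the href rather than the link text keeps working even if the visible label differs from the ID, at the cost of relying on the URL pattern.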

score 1 · Answer 2 · 2013-01-16


11.3 years ago

Leandro Lima ▴ 970

Solved.

=)

The result, in case you need it:

www.vision.ime.usp.br/~llima/snoRNA_info.csv
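The script in the first answer writes tab-separated values even though the file is named .csv, so a comma-based reader will not split it correctly. A minimal sketch for loading it, assuming the downloadable file keeps the script's four-column header (name, sno_id, description, biotype); the sample row below is made up apart from the name SNORD125:

```python
import csv
import io


def read_sno_table(lines):
    """Parse the tab-separated snoRNA table into a list of dicts keyed by header."""
    return list(csv.DictReader(lines, delimiter='\t'))


# Made-up sample in the assumed layout; only the name comes from the thread
sample = io.StringIO(
    'name\tsno_id\tdescription\tbiotype\n'
    'SNORD125\tEXAMPLE_ID\tplaceholder description\tsnoRNA\n'
)
rows = read_sno_table(sample)
print(rows[0]['name'], rows[0]['biotype'])  # SNORD125 snoRNA
```

For the real file, open it with newline='' and pass the file handle to read_sno_table.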
