Biostar Beta. Not for public use.
Table with snoRNABase information?
12 months ago
Leandro Lima • 920
San Francisco, CA

Hello!

I have a list of snoRNABase IDs (https://www-snorna.biotoul.fr/browse.php?sno=CDBox), and I need to get the information available on pages like this one:

https://www-snorna.biotoul.fr/plus.php?id=SNORD125

I'm writing a program to get it, but... I was wondering if someone here has already done it.

annotation • 901 views

Thanks for following up and posting the solution.

12 months ago
Leandro Lima • 920
San Francisco, CA

The code, in Python.

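# download_snoRNABase_info.py
# Created in: Jan 16, 2013
# Last modified in: Jan 16, 2013
# Leandro Lima <llima@cipe.accamargo.org.br>
#
# Python 2 script. Reads a saved browse page, follows each snoRNA link
# and writes one tab-separated row per id.

from lxml.html import parse, document_fromstring
from twill.commands import go, show
import twill
from StringIO import StringIO

# Redirect twill's output so show() does not print each page to the terminal
twill.set_output(StringIO())

# Presumably a local copy of https://www-snorna.biotoul.fr/browse.php?sno=CDBox
browse_page = 'snoRNABase_CDBox.html'
page = parse(browse_page).getroot()
tr = page.find_class('traitbleu')

# The file is tab-separated despite the .csv extension.
# 'with' makes sure it is flushed and closed at the end.
with open('snoRNA_info.csv', 'w') as output:
    output.write('name\tsno_id\tdescription\tbiotype\n')
    for td in tr[0].getchildren():
        # Reading ids
        for a in td.cssselect('a'):
            name = a.text
            link = 'https://www-snorna.biotoul.fr/plus.php?id=' + name
            go(link)
            text = show()
            page2 = document_fromstring(text)
            # Third child of the second 'tablecadre' element:
            # its first cell holds the id (in a nested tag), its second the description
            table = page2.find_class('tablecadre')[1]
            tr2 = table.getchildren()[2]
            sno_id = tr2[0].getchildren()[0].text
            description = tr2[1].text
            print name, sno_id, description
            output.write('%s\t%s\t%s\t%s\n' % (name, sno_id, description, 'snoRNA'))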
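
For anyone on Python 3 without twill or lxml, the id-collection step could be sketched with the standard library alone. This is only a rough sketch, not a tested replacement for the script above: it assumes the ids are the link texts inside the first element with class 'traitbleu' (as the script implies), and the names IdCollector, collect_ids and detail_url are made up for illustration. Fetching and parsing the detail pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = 'https://www-snorna.biotoul.fr/plus.php'


class IdCollector(HTMLParser):
    """Collect the text of every <a> inside the first element of a given class.

    Assumption: the browse page keeps its id links inside one element with
    class 'traitbleu'; this is inferred from the script, not from the site.
    """

    def __init__(self, css_class='traitbleu'):
        super().__init__()
        self.css_class = css_class
        self.container_tag = None  # tag name of the matching element
        self.depth = 0             # nesting depth of that tag name
        self.finished = False      # True once the first match has closed
        self.in_link = False
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if self.finished:
            return
        if self.container_tag is None:
            classes = (dict(attrs).get('class') or '').split()
            if self.css_class in classes:
                self.container_tag = tag
                self.depth = 1
            return
        if tag == self.container_tag:
            self.depth += 1
        elif tag == 'a':
            self.in_link = True
            self.ids.append('')

    def handle_endtag(self, tag):
        if self.finished or self.container_tag is None:
            return
        if tag == 'a':
            self.in_link = False
        elif tag == self.container_tag:
            self.depth -= 1
            if self.depth == 0:
                self.finished = True

    def handle_data(self, data):
        if self.in_link and not self.finished:
            self.ids[-1] += data


def collect_ids(html_text):
    """Return the non-empty link texts found by IdCollector."""
    parser = IdCollector()
    parser.feed(html_text)
    parser.close()
    return [name.strip() for name in parser.ids if name.strip()]


def detail_url(name):
    """Build the per-id page URL in the same form the script uses."""
    return BASE_URL + '?' + urlencode({'id': name})
```

With a saved browse page read into a string, collect_ids(text) would give the list of names, and detail_url('SNORD125') gives the same address as the example in the question.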
12 months ago
Leandro Lima • 920
San Francisco, CA

Solved.

=)

The result, in case you need it:

www.vision.ime.usp.br/~llima/snoRNA_info.csv
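
If you want to load that file in Python 3, something like the sketch below should work. It assumes the file has the four tab-separated columns written by the script (name, sno_id, description, biotype). The function name read_snorna_table is made up, and the sample row uses placeholder values, not real snoRNABase data.

```python
import csv
import io

# Placeholder content in the same layout as snoRNA_info.csv.
# The sno_id and description values are NOT real database entries.
SAMPLE = (
    'name\tsno_id\tdescription\tbiotype\n'
    'SNORD125\tPLACEHOLDER_ID\tplaceholder description\tsnoRNA\n'
)


def read_snorna_table(handle):
    """Return a dict mapping each name to its row (itself a dict by column)."""
    # QUOTE_NONE: the script writes raw text without any quoting,
    # so quote characters in a description are kept as-is.
    reader = csv.DictReader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    return {row['name']: row for row in reader}


table = read_snorna_table(io.StringIO(SAMPLE))
print(table['SNORD125']['biotype'])
```

For the real file, pass an open file handle instead of the in-memory sample, for example open('snoRNA_info.csv', newline='').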
