I Wonder Why It Is So Important To Use Seq Objects In Stead Of Plain Ol' Strings In Biopyton?
5
7
Entering edit mode
14.0 years ago
Vlinxify ▴ 120

This may seem like a superfluous question, and perhaps it is, but it's important to get the basic raison d'etres of the programming habits that are encouraged in the tutorial straight. (Wow that's a strange and awkward sentence but it's good to write in English and show off literal non-skills.)

In short: Why use:

from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG",
    IUPAC.unambiguous_rna)
messenger_rna
    Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', 
    IUPACUnambiguousRNA())
messenger_rna.translate()
    Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))

When you can simply use:

from Bio.Seq import translate

my_string_messenger_rna = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"
translate(my_string_messenger_rna)
    'MAIVMGR*KGAR*'
python seq biopython • 5.4k views
ADD COMMENT
7
Entering edit mode
14.0 years ago

The difference is in the type of programming one pursues.

One approach, as show in your first example is the object oriented (OO) programming where concepts such as a sequence are represented as classes with attributes, each class instance has methods that can operate on these attributes.

A second paradigm is that of functional programming (FP) where data, such as the string in your second example, is transformed via functions to produce different data.

Most programming books and formal courses emphasize the OO approach. At the risk of sounding a bit cynical I think that is most likely because it has many rules and regulations that are suited to be divided into an episodic format ... Hello everyone today we will learn about virtual private methods

With time my personal preferences have strongly shifted towards functional type programming. I think their simplicity and transparency leads to simpler solutions with fewer errors.

ADD COMMENT
4
Entering edit mode

Good answer, but I would use the term "procedural programming" to describe non-OO programming in which functions are not associated with data. The term "functional programming" describes a type of stateless non-imperitive programming such like that used in R or Haskell. http://en.wikipedia.org/wiki/Functional_programming

ADD REPLY
0
Entering edit mode

This makes sense. When it's already possible to use comments to make your statements clearer, it's seems kind of redundant to make objects out of often used concepts. Nonetheless, I will stick to the advice given by the tutorial.

ADD REPLY
0
Entering edit mode

virtual private method

In C#, method overriding works so: the parent class needs to declare overridable methods as virtual methods and the child class has to declare a method with the same name and with override in its declaration. Also, a private method cannot be used/overridden through instantiation or inheritance.

All of this to say "virtual private methods" set off all kinds of alarm bells in my head :-)

ADD REPLY
0
Entering edit mode

Great answer. OOP always felt stupid to me, like why increase the complexity of the program.. I suppose there comes a point where OOP actually decreases complexity, but the commonly referred to a few hundred LOC isn't it, IMO. Maybe around 1k LOC you cross the point where OOP actually helps..

ADD REPLY
7
Entering edit mode
14.0 years ago

I guess it's mainly for historical reasons. Since the early days of Biopython, sequences came with an associated alphabet (the IUPAC.unambiguous_rna in the example above). The advantage is that it protects against applying inappropriate operations to sequences, such as trying to translate or reverse-complement a protein sequence. In hindsight, maybe a simple string would have been better, since probably few people actually make use of the associated alphabet. While changing such a fundamental object in Biopython can be risky, I wouldn't be surprised if alphabets are removed in a future version of Biopython, or at least play a less prominent role.

ADD COMMENT
0
Entering edit mode

Bioinformatics and sequence analysis seem to be one of those fields which pre-eminently are tuned for strings. It's like the hole concept of 'strings' is made for sequence analysis.

ADD REPLY
6
Entering edit mode
13.2 years ago
  1. For short scripts like your example, procedural programming is often preferable, but procedural programs can become conceptually unmanageable beyond a few hundred lines. Because a string cannot hold other data like the defline/accession so when you need to write a procedural program that uses those things you end up having functions with a dozen arguments instead of one, or you write a data structure to hold all that, thereby reinventing the wheel. I tell younger programmers they should be using objects once their functions use more than 3 arguments.
  2. data validation, as mentioned
  3. an IDE cannot suggest what Bio.Seq class methods can be performed on your string
ADD COMMENT
5
Entering edit mode
14.0 years ago
Eric T. ★ 2.8k

Here are some more reasons why the tutorial, in particular, uses Seq objects instead of strings:

  • This chapter introduces you to the Seq and Alphabet objects, so you can explore the rest of the available methods on your own. It's not strictly meant to show the fastest way to translate a sequence; it's showing you what's available.
  • The MutableSeq and UnknownSeq objects follow the same API as Seq, but have special features that Python strings don't -- so introducing Seq also indirectly helps explain those other two types
  • The SeqRecord object uses a Seq object, and SeqIO uses SeqRecords. Showing how to create a Seq object prepares you for using SeqIO later. (Maybe this could be simplified in the future by letting SeqRecords be built with a raw string.)
ADD COMMENT
0
Entering edit mode
10.6 years ago
vaskin90 ▴ 290

"Use Seq objects instead of strings" cannot be a rule of a thump. It always depends on the purpose of your code. If you want to try something quickly I would encourage you to use strings instead of Seq objects. Because it is faster to write. But if you write code that you or someone else will use in the future Seq objects is a better option because it results in more stable code.

ADD COMMENT

Login before adding your answer.

Traffic: 2614 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6