How to read a FASTA file and insert the sequence in another function calling a class?
1
0
Entering edit mode
5.1 years ago

I have an assignment where I have to write a python class to represent and manipulate biological sequences. I have almost finished the class, however I need to import sequences from FASTA files input the class, so that it can run them. In order to run the class in the command line, I was required to create a run_me.py file, which when executed in the command line, should ask for the FASTA file and then run the class.

This is the class constructor:

class Bioseq:
    """Biological sequence class"""
    def __init__(self, seq):
        self.seq = seq

And this is how far I've gotten with the run_me.py file:

def read_fasta():
        with open(file_name, "r") as fh:
            line = fh.readline()
            header = ""
            seq = ""
            while line:
                line = line.rstrip('\n')
                if ">" in line:
                    header = line
                else:
                    seq = seq + line
                line = fh.readline()
        return seq
        print(seq)


from TPC2_MR2 import Bioseq
file_name = input("Please enter sequence file name:")
Bioseq = Bioseq(read_fasta())
Bioseq.main()

TPC2_MR2 is the name of the class module. Now the problem is that when I execute the run_me.py file, I am asked to provide the name of the FASTA file but after that, the sequence is not imported into the class. What should I do?

Please note that I cannot use external Pyhton modules, such as Biopython.

FASTA Python • 4.8k views
ADD COMMENT
0
Entering edit mode

I think there are a few issues, but the most glaring ones are that when you call Bioseq(read_fasta()), you arent providing an argument to read_fasta() so there’s no data being input. Similarly, your Bioseq contains no method called main(), so your last line would throw an error. Also, it’s unwise to call the variable the same thing as the class, as you’re polluting the namespace. Since you’ve imported Bioseq, that name is taken.

Do you want the class to do the parsing of the file, or do you want to parse the file separately and just pass the class the relevant parts? Are your input files multifastas?

ADD REPLY
0
Entering edit mode

Thank you for replying!

Actually, I had added the main() function to Bioseq, I just didn't include it in the question (my bad). Here it is:

def main(self):
        self.seq_type(self.seq)
        self.call_methods(self.seq)

    if __name__ == '__main__':
        main()

The seq_type and the call_methods are two functions within the class that I wish to run.

Regarding the read_fasta(), I want to put the output from the read_fasta function into the class (later refered to as seq), and there is only one sequence in each fasta file

ADD REPLY
0
Entering edit mode
5.1 years ago
Joe 21k

Try something like this skeleton below, you can add any methods you like within Bioseq so long as they're making use of the instance data (or need no data).

I've included a couple of dummy functions, namely Bioseq.as_fasta() and Bioseq.length(). You can follow this pattern and add whatever others you like.

You shouldn't be 'running' the class. It doesn't need to know anything about the if __name__ == '__main__': construct. The class is purely there to hold the data logically together. In the case below, multifasta's are also handled by the separate splitfa() function, it returns a header and its sequence one by one as the file is iterated, and then those are passed directly in to Bioseq. When you call Bioseq on the input data (in this case a string for the header, and for the sequence), the __init__ method is the constructor and does everything else. If you want a class variable to be the result of some calculation, you can make that so in __init__ just calls that function and assigns the variable (as in the toy case for perc_gc.

Hope that all helps.

from itertools import groupby
import sys


def splitfa(fasta):
    with open(fasta, 'r') as fh:
        faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
        for header in faiter:
            header = header.next()[1:].strip()
            seq = "".join(s.strip() for s in faiter.next())
            yield header, seq


class Bioseq(object):
    def __init__(self, id, seq):
        self.id = id
        self.seq = seq
        self.perc_gc = self.gc()

    def as_fasta(self):
        """Print the class as a fasta."""
        print(">{}\n{}\n".formatself.id, self.seq))

    def length(self):
        """Get the sequence length."""
        return len(self.seq)

    def gc(self):
        """Calcuate GC percentage."""
        return float(self.seq.upper().count("G") + self.seq.upper().count("C")/len(self.seq)*100)


def main():
    for h, s in splitfa(sys.argv[1]):
        i = Bioseq(id=h, seq=s)
        ...
        # do stuff with the instance of Bioseq


if __name__ == '__main__':
    main()
ADD COMMENT

Login before adding your answer.

Traffic: 1771 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6