Beginner in Python: translating DNA given in GenBank file format into its six reading frames
3.3 years ago
oki4 • 10

Goal: write a program that translates a DNA sequence, given in a GenBank-format file called sequence.gb, into all six reading frames. We are given template (starter) code to work with.

GenBank input file: http://web.njit.edu/~kapleau/teach/current/bnfo135/sequence.gb

My code:

from urllib.request import urlopen


def dna2rna(sequence):
    ''' The dna2rna function converts a sequence of DNA, given as a
        parameter, and returns an RNA sequence.
    '''
    ## the GenBank sequence and the codon2aa keys are lowercase,
    ## so replace lowercase 't' with 'u'
    rna_seq = sequence.lower().replace('t', 'u')
    return rna_seq

codon2aa = {'aaa': 'K', 'aac': 'N', 'aag': 'K', 'aau': 'N',
            'aca': 'T', 'acc': 'T', 'acg': 'T', 'acu': 'T',
            'aga': 'R', 'agc': 'S', 'agg': 'R', 'agu': 'S',
            'aua': 'I', 'auc': 'I', 'aug': 'M', 'auu': 'I',

            'caa': 'Q', 'cac': 'H', 'cag': 'Q', 'cau': 'H',
            'cca': 'P', 'ccc': 'P', 'ccg': 'P', 'ccu': 'P',
            'cga': 'R', 'cgc': 'R', 'cgg': 'R', 'cgu': 'R',
            'cua': 'L', 'cuc': 'L', 'cug': 'L', 'cuu': 'L',

            'gaa': 'E', 'gac': 'D', 'gag': 'E', 'gau': 'D',
            'gca': 'A', 'gcc': 'A', 'gcg': 'A', 'gcu': 'A',
            'gga': 'G', 'ggc': 'G', 'ggg': 'G', 'ggu': 'G',
            'gua': 'V', 'guc': 'V', 'gug': 'V', 'guu': 'V',

            'uaa': '_', 'uac': 'Y', 'uag': '_', 'uau': 'Y',
            'uca': 'S', 'ucc': 'S', 'ucg': 'S', 'ucu': 'S',
            'uga': '_', 'ugc': 'C', 'ugg': 'W', 'ugu': 'C',
            'uua': 'L', 'uuc': 'F', 'uug': 'L', 'uuu': 'F'}
if __name__ == '__main__':
    with urlopen('https://web.njit.edu/~kapleau/teach/current/bnfo135/sequence.gb') as conn:
        data = conn.readlines()
    lines = [line.strip() for line in [datum.decode() for datum in data]]
    flag = False
    dna = ''

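for line in lines:
    ## if the flag is 'True', append the line to 'dna'.
    ## ('dna' is a string, so use '+=' rather than 'append')
    if flag:
        dna += line
    ## if the word "ORIGIN" is in the line, set 'flag' to 'True'
    ## (checked at the same level as the flag test, not inside it)
    if 'ORIGIN' in line:
        flag = True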

## gets rid of any non-dna character.
dna = dna.translate(str.maketrans('acgt', 'acgt', '0123456789 /'))

## calls the dna2rna function
rna = dna2rna(dna)

## process the first 3 reading frames
for i in range(3):
    ## create a variable 'seq' and assign it the rna to process,
    ## starting at offset i so each pass reads a different frame
    seq = rna[i:]
    amino = ''
    while len(seq) >= 3:
        ## use the codon2aa table to append an amino acid to 'amino'
        ## ('X' marks any codon not in the table, e.g. one containing 'n')
        amino += codon2aa.get(seq[:3], 'X')
        ## update 'seq' to the next codon
        seq = seq[3:]
    print('--- Reading Frame %i ---' % (i+1), amino, sep='\n')
##
##    ## compute the reverse complement of 'rna' and assign the result
##    ## back into the 'rna' variable
##
##    ## process the next 3 reading frames. hint: just like the first 3
##    for i in range(3):
##        ## same as the first 3
##        print('--- Reading Frame %i ---' % (i+4), amino, sep='\n')
##

I would like to know if I'm on the correct path so far. I'm also having trouble processing the three reading frames (the loop under "process the first 3 reading frames") and would like some input. Thanks.
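
On the parsing step, here is a minimal sketch of the flag-based approach described in the template comments, written as a standalone function so it can be checked without downloading the file. The function name `extract_origin_sequence` and the tiny sample record are made up for illustration; it assumes the sequence lies between the `ORIGIN` line and the `//` terminator, as in a typical GenBank flat file.

```python
def extract_origin_sequence(lines):
    """Collect the sequence between the ORIGIN line and the '//' terminator."""
    in_origin = False
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith('//'):
            break  # end of the record
        if in_origin:
            # Drop position numbers and spaces, keeping only letters.
            chunks.append(''.join(ch for ch in line if ch.isalpha()))
        elif line.startswith('ORIGIN'):
            in_origin = True  # sequence lines start on the next line
    return ''.join(chunks).lower()


# Hypothetical miniature record, not the contents of sequence.gb.
sample = [
    'LOCUS       EXAMPLE',
    'ORIGIN',
    '        1 atggcc taa',
    '       11 gg',
    '//',
]
print(extract_origin_sequence(sample))  # prints: atggcctaagg
```

Note that the flag is tested before the `ORIGIN` check, so the `ORIGIN` line itself is never added to the sequence.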

python GenBank • 1.7k views

Have you been instructed not to use a library like biopython?

This can be done fairly easily with Biopython, using SeqIO and the built-in translate().

No, unfortunately I can't use Biopython.

Hi,

Did you ever find an answer to the project?

Is that hyperlink meant to take me back to this page?

It takes you to the comment made by Eric Lim, suggesting that you use translate() from Biopython.

The Biopython cookbook actually shows how to do this; instead of calling translate(), you could just look up each codon in your own table.
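
For what it's worth, here is a rough sketch of that idea in plain Python, with no Biopython. It is not the cookbook's code: the names `CODON_TABLE`, `translate_frame`, `reverse_complement` and `six_frames` are made up for illustration. It assumes a lowercase RNA string and uses `_` for stop codons, like the `codon2aa` table in the question, and it marks any codon missing from the table with `X`.

```python
from itertools import product

# Standard genetic code keyed by lowercase RNA codons, in u/c/a/g order.
# '_' marks stop codons, matching the codon2aa table in the question.
BASES = 'ucag'
AMINO_ACIDS = 'FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(rna, offset):
    """Translate rna starting at offset, ignoring a trailing partial codon."""
    codons = (rna[i:i + 3] for i in range(offset, len(rna) - 2, 3))
    # 'X' is an assumed placeholder for codons not in the table.
    return ''.join(CODON_TABLE.get(codon, 'X') for codon in codons)


def reverse_complement(rna):
    """Complement each base (a<->u, c<->g), then reverse the string."""
    return rna.translate(str.maketrans('acgu', 'ugca'))[::-1]


def six_frames(rna):
    """Return translations of frames 1-3 (forward) and 4-6 (reverse strand)."""
    forward = [translate_frame(rna, i) for i in range(3)]
    rev = reverse_complement(rna)
    return forward + [translate_frame(rev, i) for i in range(3)]


for number, protein in enumerate(six_frames('auggccuaa'), start=1):
    print('--- Reading Frame %i ---' % number, protein, sep='\n')
```

For the toy input `auggccuaa` the six frames come out as `MA_`, `WP`, `GL`, `LGH`, `_A` and `RP`. Frames 4 to 6 are just the first three frames of the reverse complement, which is what the template's hint ("just like the first 3") points at.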
