Error Translating GenBank CDS Using Biopython
13.1 years ago
Tim ▴ 340

Hi all,

I'm trying to translate a GenBank record using BioPython 1.53, ignoring the translation already given in the CDS feature. The code I've written to do this is pretty straightforward:

...
for gb_record in SeqIO.parse(file_handle, 'genbank'):  # Bio.SeqRecord.SeqRecord
    for gb_feature in gb_record.features:  # Bio.SeqFeature.SeqFeature
        #Skip any non coding sequence features
        if gb_feature.type != 'CDS':
            continue

        #Protein identifier is a property of the genbank feature
        protein_id = gb_feature.qualifiers['protein_id'][0]

        #Original sequence retrieved through BioPython 1.53+'s internal method 
        extracted_seq = gb_feature.extract(gb_record.seq)#Bio.Seq.Seq

        #Translation table is a property of the genbank feature
        transl_table = gb_feature.qualifiers['transl_table'][0]

        #Translate entire sequence as coding sequence using translation table
        #Additional CodonTables optionally available from Bio.Data.CodonTable
        try:
            protein_seq = extracted_seq.translate(table = transl_table, cds = True)
        except TranslationError as err:
            log.error('%s: Error in translating %s\n%s', gb_record.id, protein_id, extracted_seq)
            raise err

        #Write out fasta. Header format as requested: >genome_ac|protein_id
        _write_fasta_line(write_handle, '{0}|{1}'.format(gb_record.id, protein_id), str(protein_seq))

The translate line throws a TranslationError on the following feature:

 CDS             complement(2276255..2279302)
                 /locus_tag="ECBD_2165"
                 /EC_number="1.7.99.4"
                 /inference="protein motif:TFAM:TIGR01553"
                 /note="KEGG: ssn:SSON_1650 formate dehydrogenase-N,
                 nitrate-inducible, alpha subunit;
                 TIGRFAM: formate dehydrogenase, alpha subunit;
                 PFAM: molybdopterin oxidoreductase; molybdopterin
                 oxidoreductase Fe4S4 region; molydopterin
                 dinucleotide-binding region"
                 /codon_start=1
                 /transl_except=(pos:complement(2278715..2278717),aa:Sec)
                 /transl_table=11
                 /product="formate dehydrogenase, alpha subunit"
                 /protein_id="YP_003036386.1"
                 /db_xref="GI:253773555"
                 /db_xref="InterPro:IPR006311"
                 /db_xref="InterPro:IPR006443"
                 /db_xref="InterPro:IPR006655"
                 /db_xref="InterPro:IPR006656"
                 /db_xref="InterPro:IPR006657"
                 /db_xref="InterPro:IPR006963"
                 /db_xref="GeneID:8157271"
                 /translation="MDVSRRQFFKICAGGMAGTTVAALGFAPKQALAQARNYKLLRAK
                 EIRNTCTYCSVGCGLLMYSLGDGAKNAREAIYHIEGDPDHPVSRGALCPKGAGLLDYV
                 NSENRLRYPEYRAPGSDKWQRISWEEAFSRIAKLMKADRDANFIEKNEQGVTVNRWLS
                 TGMLCASGASNETGMLTQKFARSLGMLAVDNQARVUHGPTVASLAPTFGRGAMTNHWV
                 DIKNANVVMVMGGNAAEAHPVGFRWAMEAKNNNDATLIVVDPRFTRTASVADIYAPIR
                 SGTDITFLSGVLRYLIENNKINAEYVKHYTNASLLVRDDFAFEDGLFSGYDAEKRQYD
                 KSSWNYQFDENGYAKRDETLTHPRCVWNLLKEHVSRYTPDVVENICGTPKADFLKVCE
                 VLASTSAPDRTTTFLYALGWTQHTVGAQNIRTMAMIQLLLGNMGMAGGGVNALRGHSN
                 IQGLTDLGLLSTSLPGYLTLPSEKQVDLQSYLEANTPKATLADQVNYWSNYPKFFVSL
                 MKSFYGDAAQKENNWGYDWLPKWDQTYDVIKYFNMMDEGKVTGYFCQGFNPVASFPDK
                 NKVVSCLSKLKYMVVIDPLVTETSTFWQNHGESNDVDPASIQTEVFRLPSTCFAEEDG
                 SIANSGRWLQWHWKGQDAPGEARNDGEILAGIYHHLRELYQAEGGKGVEPLMKMSWNY
                 KQPHEPQSDEVAKENNGYALEDLYDANGVLIAKKGQLLSSFAHLRDDGTTASSCWIYT
                 GSWTEQGNQMANRDNSDPSGLGNTLGWAWAWPLNRRVLYNRASADINGKPWDPKRMLI
                 QWNGSKWTGNDIPDFGNAAPGTPTGPFIMQPEGMGRLFAINKMAEGPFPEHYEPIETP
                 LGTNPLHPNVVSNPVVRLYEQDALRMGKKEQFPYVGTTYRLTEHFHTWTKHALLNAIA
                 QPEQFVEISETLAAAKGINNGDRVTVSSKRGFIRAVAVVTRRLKPLNVNGQQVETVGI
                 PIHWGFEGVARKGYIANTLTPNVGDANSQTPEYKAFLVNIEKA"

root: ERROR: NC_012947.1: Error in translating YP_003036386.1
ATGGACGTCAGTCGCAGACAATTTTTTAAAATCTGCGCGGGCGGTATGGCTGGAACAACGGTAGCGGCATTGGGCTTTGCCCCGAAGCAAGCACTGGCTCAGGCGCGAAACTACAAATTATTACGCGCTAAAGAGATCCGTAACACCTGCACATACTGTTCCGTAGGTTGCGGGCTATTGATGTATAGCCTGGGTGATGGCGCGAAAAACGCCAGAGAAGCGATTTATCACATTGAAGGTGACCCGGATCATCCGGTAAGCCGTGGTGCGCTGTGCCCAAAAGGGGCCGGTTTGCTGGATTACGTCAACAGCGAAAACCGTCTGCGCTACCCGGAATATCGTGCGCCAGGTTCTGACAAATGGCAGCGCATTAGCTGGGAAGAAGCATTCTCCCGTATTGCAAAGCTGATGAAAGCTGACCGTGACGCTAACTTTATTGAAAAGAACGAGCAGGGCGTAACGGTAAACCGTTGGCTTTCTACCGGTATGCTGTGTGCCTCCGGTGCCAGCAACGAAACCGGGATGCTGACACAGAAATTTGCCCGCTCCCTCGGGATGCTGGCGGTAGACAACCAGGCGCGCGTCTGACACGGACCAACGGTAGCAAGTCTTGCTCCAACATTTGGTCGCGGTGCGATGACCAACCACTGGGTGGATATCAAAAACGCTAACGTCGTAATGGTAATGGGCGGTAACGCTGCTGAAGCGCATCCCGTCGGTTTCCGCTGGGCGATGGAAGCGAAAAACAACAACGATGCAACCTTGATCGTTGTCGATCCTCGTTTTACGCGTACCGCTTCTGTGGCGGATATTTACGCACCTATTCGTTCCGGTACGGACATTACGTTCCTGTCTGGCGTTTTGCGCTACCTGATCGAAAACAACAAAATCAACGCCGAATACGTTAAACATTACACCAACGCCAGCCTGCTGGTGCGTGATGATTTTGCTTTCGAAGATGGCCTGTTCAGCGGTTATGACGCTGAAAAACGCCAGTACGACAAATCGTCCTGGAACTATCAGTTCGATGAAAACGGCTATGCGAAACGCGATGAAACACTGACTCATCCGCGCTGTGTGTGGAACCTGCTGAAAGAGCACGTTTCCCGCTACACGCCGGACGTCGTTGAAAACATCTGCGGTACGCCAAAAGCCGACTTCCTGAAAGTGTGTGAAGTGCTGGCCTCCACCAGCGCACCGGATCGCACAACCACCTTCCTGTACGCGCTGGGCTGGACGCAGCACACCGTGGGTGCGCAGAACATCCGTACTATGGCGATGATCCAGTTACTGCTCGGTAACATGGGTATGGCCGGTGGCGGCGTGAACGCATTGCGTGGTCACTCCAACATTCAGGGCCTGACTGACTTAGGTCTGCTCTCTACCAGCCTGCCAGGTTATCTGACGCTGCCGTCAGAAAAACAGGTTGATTTGCAGTCGTATCTGGAAGCGAACACGCCGAAAGCGACGCTGGCTGATCAGGTGAACTACTGGAGCAACTATCCGAAGTTCTTCGTTAGCCTGATGAAATCTTTCTATGGCGATGCCGCGCAGAAAGAGAACAACTGGGGCTATGACTGGCTGCCGAAGTGGGACCAGACCTACGACGTCATCAAGTATTTCAACATGATGGACGAAGGCAAAGTCACCGGTTATTTCTGCCAGGGCTTTAACCCGGTTGCGTCCTTCCCGGACAAAAACAAAGTGGTGAGCTGCCTGAGCAAGCTGAAGTACATGGTGGTTATCGATCCGCTGGTGACTGAAACCTCTACCTTCTGGCAGAACCACGGCGAGTCGAACGATGTCGATCCGGCGTCTATTCAGACTGAAGTATTCCGTCTGCCTTCGACCTGCTTTGCTGAAGAAGATGGTTCTATTGCTAACTCCGGTCGCTGGCTGCAGTGGCACTGGAAAGGTCAGGATGCGCCGGGCGAAGCGCGTAACGACGGTGAAATTCTGGCGGGTATCTACCATCACCTGCGCGAGCTGTACCAGGCCGA
AGGTGGTAAAGGCGTAGAACCGCTGATGAAGATGAGCTGGAACTACAAGCAGCCGCACGAACCGCAATCTGACGAAGTAGCTAAAGAGAACAACGGCTATGCGCTGGAAGATCTCTATGATGCTAATGGCGTGCTGATTGCGAAGAAAGGTCAGTTGCTGAGTAGCTTTGCGCATCTGCGTGATGACGGTACAACCGCATCTTCTTGCTGGATCTACACCGGTAGCTGGACAGAGCAGGGCAACCAGATGGCTAACCGCGATAACTCCGACCCGTCCGGTCTGGGGAATACGCTGGGATGGGCCTGGGCGTGGCCGCTCAACCGTCGCGTGCTGTACAACCGTGCTTCGGCGGATATCAACGGTAAACCGTGGGATCCGAAACGGATGCTGATCCAGTGGAACGGCAGCAAGTGGACGGGTAACGATATTCCTGACTTCGGCAATGCCGCACCGGGTACGCCAACCGGGCCGTTTATCATGCAGCCGGAAGGGATGGGACGCCTGTTTGCTATCAACAAAATGGCGGAAGGTCCGTTCCCGGAACACTACGAGCCGATTGAAACGCCGCTGGGCACTAACCCGCTGCATCCGAACGTGGTGTCTAACCCGGTTGTTCGTCTGTATGAACAAGACGCACTGCGGATGGGTAAAAAAGAGCAGTTCCCGTATGTGGGTACGACCTATCGTCTGACCGAGCACTTCCACACCTGGACCAAGCACGCATTGCTCAACGCAATTGCTCAGCCGGAACAGTTTGTGGAAATCAGCGAAACGCTGGCGGCGGCGAAAGGCATTAATAATGGCGATCGTGTCACTGTCTCAAGCAAGCGTGGCTTTATCCGCGCGGTGGCTGTGGTAACGCGTCGTCTGAAACCACTGAATGTAAATGGTCAGCAGGTTGAAACGGTGGGTATTCCAATCCACTGGGGCTTTGAGGGTGTCGCGCGTAAAGGTTATATCGCTAACACTCTGACGCCGAATGTCGGTGATGCAAACTCGCAAACGCCGGAATATAAAGCGTTCTTAGTCAACATCGAGAAGGCGTAA

Error:

Traceback (most recent call last):
  File "/usr/lib/python2.6/unittest.py", line 279, in run
    testMethod()
  File "/home/user/jenkins/workspace/Divergence/divergence/src/divergence/test/test_translate.py", line 33, in test_translate_ecoli_and_salmo
    fasta_file = translate_genbank_to_protein(genbank_file, ptt_file)
  File "/home/user/jenkins/workspace/Divergence/divergence/src/divergence/translate.py", line 73, in translate_genbank_to_protein
    raise err
TranslationError: Extra in frame stop codon found.

Now I'm guessing this has something to do with the /transl_except I'm seeing in the GenBank record, but I'm not (yet) sure. (The GenBank-supplied translation contains a selenocysteine.) But even if this is the cause, how would I properly handle it in my BioPython translation? I can't find any way to exclude certain positions from translation.

Can anyone help me fix the translation?

Best regards, Tim

(Ps. Should anyone wonder why I'm not using the translation in the GenBank file directly: It's a requirement that I translate from the DNA sequence to protein myself...)

biopython genbank translation • 6.1k views
13.1 years ago

Remove the cds=True argument from the translate call. That option checks that you are translating a complete coding sequence, and it is correctly raising an error in this case:

- cds - Boolean, indicates this is a complete CDS.  If True,
        this checks the sequence starts with a valid alternative start
        codon (which will be translated as methionine, M), that the
        sequence length is a multiple of three, and that there is a
        single in frame stop codon at the end (this will be excluded
        from the protein sequence, regardless of the to_stop option).
        If these tests fail, an exception is raised.
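
To see which codon trips the last of these checks, a minimal plain-Python sketch like the following may help. The name `in_frame_stops` is made up for illustration and is not part of Biopython, and the default stop codons are assumed to be those of translation table 11 (TAA, TAG, TGA).

```python
def in_frame_stops(dna, stop_codons=("TAA", "TAG", "TGA")):
    """Return 0-based codon indices of stop codons that occur before the final codon.

    Hypothetical helper for diagnosis only; `dna` is a plain string read 5'->3'
    in frame 1, and any trailing partial codon is ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is allowed to be a stop, so it is left out of the scan.
    return [i for i, codon in enumerate(codons[:-1]) if codon in stop_codons]
```

Called on `str(extracted_seq)`, a non-empty result lists the codons that the complete-CDS check rejects. For the feature in the question, the TGA named by /transl_except would be expected among them.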

With cds=False, the default, it will keep translating through the internal stop codon (emitting a "*" at that position) and not raise an error. Then you'll need to write code to handle the transl_except qualifier and convert the internal stop codon into a selenocysteine (U).
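
As a rough sketch of that custom step, assuming an unspliced CDS with /codon_start=1: the helper below parses one /transl_except value and overwrites the matching residue in a protein string that was translated without cds=True. `apply_transl_except` and `THREE_TO_ONE` are hypothetical names, not Biopython API, and the lookup table deliberately lists only a few codes.

```python
import re

# Assumption: only a handful of /transl_except codes are mapped here; extend as needed.
THREE_TO_ONE = {"Sec": "U", "Pyl": "O", "Met": "M", "TERM": "*"}

# Matches e.g. "(pos:complement(2278715..2278717),aa:Sec)" or "(pos:213..215,aa:Sec)".
_EXCEPT_RE = re.compile(r"pos:(?:complement\()?(\d+)(?:\.\.(\d+))?\)?,aa:(\w+)")


def apply_transl_except(protein, qualifier, cds_start, cds_end, strand):
    """Return `protein` with the residue named by one /transl_except value replaced.

    cds_start and cds_end are the 1-based, inclusive GenBank coordinates of the
    CDS; strand is +1 or -1. Spliced (join) locations are not handled.
    """
    match = _EXCEPT_RE.search(qualifier)
    if match is None:
        raise ValueError("Unrecognised transl_except value: %r" % qualifier)
    first = int(match.group(1))
    last = int(match.group(2) or first)
    residue = THREE_TO_ONE[match.group(3)]
    # Nucleotide distance from the first base of the CDS, read 5'->3'.
    offset = (first - cds_start) if strand > 0 else (cds_end - last)
    if offset < 0 or offset % 3 != 0:
        raise ValueError("transl_except position is not on a codon boundary")
    index = offset // 3
    if index >= len(protein):
        raise ValueError("transl_except position lies beyond the protein")
    return protein[:index] + residue + protein[index + 1:]
```

For the feature in the question, the exception sits (2279302 - 2278717) / 3 = 195 codons into the reverse-strand CDS, so the `*` at index 195 becomes `U`. In recent Biopython releases the coordinates could be taken as `int(gb_feature.location.start) + 1` and `int(gb_feature.location.end)`; the attribute names may differ in 1.53. The trailing `*` from the final stop codon still has to be stripped before writing the FASTA record.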


Thanks, this seems like a good method to fall back upon.
I was hoping I had simply overlooked a built-in way to handle this case in BioPython, as I've seen quite a few people run into similar problems while Googling "BioPython transl_except". Any suggestions in that direction are still welcome! :)


Tim, those other messages look like an old problem from 2005 with parsing GenBank records containing transl_except (woah, memories). You're dealing with a pretty special case here, so will just need some custom code to handle it. If only the record had a translation in it you could use.


Thanks for the explanation. About the provided translation: I'll discuss again with my supervisors whether I can use the already provided translation when encountering /transl_except records. I think writing my own code to handle the /transl_except cases is far more error-prone than using the provided translation in these rare cases.
