gbk2faa: why is it not working for refseq gbk?
1
1
Entering edit mode
8.2 years ago
fhsantanna ▴ 610

I have the following script to extract protein sequences from genbank files.

#!/usr/bin/env python

#This script is a modification of the script found in Peter Cock's site (http://www2.warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/genbank2fasta/).
# Usage: python gbk2faa.py <input> <output>

import sys
from Bio import GenBank
from Bio import SeqIO

input_handle = open(sys.argv[1], "r")
output_handle = open(sys.argv[2], "w")

for seq_record in SeqIO.parse(input_handle, "genbank") :
    print "Dealing with GenBank record %s" % seq_record.id
    for seq_feature in seq_record.features :
        if seq_feature.type=="CDS" :
            assert len(seq_feature.qualifiers['translation'])==1
            output_handle.write(">%s from %s\n%s\n" % (
                   seq_feature.qualifiers['locus_tag'][0],
                   seq_record.name,
                   seq_feature.qualifiers['translation'][0]))

output_handle.close()
input_handle.close()
print "Done"

The problem is that it is not working for refseq genbank files (for example, http://www.ncbi.nlm.nih.gov/nuccore/NZ_JYFS01000106). Here is the message error:

Traceback (most recent call last):

File "gbk2faa.py", line 17, in <module>
    assert len(seq_feature.qualifiers['translation'])==1
KeyError: 'translation'

Oddly, it works for other gb files such as http://www.ncbi.nlm.nih.gov/nuccore/JYFS01000106. Could you please give a hint about what is going on?

biopython • 3.1k views
ADD COMMENT
0
Entering edit mode

I found the problem. When the script finds a pseudogene (a CDS without translation) it crashes. I update the script using try and continue, now it works perfectly!

ADD REPLY
0
Entering edit mode

can you put your corrected script here? thank you, Felipe

ADD REPLY
1
Entering edit mode
ADD REPLY
0
Entering edit mode
8.2 years ago
dylan.storey ▴ 60

I don't know python well but -

I'd guess that the translation key doesn't exist

Add a switch statement in to test for the existence of the key 'translation' before you.

Something like

if (defined seq_feature.qualifiers['translation']){
         do something
     }
else {
     warn you - possible print out the record you're trying to access
}
ADD COMMENT
1
Entering edit mode

Sorry the code and the warning were imprecise. I edited the text.

I believe "assert" does what you mean...

ADD REPLY

Login before adding your answer.

Traffic: 2749 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6