Biostar Beta. Not for public use.
Add A New Feature In Biopython
3
Entering edit mode
6.5 years ago
DF • 100

Hi, I would like to overwrite a feature in a genbank file using BioPython, using ambiguous locations.

For example, I can set an ambiguous location:

from Bio import SeqFeature
start_pos = SeqFeature.AfterPosition(5)
end_pos = SeqFeature.BeforePosition(10)
my_location = SeqFeature.FeatureLocation(start_pos, end_pos)
print my_location


Result:

[>5:<10]


Which is correct.

What I can't work out how to do is to use this ambiguous location to add a new feature to a sequence.

I can read in a genbank file, assign the records to a SeqRecord, and assign the genbank features to SeqFeatures :

for record in SeqIO.parse(myinfile, "genbank"):
new_gb = SeqRecord(record.seq)
new_gb.features=record.features


Then I can overwrite a feature:

new_gb.features[0]=SeqFeature(FeatureLocation(5,10), type="random_feature_type")


which, when I print the SeqRecord out in genbank format (to stdout), correctly overwrites the first feature:

out_handle = StringIO()
SeqIO.write(new_gb, out_handle, "genbank")
new_gb_output = out_handle.getvalue()
print new_gb_output


What I am stuck on is :

1) How to I add a completely new feature to a SeqRecord?

2) How can I use ambiguous locations in the location of a feature in a SeqRecord?

I'm sure this is simple but I can't seem to work it out.

Thanks and I'm sorry if this is stupid, I'm learning.

2
Entering edit mode

*features starts out as an empty list, you should be able to add to is with *features.append(x)?

Also: if you want to save some time on formatting the record to genbank you can use print(record.format("gb"))

0
Entering edit mode

Thanks, yes, basically silly by me, staring at a screen too long and using a SeqRecord when I wanted a SeqFeature. Worked out the use of ambiguous locations too.

0
Entering edit mode

Cool -you should post your solutions as an answer so future googlers can see how do it

1
Entering edit mode

I'm working on an idiot's guide to creating and adding a new feature to a sequence, will post it as an answer to this post once it's done.

6
Entering edit mode
6.5 years ago
DF • 100

Here's a small snippet of code for people (like myself) struggling through BioPython. I'm writing code to fix some annotations that are improperly annotated as being exact when they should really be open-ended (i.e. it's not certain where the feature actually starts). This bit was easy, but I got totally stuck for hours on actually adding the fixed annotation to a new or existing genbank file.

So here's a step-by-step guide (in runnable python code) to creating a new SeqRecord, annotating it with a SeqFeature, and then overwriting the SeqFeature in Biopython (I'm using python 2.7). Obviously this can be re-jigged to import & mess with existing genbank files. I wrote it for ease of understanding, not concise code, so sorry if it's a bit verbose. Simple for some people I know, but hopefully someone finds this useful.

################ A: Make a SeqRecord ################

# 1. Create a sequence

from Bio.Seq import Seq
my_sequence = Seq("GATCGATCGATCGATCGATCGATCGATCGATC")

# 2. Create a SeqRecord and assign the sequence to it

from Bio.SeqRecord import SeqRecord
my_sequence_record = SeqRecord(my_sequence)

# 3. Assign an alphabet to the sequence (in this case DNA)

from Bio.Alphabet import generic_dna
my_sequence_record.seq.alphabet = generic_dna

# This is the minimum required info for BioPython to be able to output
# the SeqRecord in Genbank format.
# You probably would want to add other info (e.g. locus, organism, date etc)

#optional: print the SeqRecord to STDOUT in genbank format.. note there are no features on it yet.
print "\nThis bit is the SeqRecord, printed out in genbank format, with no features added.\n"
print(my_sequence_record.format("gb"))

################ B: Make a SeqFeature ################

# 1. Create a start location and end location for the feature
#    Obviously this can be AfterPosition, BeforePosition etc.,
#    to handle ambiguous or unknown positions

from Bio import SeqFeature
my_start_pos = SeqFeature.ExactPosition(2)
my_end_pos = SeqFeature.ExactPosition(6)

# 2. Use the locations do define a FeatureLocation
from Bio.SeqFeature import FeatureLocation
my_feature_location = FeatureLocation(my_start_pos,my_end_pos)

# 3. Define a feature type as a text string
#     (you can also just add the type when creating the SeqFeature)
my_feature_type = "CDS"

# 4. Create a SeqFeature
from Bio.SeqFeature import SeqFeature
my_feature = SeqFeature(my_feature_location,type=my_feature_type)

my_sequence_record.features.append(my_feature)

#optional: print the SeqRecord to STDOUT in genbank format, with your new feature added.
print "\nThis bit is the SeqRecord, printed out in genbank format, with a feature added.\n"
print(my_sequence_record.format("gb"))

################ C: Overwrite an existing SeqFeature ################

# 1. Create a start location and end location for the feature..
#    This bit is obviously a repeat of "B: Make a SeqFeature" above,
#    normally I'd pull it out to a function, but I'm trying to be explicit here

from Bio import SeqFeature
my_start_pos = SeqFeature.ExactPosition(3)
my_end_pos = SeqFeature.ExactPosition(7)

# 2. Use the locations do define a FeatureLocation
from Bio.SeqFeature import FeatureLocation
my_feature_location2 = FeatureLocation(my_start_pos,my_end_pos)

# 3. Define a feature type as a text string
#    (or you can also just add the type when creating the SeqFeature)
my_feature_type2 = "ABC"

# 4. Create a SeqFeature
from Bio.SeqFeature import SeqFeature
my_feature2 = SeqFeature(my_feature_location2,type=my_feature_type2)

my_sequence_record.features[0]=my_feature2

#optional: print the SeqRecord to STDOUT in genbank format, with your new feature changed.
print "\nThis bit is the SeqRecord, printed out in genbank format, with a feature changed.\n"
print(my_sequence_record.format("gb"))

0
Entering edit mode

Thank you! You just saved me a ton of time today.

0
Entering edit mode

The code snippet provided uses multiple imports and overwrites the meaning of SeqFeature, that confused me to no end. That is, don't do:

from Bio import SeqFeature
...
from Bio.SeqFeature import SeqFeature
...


Here's a solution suggested on the BioPython mailing list (with some more code around):

from Bio import SeqFeature as sf

spos = sf.BeforePosition(20);
epos = sf.ExactPosition(50);
l1 = sf.FeatureLocation(spos, epos, strand=+1)
l2 = sf.FeatureLocation(epos+100, epos+200, strand=+1)
cl= sf.CompoundLocation([l1, l2]);
f1 = sf.SeqFeature(cl, strand=1, type="CDS", qualifiers={});