Change sequence in SEQIO
1
0
Entering edit mode
6.8 years ago

Hi Biostars, I think I have a very basic question, for which however could not find an answer in web.

I need to read fasta using SeqIO, change sequence for every record, and then write new records to file using again SeqIO.

What I do is:

from Bio import SeqIO
import re
sequences=[]

for record in SeqIO.parse("test.fasta", "fasta") :
    record.seq = re.sub('[^GATC]', "", str(record.seq).upper())
    sequences.append(record)

SeqIO.write(sequences,"output.fasta","fasta")

I get multiple errors running this code:

Traceback (most recent call last):
  File "script.py", line 15, in <module>
  SeqIO.write(sequences,"output.fasta","fasta")
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 481, in write
  count = writer_class(fp).write_file(sequences)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
count = self.write_records(records)
File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 194, in    write_records
    self.write_record(record)
  File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 202, in   write_record
    data = self._get_seq_string(record)  # Catches sequence being None
   File "/home/ghovhannisyan/Software/anaconda2/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.py", line 100, in _get_seq_string
    % record.id)
TypeError: SeqRecord (id=TCONS_00000001) has an invalid sequence.

I can bypass SeqIO.write method by parsing list and creating file, but I am wondering what is the problem here. Thanks

SeqIO Biopython • 5.1k views
ADD COMMENT
0
Entering edit mode

In order to provide closure for this thread it may be best to include @Peter's solution as an answer below (with the link attribution above) and then accept that answer.

ADD REPLY
2
Entering edit mode
6.8 years ago

Answered in StackOverflow by peterjc

"Biopython's SeqIO expects the SeqRecord object's .seq to be a Seq object (or similar), not a plain string. Try:

seq_record.seq = Seq(re.sub('[^GATC]',"",str(sequence).upper()))

For FASTA output there is no need to set the sequence's alphabet."

ADD COMMENT

Login before adding your answer.

Traffic: 1846 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6