write multiple record fasta from pyfaidx output
8.3 years ago
mosquitoes • 0

Hello, I'm trying to extract several DNA sequences from a FASTA file using the chromosome name and coordinates, and then write those sequences to a new FASTA file. I want to do this exclusively in Python.

For one sequence, I am able to extract the sequence:

from Bio import SeqIO
from pyfaidx import Fasta

genes = Fasta('Genome.fasta')

f = open('chr02_18s', 'w')

seqFile = genes['chr02'][146062:148216]
f.write(str(seqFile))

This gets the correct sequence, but it writes only the nucleotide sequence to the file; the FASTA header is missing.

If I use:

genes = Fasta('/Users/eflannery/Dropbox/Genomes/PlasmoDB-25_PvivaxSal1_Genome.fasta')
f = open('chr02_18s', 'w')
seqFile = genes['Pv_Sal1_chr02'][146062:148216]
# f.write(str(seqFile))
SeqIO.write(seqFile, f, "fasta")

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-204-1a62c837614c> in <module>()
      9 seqFile = genes['Pv_Sal1_chr02'][146062:148216]
     10 # f.write(str(seqFile))
---> 11 SeqIO.write(seqFile, f, "fasta")
     12 # slice
     13 # output_handle.write(handle)

//anaconda/lib/python2.7/site-packages/Bio/SeqIO/__init__.pyc in write(sequences, handle, format)
    470         if format in _FormatToWriter:
    471             writer_class = _FormatToWriter[format]
--> 472             count = writer_class(fp).write_file(sequences)
    473         elif format in AlignIO._FormatToWriter:
    474             # Try and turn all the records into a single alignment,

//anaconda/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.pyc in write_file(self, records)
    209         """
    210         self.write_header()
--> 211         count = self.write_records(records)
    212         self.write_footer()
    213         return count

//anaconda/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.pyc in write_records(self, records)
    194         count = 0
    195         for record in records:
--> 196             self.write_record(record)
    197             count += 1
    198         # Mark as true, even if there where no records

//anaconda/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.pyc in write_record(self, record)
    188             title = self.clean(self.record2title(record))
    189         else:
--> 190             id = self.clean(record.id)
    191             description = self.clean(record.description)
    192             if description and description.split(None, 1)[0] == id:

AttributeError: 'Sequence' object has no attribute 'id'

I obviously don't understand what kind of record pyfaidx's Fasta class returns, and I can't find any documentation on it or on how to write it to a FASTA file.

Thanks for the help

pyfaidx next-gen fasta • 3.4k views
8.3 years ago

You're very close. The 'Sequence' object has these attributes:

Sequence:
  .name
  .seq
  .start
  .end
  .comp

When you evaluate the object at the interactive prompt (or pass it to repr()), the Sequence.__repr__ method is called, which looks like this:

def __repr__(self):
    return '\n'.join([''.join(['>', self.longname]), self.seq])

(Sequence.longname gets the full name as represented in the FASTA file - which is not always the representation in the FASTA index file)

When you pass the Sequence object to the str function you call this method:

def __str__(self):
    return self.seq

So, you can see that str(Sequence) only gives back the sequence, and not the name. For what you want to accomplish I would recommend:
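A minimal sketch of one way to do this, building each header from the attributes listed above. The helper name write_fasta, its width parameter, and the name:start-end header layout are my own choices for illustration, not part of pyfaidx. Region is a hypothetical stand-in with the same attribute names, so the example runs without a genome file; a real pyfaidx slice such as genes['chr02'][146062:148216] should be usable in its place.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for a pyfaidx Sequence slice; it only mimics the
# .name, .start, .end and .seq attributes described above.
Region = namedtuple("Region", ["name", "start", "end", "seq"])


def write_fasta(records, handle, width=60):
    """Write each record as a FASTA entry with a name:start-end header."""
    for rec in records:
        handle.write(">{}:{}-{}\n".format(rec.name, rec.start, rec.end))
        seq = str(rec.seq)
        # Wrap the sequence at `width` characters per line.
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Demo on made-up data, written to an in-memory buffer.
demo = [Region("chr02", 1, 8, "ACGTACGT"), Region("chr03", 5, 7, "GGC")]
buf = io.StringIO()
write_fasta(demo, buf, width=4)
print(buf.getvalue(), end="")

# With pyfaidx, the same helper would be used roughly like this:
# genes = Fasta('Genome.fasta')
# regions = [genes['chr02'][146062:148216]]  # add more slices as needed
# with open('chr02_18s.fasta', 'w') as f:
#     write_fasta(regions, f)
```

Opening the output file in a with block also makes sure it is closed and flushed, which the bare open() call in the question does not do.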


Thank-you!!
