I would like to store each FASTA sequence as a string in a format which will allow me to parse the strings with a function that calculates a measure for the entire sequence (e.g. a function which calculates average protein weight). I would then like to plot some of the data.
I'd do this with a generator if the sequences do not have to be kept in memory but can be processed on the fly.
Using Alex's code as an example, that would mean stuffing the whole block into a function, e.g. `parseFasta()`, and then replacing the `records[header] = sequence` with a `yield sequence`. The whole thing could then be used in a `for` loop, handling one sequence at a time.

Edit: You can get that very same functionality implemented in https://github.com/cschu/ktio. Or, I think `pip install ktio` and then `from ktio.ktio import readFasta` should also work. (Sorry, badly documented!)
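
For illustration, here is a minimal sketch of the generator idea described above. It is not Alex's code and not the ktio implementation: the body of `parseFasta()` is my own assumption of what such a parser might look like, and `sequence_length()` is a hypothetical stand-in for whatever per-sequence measure you actually want (average protein weight, for example).

```python
import io


def parseFasta(handle):
    """Yield one sequence string per FASTA record, without keeping them all."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a new header closes the previous record, if there was one
            if header is not None:
                yield "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # sequence lines may be wrapped, so collect them piece by piece
            chunks.append(line)
    # the last record has no following header, so emit it here
    if header is not None:
        yield "".join(chunks)


def sequence_length(sequence):
    # hypothetical stand-in for a real per-sequence measure
    return len(sequence)


# tiny in-memory example; with a real file, pass an open file handle instead
example = io.StringIO(">seq1\nMKV\nLA\n>seq2\nGG\n")
measures = [sequence_length(seq) for seq in parseFasta(example)]
```

Only one sequence is held in memory at a time, and the resulting `measures` list can then be handed to whatever plotting library you prefer. If you also need the headers, yielding a `(header, sequence)` tuple instead would be a small change.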