I am attempting to take a single FASTA file containing multiple sequences of variable length as input and output aligned sequences that I can use to build a phylogenetic tree with Biopython's Bio.Phylo.
Here's my file: https://drive.google.com/file/d/1QXXSJ2DJjJHz8K1WHuERFPQBTSvsWrcL/view?usp=sharing
Things I've tried:
from Bio import AlignIO
# AlignIO.read expects an existing alignment, i.e. records of equal length
alignment = AlignIO.read('extracted_KS_with_taxa.fa', 'fasta')
print(alignment.format('fasta'))
^ Doesn't work: AlignIO.read raises an error because my sequences are of unequal length
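To confirm that unequal lengths are really the cause, the record lengths can be listed first. The following is only a minimal sketch in plain Python (no Biopython needed); `fasta_lengths` is a hypothetical helper and the two example records are made up, not taken from the real file.

```python
def fasta_lengths(text):
    """Return {record_id: sequence_length} for FASTA-formatted text."""
    lengths = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token after '>' as the ID
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            lengths[current_id] = 0
        elif current_id is not None:
            # Sequence data may be wrapped over several lines
            lengths[current_id] += len(line)
    return lengths


# Made-up records for illustration only
example = ">seq1 taxon A\nACGTAC\nGT\n>seq2 taxon B\nACG\n"
lengths = fasta_lengths(example)
print(lengths)
print("all same length:", len(set(lengths.values())) == 1)
```

For the real file, the text could be read with `open('extracted_KS_with_taxa.fa').read()` and passed to the same function. If the lengths differ, the file is unaligned and has to go through an alignment program before AlignIO can read it.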
from Bio.Align.Applications import MuscleCommandline
cline = MuscleCommandline(input='extracted_KS_with_taxa.fa', out='aligned_KS.aln', clwstrict=True)
print(cline)
^ Didn't output a file
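One likely explanation: printing a MuscleCommandline object only shows the command string it would run. Nothing is executed until the object is called, as in `cline()`. The sketch below shows the same idea with the standard library's `subprocess` instead of the Biopython wrapper. It assumes a MUSCLE v3-style command line (`-in`, `-out`, `-clwstrict`), which matches the `muscle -in ...` command the wrapper generates; other MUSCLE releases may use different option names. `build_muscle_command` and `run_muscle` are hypothetical helper names.

```python
import subprocess


def build_muscle_command(in_path, out_path, exe="muscle"):
    """Build the argument list for a MUSCLE v3-style call (assumed flags)."""
    return [exe, "-in", in_path, "-out", out_path, "-clwstrict"]


def run_muscle(in_path, out_path, exe="muscle"):
    """Actually execute MUSCLE.

    Raises FileNotFoundError if the executable cannot be found and
    subprocess.CalledProcessError if MUSCLE exits with a non-zero code.
    """
    cmd = build_muscle_command(in_path, out_path, exe)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Building (and printing) the command does not run anything yet
cmd = build_muscle_command("extracted_KS_with_taxa.fa", "aligned_KS.aln")
print(" ".join(cmd))

# Only this call would start MUSCLE and write aligned_KS.aln;
# it is left commented out because it needs MUSCLE to be installed.
# run_muscle("extracted_KS_with_taxa.fa", "aligned_KS.aln")
```

Keeping the "build" and "run" steps separate makes it easy to inspect the command before executing it.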
from Bio.Align.Applications import MuscleCommandline
muscle_cline = MuscleCommandline(input='extracted_KS_with_taxa.fa')
stdout, stderr = muscle_cline()
from io import StringIO  # Python 3 location of StringIO
from Bio import AlignIO
align = AlignIO.read(StringIO(stdout), 'fasta')
print(align)
^ Returned this error:
Traceback (most recent call last):
  File "C:\Users\mac03\AppData\Local\Programs\Python\Python37\MBSProject\align_fasta.py", line 20, in <module>
    stdout, stderr = muscle_cline()
  File "C:\Users\mac03\AppData\Local\Programs\Python\Python37\lib\site-packages\Bio\Application\__init__.py", line 527, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'muscle -in extracted_KS_with_taxa.fa', message "'muscle' is not recognized as an internal or external command,"
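The message "'muscle' is not recognized as an internal or external command" comes from Windows and suggests that no `muscle` executable is on the PATH. A minimal sketch for checking this from Python with `shutil.which` follows; `resolve_executable` is a hypothetical helper and `C:\tools\muscle.exe` is a made-up location, not a path known from the setup above.

```python
import os
import shutil


def resolve_executable(name, fallback=None):
    """Return a usable path for an executable, or None if none is found.

    Looks on the PATH first, then tries an explicit fallback path.
    """
    found = shutil.which(name)
    if found is not None:
        return found
    if fallback is not None and os.path.isfile(fallback):
        return fallback
    return None


# Hypothetical install location; adjust to wherever MUSCLE was unpacked
muscle_path = resolve_executable("muscle", fallback=r"C:\tools\muscle.exe")
if muscle_path is None:
    print("MUSCLE not found: install it or pass its full path explicitly")
else:
    print("Using MUSCLE at", muscle_path)
```

If a path is found, it can be handed to the wrapper through its first argument, for example `MuscleCommandline(cmd=muscle_path, input='extracted_KS_with_taxa.fa')`, so that the call no longer depends on the PATH.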