FASTA with multiple sequences to an alignment object that can be used to build a phylogenetic tree
4.9 years ago
mac03pat ▴ 30

I am attempting to take a single FASTA file containing multiple sequences of variable length as input, and output aligned sequences that I can use to build a phylogenetic tree with Biopython's Phylo module.

Here's my file: https://drive.google.com/file/d/1QXXSJ2DJjJHz8K1WHuERFPQBTSvsWrcL/view?usp=sharing

Things I've tried:

from Bio import AlignIO
# AlignIO expects already-aligned records, so this fails when the sequences differ in length
alignment = AlignIO.read('extracted_KS_with_taxa.fa', 'fasta')
print(alignment.format('fasta'))

^ Doesn't work for sequences of unequal length

from Bio.Align.Applications import MuscleCommandline

cline = MuscleCommandline(input='extracted_KS_with_taxa.fa', out='aligned_KS.aln', clwstrict=True)
print(cline)

^ Didn't output a file

from io import StringIO  # Python 3; the StringIO module only exists in Python 2
from Bio import AlignIO
from Bio.Align.Applications import MuscleCommandline

# With no out=..., MUSCLE writes the alignment (FASTA) to stdout
muscle_cline = MuscleCommandline(input='extracted_KS_with_taxa.fa')
stdout, stderr = muscle_cline()
align = AlignIO.read(StringIO(stdout), 'fasta')
print(align)

^ Returned this error:

Traceback (most recent call last):
  File "C:\Users\mac03\AppData\Local\Programs\Python\Python37\MBSProject\align_fasta.py", line 20, in <module>
    stdout, stderr = muscle_cline()
  File "C:\Users\mac03\AppData\Local\Programs\Python\Python37\lib\site-packages\Bio\Application\__init__.py", line 527, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 1 from 'muscle -in extracted_KS_with_taxa.fa', message "'muscle' is not recognized as an internal or external command,"
alignment biopython phylo clustal fasta • 3.1k views
4.9 years ago
Joe 21k

In your first case, I think the problem is that you're using AlignIO to read a FASTA file of unaligned sequences, not an alignment (if I understand your data correctly).

AlignIO is specifically for reading formats of pre-aligned data, whereas SeqIO is what you need for reading basic sequence data.
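In practice, "pre-aligned" means every record has the same length once gap characters are counted. The stdlib-only sketch below shows that check; the helpers read_fasta and looks_aligned are hypothetical names for illustration, not Biopython API. In Biopython itself you would load the raw file with SeqIO.parse(path, 'fasta') and only hand the aligner's output to AlignIO.read.

```python
"""Sketch: why AlignIO rejects raw FASTA (all records must share one length)."""
from io import StringIO
from typing import Dict, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Minimal FASTA reader (hypothetical helper): maps record ID to sequence.

    Duplicate IDs are not handled; a later record with the same ID is appended.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited word after '>'
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records.setdefault(current_id, "")
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them
            records[current_id] += line
    return records


def looks_aligned(records: Dict[str, str]) -> bool:
    """True if every sequence has the same length, as an alignment must."""
    return len({len(seq) for seq in records.values()}) <= 1


if __name__ == "__main__":
    raw = StringIO(">seqA\nMKTAYIAK\n>seqB\nMKTAY\n")
    aligned = StringIO(">seqA\nMKTAYIAK\n>seqB\nMKTAY---\n")
    print(looks_aligned(read_fasta(raw)))      # False: lengths 8 and 5
    print(looks_aligned(read_fasta(aligned)))  # True: both length 8
```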

Secondly, print(cline) doesn't run anything, because cline is just the command line itself, not the result of the alignment. You need to actually execute it (for example stdout, stderr = cline()), which is what your third attempt does; for that to work, MUSCLE also has to be installed.

Not having MUSCLE installed is why your last attempt fails: Biopython shells out to run muscle on the command line, but Windows doesn't recognise the command because there is no corresponding installed muscle binary on your PATH.
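A minimal sketch of failing early with a clearer message than the ApplicationError above, using only the standard library. It assumes MUSCLE 3 option syntax (-in/-out), which matches the command Biopython built in the traceback; MUSCLE 5 uses -align/-output instead. The function names are hypothetical. With Biopython you can equally pass the binary's full path via MuscleCommandline(cmd=..., input=..., out=...).

```python
"""Sketch: check that the MUSCLE binary is on PATH before shelling out to it."""
import shutil
import subprocess
from typing import List


def muscle3_command(in_path: str, out_path: str, exe: str = "muscle") -> List[str]:
    """Build a MUSCLE v3-style command (assumed -in/-out syntax)."""
    return [exe, "-in", in_path, "-out", out_path]


def run_muscle(in_path: str, out_path: str, exe: str = "muscle") -> None:
    """Run MUSCLE, raising FileNotFoundError if the executable cannot be found."""
    # shutil.which also honours PATHEXT on Windows, so "muscle" finds muscle.exe
    resolved = shutil.which(exe)
    if resolved is None:
        raise FileNotFoundError(
            f"{exe!r} not found on PATH; install MUSCLE or pass the full path to the binary"
        )
    # check=True turns a non-zero exit status into subprocess.CalledProcessError
    subprocess.run(muscle3_command(in_path, out_path, exe=resolved), check=True)
```

Checking with shutil.which first keeps the "not installed" case separate from a genuine MUSCLE failure, which check=True still reports as CalledProcessError.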

I suggest you look closely at the Biopython Tutorial, as there are a good many things you've got mixed up here.
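Once an aligned file exists, the tree-building step the question is aiming for can be sketched with Bio.Phylo.TreeConstruction, assuming a reasonably recent Biopython. The identity distance model and neighbour-joining are illustrative choices here, not a recommendation for these sequences, and tree_from_alignment is a hypothetical name.

```python
"""Sketch: build a neighbour-joining tree from an existing alignment with Biopython."""
from Bio import Phylo
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def tree_from_alignment(alignment: MultipleSeqAlignment):
    """Compute pairwise identity distances, then join them into an NJ tree."""
    calculator = DistanceCalculator("identity")
    distance_matrix = calculator.get_distance(alignment)
    return DistanceTreeConstructor().nj(distance_matrix)


if __name__ == "__main__":
    # Toy in-memory alignment (equal lengths, gaps allowed) just to show the call
    toy = MultipleSeqAlignment([
        SeqRecord(Seq("MKTAYIAK"), id="seqA"),
        SeqRecord(Seq("MKTAY-AK"), id="seqB"),
        SeqRecord(Seq("MRTAYIQK"), id="seqC"),
    ])
    Phylo.draw_ascii(tree_from_alignment(toy))
```

For the real data, load the MUSCLE output first, e.g. AlignIO.read('aligned_KS.aln', 'clustal') if it was written with clwstrict=True, or with 'fasta' if MUSCLE wrote its default FASTA output, and pass that alignment in.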
