Making fasta file for clustal
8.9 years ago
burnsro ▴ 20

I have two FASTA files of DNA sequences of upstream promoter regions from two species, and I would like to align them with ClustalW.

I read in the ClustalW manual page on Ubuntu that "all sequences must be in 1 file, one after another".

I'm trying to understand whether that means each orthologous pair has to be placed, one sequence after the other, in a single merged FASTA file covering both species. Does anyone know how this can be achieved?

Fasta Clustal • 4.0k views
8.9 years ago
Vivek ★ 2.7k

If you are planning a pairwise alignment, you need one file per query/target pair, containing both sequences, like the example below. You could read the existing files with your favorite FASTA parsing module in either BioPerl or Biopython and write each query/target pair into a new file.

>Query1
TGCCTACTGAGCTGAAACAGT
>Target1
CAGTAACCATGACCTCCCGCAGGACAGCGGAGCC

Here's a thread on splitting fasta files: How To Split A Multiple Fasta
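
The approach described above could be sketched roughly as follows, using only the Python standard library rather than BioPerl or Biopython. This is a minimal sketch under one big assumption: the two input files list the orthologs in the same order, so the i-th record of one file pairs with the i-th record of the other. The function names and the `pair_N.fasta` output naming are made up for illustration.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_pair_files(query_fasta, target_fasta, out_dir):
    """Write one two-sequence FASTA file per (query, target) pair.

    Assumption: record i in query_fasta is the ortholog of record i
    in target_fasta. Pair by identifier instead if the order differs.
    """
    queries = read_fasta(query_fasta)
    targets = read_fasta(target_fasta)
    if len(queries) != len(targets):
        raise ValueError("input files hold different numbers of sequences")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, ((q_id, q_seq), (t_id, t_seq)) in enumerate(zip(queries, targets), start=1):
        out_path = out_dir / f"pair_{i}.fasta"
        out_path.write_text(f">{q_id}\n{q_seq}\n>{t_id}\n{t_seq}\n")
        written.append(out_path)
    return written
```

Each resulting file holds exactly two sequences, one after the other, and can be given to ClustalW as its input file.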

8.9 years ago
venu 7.1k

"all sequences must be in 1 file, one after another"

This just means that all the FASTA sequences go into one file, nothing more. When ClustalW asks for the name of the file containing the sequences, give it the name of that file. You can set all other parameters, such as the multiple or pairwise alignment parameters, before the alignment begins.
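
Putting everything into one file amounts to concatenating the FASTA files. A minimal sketch in Python, assuming the inputs are already valid FASTA; the function name and the file names in the usage comment are hypothetical:

```python
def merge_fasta(input_paths, output_path):
    """Concatenate several FASTA files into one, one record after another."""
    with open(output_path, "w") as out:
        for path in input_paths:
            with open(path) as handle:
                text = handle.read()
            # Guard against a missing final newline, which would glue the
            # last sequence line of one file to the next file's ">" header.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)


# Hypothetical usage:
# merge_fasta(["species1.fasta", "species2.fasta"], "all_sequences.fasta")
```

Note that sequence names should be unique across the merged file, otherwise it becomes hard to tell the records apart in the alignment.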

8.9 years ago
Charles Plessy ★ 2.9k

The full quote is:

SEQUENCE INPUT: all sequences must be in 1 file, one after another.
7 formats are automatically recognised: NBRF-PIR, EMBL-SWISSPROT,
Pearson (Fasta), Clustal (*.aln), GCG-MSF (Pileup), GCG9-RSF and GDE flat file.
All non-alphabetic characters (spaces, digits, punctuation marks) are ignored
except "-" which is used to indicate a GAP ("." in MSF-RSF).

This gives you a list of possible sequence formats. Most of them are well documented, in particular the FASTA format. You can also find more examples in the EMBOSS documentation.

That said, as the other answers suggest, the FASTA format may be the easiest for you.
