Multiple sequence alignment
8.0 years ago

I have a set of 120 sequence files. I need to align each of them against the wild-type protein sequence, and then have all of the results together in a single file.

Please help.

sequence alignment blast • 4.0k views

You need to clarify whether all the sequence files in question are protein.
It sounds like your sequence files may be DNA. If so, you will need to do some additional work (translation) before they can be aligned against the wild-type protein with any of the packages mentioned in the answers.
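A quick way to check which case applies is a rough per-file guess based on residue composition. This is a minimal sketch, not a validated classifier: it assumes plain FASTA input and treats a file as nucleotide when every residue is A, C, G, T, U or N, so a protein made only of those letters would be misclassified. The function name guess_type is hypothetical.

```shell
#!/usr/bin/env bash
# Rough guess at whether FASTA files contain nucleotide or protein sequence.
# Heuristic (assumption): only A/C/G/T/U/N residues => nucleotide.
set -euo pipefail

guess_type() {
  awk '
    /^>/ { next }                  # skip header lines
    {
      seq = toupper($0)
      gsub(/[^A-Z]/, "", seq)      # drop gaps, "*" stop symbols, whitespace
      total += length(seq)
      gsub(/[ACGTUN]/, "", seq)    # whatever is left is not a nucleotide code
      other += length(seq)
    }
    END {
      if (total == 0)      print "empty"
      else if (other == 0) print "nucleotide"
      else                 print "protein"
    }' "$1"
}

# Usage: ./guess_type.sh seq1.fasta seq2.fasta ...
for f in "$@"; do
  printf '%s\t%s\n' "$f" "$(guess_type "$f")"
done
```

Any file reported as nucleotide would need translating before it is aligned against the wild-type protein.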

8.0 years ago
gearoid ▴ 200

If you have 120 FASTA files with one sequence each, and another with your wild type (and you're using Linux/OS X), first use cat to concatenate all the sequences into one file, e.g.

cat seq1.fasta seq2.fasta ... seqN.fasta > all_my_sequences.fa

or

cat *.fasta > all_my_sequences.fa
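Since the goal is to compare every file with the wild type, it can help to put the wild type first in the combined file and keep it out of the variants glob (otherwise `cat *.fasta` would pick it up a second time if it sits in the same directory). A minimal sketch, assuming hypothetical names: wildtype.fasta for the wild type and a variants/ directory holding the 120 files. It uses `awk 1` instead of `cat`, because `awk 1` ends every line with a newline, so a file missing its final newline cannot glue its last sequence line onto the next header.

```shell
#!/usr/bin/env bash
# Combine the wild type (first) and all variant files into one input file.
# Hypothetical names: wildtype.fasta, variants/*.fasta, all_my_sequences.txt.
set -euo pipefail

wt=wildtype.fasta
dir=variants
out=all_my_sequences.txt   # .txt, so no *.fasta glob can ever match it

combine() {
  # awk 1 prints every line plus a newline, repairing files that lack a
  # trailing newline (plain cat would join ">next" onto the last residue line)
  awk 1 "$wt" "$dir"/*.fasta > "$out"
  # Sanity check: 121 headers expected if each of the 120 files has one sequence
  echo "$(grep -c '^>' "$out") sequences written to $out"
}

if [ -f "$wt" ] && [ -d "$dir" ]; then
  combine
fi
```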

Then go to the EBI Clustal Omega server and upload all_my_sequences.fa, or paste the contents of the file into the box. Change the output format to whatever you want (Clustal format is probably better for humans, Pearson/FASTA for computers), then just click submit.
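If you would rather not use the web form, or want a run you can repeat, Clustal Omega also ships as a command-line program, `clustalo`. A minimal sketch, assuming clustalo 1.2.x is installed; the helper outfmt_for (mapping the two output choices above onto `--outfmt` values), the default file names and the dry-run fallback are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Local equivalent of the EBI Clustal Omega form (sketch).
# Assumptions: clustalo 1.2.x on PATH; file names are hypothetical.
set -euo pipefail

# Map the two output choices above onto clustalo --outfmt values.
outfmt_for() {
  case "$1" in
    human)   echo "clustal" ;;   # Clustal format: easiest to read
    machine) echo "fasta" ;;     # Pearson/FASTA: easiest to parse
    *)       echo "unknown target: $1" >&2; return 1 ;;
  esac
}

in=${1:-all_my_sequences.fa}
fmt=$(outfmt_for "${2:-human}")
out="aligned.$fmt"

if command -v clustalo >/dev/null 2>&1 && [ -s "$in" ]; then
  # --force lets clustalo overwrite an existing output file
  clustalo -i "$in" -o "$out" --outfmt="$fmt" --force
else
  echo "dry run: clustalo -i $in -o $out --outfmt=$fmt --force" >&2
fi
```

For example, `./align.sh all_my_sequences.fa machine` would write aligned.fasta.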


I'm afraid this line

cat *.fa > all_my_sequences.fa

is dangerous. It's better to do something like

cat *.fa > all_my_sequences.txt

or

cat *.fa > all_my_sequences.fasta

And I like to use MAFFT for the multiple alignment:

http://mafft.cbrc.jp/alignment/software/


Can you elaborate on why it's dangerous? I guess you can only run that command once, is that what you mean?

I like MAFFT, but in this case I would probably want to use L-INS-i or G-INS-i rather than the default MAFFT parameters. I just tried to give the simplest option I could think of (no software installation and no changing parameters on the web server).
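For reference, the MAFFT documentation describes L-INS-i as `--localpair --maxiterate 1000` and G-INS-i as `--globalpair --maxiterate 1000`. The wrapper below is only a sketch: the function name mafft_flags, the default file names and the dry-run fallback (printing the command when `mafft` is not installed) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Run MAFFT with one of the iterative refinement strategies mentioned above.
# Assumptions: MAFFT v7 on PATH as `mafft`; file names are hypothetical.
set -euo pipefail

mafft_flags() {
  case "$1" in
    linsi) echo "--localpair --maxiterate 1000" ;;   # L-INS-i
    ginsi) echo "--globalpair --maxiterate 1000" ;;  # G-INS-i
    *)     echo "unknown strategy: $1" >&2; return 1 ;;
  esac
}

in=${1:-all_my_sequences.txt}
strategy=${2:-linsi}
flags=$(mafft_flags "$strategy")
out="aligned_${strategy}.fasta"

if command -v mafft >/dev/null 2>&1 && [ -s "$in" ]; then
  # $flags is deliberately unquoted so it splits into separate options
  mafft $flags "$in" > "$out"
else
  echo "dry run: mafft $flags $in > $out" >&2
fi
```

Roughly speaking, L-INS-i suits sequences that share one alignable region with unrelated flanks, while G-INS-i assumes the sequences align over their whole length, which is often the case for point variants of a single protein.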

T-Coffee might also be a good option for this number of sequences.


I've made this mistake several times: the *.fa glob picks up every .fa file, including the output file once it exists, so cat reads it too. That is why the output file's extension should be different.
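If the output has to keep the .fa extension, or the command may be re-run in the same directory, one option is to loop over the glob and skip the output file explicitly. A minimal bash sketch; the function name concat_fa and the paths are hypothetical. It builds the result under a temporary name that *.fa cannot match and only then renames it.

```shell
#!/usr/bin/env bash
# Concatenate every *.fa in a directory, never reading the output file itself.
# Hypothetical usage: concat_fa /path/to/seqs /path/to/seqs/all_my_sequences.fa
set -euo pipefail
shopt -s nullglob            # an unmatched *.fa expands to nothing

concat_fa() {
  local dir=$1 out=$2 f
  : > "$out.tmp"             # build under a name that *.fa cannot match
  for f in "$dir"/*.fa; do
    # -ef: same file (device + inode); only true once $out exists
    if [ "$f" -ef "$out" ]; then
      continue
    fi
    cat "$f" >> "$out.tmp"
  done
  mv "$out.tmp" "$out"       # replace any previous output by renaming
}
```

Running it twice in a row gives the same result, because the second run skips the file written by the first.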

The full MAFFT command is a long string with several parameters. It allows many refinement iterations, which is sometimes useful.

It would look like:

mafft-7.215-with-extensions/bin/mafft --localpair --maxiterate 1000 --ep 0.123 --legacygappenalty initial_file.fasta > align.fa


I think it works as long as the file you're writing to doesn't already exist, but you're right, it's sloppy, so I updated my answer.

The long string of parameters is why I didn't recommend MAFFT for this question; I was just trying to keep it simple. It's a great option, though.


Or perhaps:

find /the/dir/where/the/seqs/are/ -maxdepth 1 -type f -iname "*.fa" -print0 | xargs -0 cat | muscle -in - -out aligned.fa
8.0 years ago

If you're working on Windows, you can try the BioEdit tool.

8.0 years ago
Benn 8.3k

Did you try ClustalW? http://www.clustal.org/clustal2/

8.0 years ago
agata88 ▴ 870

For protein alignments they recommend Clustal Omega.

7.9 years ago
Suzanne ▴ 100

Jalview (www.jalview.org) is a versatile free tool for MSA that can run all the main MSA algorithms. Look at their Jalview Online Training videos on YouTube for more information. It also has integrated structure, annotation, PCA and tree windows.
