Question

Multiple sequence alignment

0

8.0 years ago

saravanakumar992 ▴ 80

I have a set of 120 sequence files. I need to align each of these 120 sequences against the wild-type protein sequence, and the results should all end up together in a single file.

Please help.

sequence alignment blast • 4.0k views

ADD COMMENT • link updated 7.9 years ago by Suzanne ▴ 100 • written 8.0 years ago by saravanakumar992 ▴ 80

1

You need to clarify whether all the sequence files in question are protein.
It sounds like your sequence files may be DNA. If so, you will need to do some additional work (translation) before any of the packages mentioned in the various answers can be used.
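
A quick way to get a first impression of whether a file holds protein or DNA is to look for letters outside the nucleotide alphabet. The sketch below is only a heuristic, and `looks_like_protein` is a hypothetical helper name: a very short peptide made up only of A, C, G, T and N residues would be misread as DNA.

```shell
#!/bin/sh
# Heuristic sketch: succeed (exit 0) if any sequence line contains a
# character that is not a nucleotide code (A, C, G, T, U, N), a gap or
# whitespace. Header lines starting with '>' are ignored.
looks_like_protein() {
    grep -v '^>' "$1" | grep -qi '[^ACGTUN[:space:]-]'
}

# Example use (file name is a placeholder):
#   if looks_like_protein seq1.fasta; then echo protein; else echo "probably DNA"; fi
```

If a file turns out to be DNA, it still has to be translated with a separate tool before a protein alignment.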

ADD REPLY • link 8.0 years ago by GenoMax 141k

score 1 · Answer 1 · 2016-04-18

1

8.0 years ago

gearoid ▴ 200

If you have 120 FASTA files with one sequence each, and another with your wild type (and you're using Linux/OS X), first use cat to concatenate all the sequences into one file, e.g.

cat seq1.fasta seq2.fasta ... seqN.fasta > all_my_sequences.fa

or

cat *.fasta > all_my_sequences.fa

Then go to the EBI Clustal Omega server and upload all_my_sequences.fa, or paste the contents of the file in the box. Change the output format to whatever you want (Clustal format is probably better for humans and Pearson/FASTA for computers), then just click submit.
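
Before uploading, it may be worth checking that the combined file really contains every sequence. A minimal sketch, assuming each input file holds exactly one record so that the expected total is 121 (120 sequences plus the wild type); `check_fasta_count` is a hypothetical helper name:

```shell
#!/bin/sh
# Count FASTA records (header lines starting with '>') and compare the
# count with the number we expect to find.
check_fasta_count() {
    file=$1
    expected=$2
    found=$(grep -c '^>' "$file")
    if [ "$found" -eq "$expected" ]; then
        echo "OK: $found records in $file"
    else
        echo "Mismatch: found $found records in $file, expected $expected" >&2
        return 1
    fi
}

# Example use:
#   check_fasta_count all_my_sequences.fa 121
```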

ADD COMMENT • link 8.0 years ago by gearoid ▴ 200

1

I am afraid this line

cat *.fa > all_my_sequences.fa

is dangerous. It's better to do something like

cat *.fa > all_my_sequences.txt

or

cat *.fa > all_my_sequences.fasta

And I like to use MAFFT for the multiple alignment:

http://mafft.cbrc.jp/alignment/software/

ADD REPLY • link 8.0 years ago by natasha.sernova ★ 4.0k

1

Can you elaborate on why it's dangerous? I guess you can only run that command once, is that what you mean?

I like MAFFT, but in this case I would probably want to use MAFFT L-INS-i or G-INS-i, rather than the default MAFFT parameters, and I just tried to give the simplest option I could think of (no software installation or changing parameters on the web server).

T-Coffee might also be a good option for this number of sequences.

ADD REPLY • link 8.0 years ago by gearoid ▴ 200

1

I've made this mistake several times: cat will read all *.fa files, including the output file, which is why the output file's extension should be different.
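
One way to avoid the problem regardless of the extension is to write to a temporary file first and to skip the output file explicitly. This is only a sketch: `combine_fasta` is a hypothetical helper name, and it assumes bash (for the `-ef` same-file test) and a system that provides `mktemp`.

```shell
#!/bin/bash
# Concatenate every *.fa file in a directory into one output file without
# ever feeding the output file back into itself.
combine_fasta() {
    local in_dir=$1 out_file=$2 tmp_file f
    tmp_file=$(mktemp) || return 1
    for f in "$in_dir"/*.fa; do
        [ -f "$f" ] || continue                  # glob matched nothing
        [ "$f" -ef "$out_file" ] && continue     # skip an existing output file
        cat "$f"
        # Add a newline if the file does not end with one, so the next
        # header does not get glued to the previous sequence.
        if [ -n "$(tail -c 1 "$f")" ]; then
            echo
        fi
    done > "$tmp_file"
    mv "$tmp_file" "$out_file"
}

# Example use:
#   combine_fasta . all_my_sequences.fa
```

Running it twice gives the same result, because the previous output is never read as input.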

The full MAFFT command is a long string with different parameters. It allows many iterations, which is sometimes useful.

It would look like:

mafft-7.215-with-extensions/bin/mafft --localpair --maxiterate 1000 --ep 0.123 --legacygappenalty initial_file.fasta > align.fa
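
For comparison, the L-INS-i and G-INS-i strategies mentioned earlier in the thread differ only in the pairwise flag: `--localpair` for L-INS-i and `--globalpair` for G-INS-i, both combined with `--maxiterate 1000`. A minimal sketch, assuming `mafft` is on the PATH; `run_mafft` is a hypothetical wrapper name and not part of MAFFT:

```shell
#!/bin/bash
# Run MAFFT with either the L-INS-i or the G-INS-i strategy.
run_mafft() {
    local strategy=$1 input=$2 output=$3 pair_flag
    case "$strategy" in
        linsi) pair_flag=--localpair ;;    # L-INS-i: local pairwise alignments
        ginsi) pair_flag=--globalpair ;;   # G-INS-i: global pairwise alignments
        *) echo "unknown strategy: $strategy" >&2; return 2 ;;
    esac
    if ! command -v mafft >/dev/null 2>&1; then
        echo "mafft not found on PATH" >&2
        return 127
    fi
    mafft "$pair_flag" --maxiterate 1000 "$input" > "$output"
}

# Example use:
#   run_mafft linsi initial_file.fasta align.fa
```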

ADD REPLY • link 8.0 years ago by natasha.sernova ★ 4.0k

1

I think it works as long as the file that you're writing to doesn't exist already, but you're right, it's sloppy, so I updated my answer.

The long string of parameters is why I didn't recommend MAFFT for this question; I was just trying to keep it simple. It's a great option, though.

ADD REPLY • link 8.0 years ago by gearoid ▴ 200

1

Or perhaps:

find /the/dir/where/the/seqs/are/ -maxdepth 1 -type f -iname "*.fa" | xargs cat | muscle -in - -out aligned.fa

ADD REPLY • link 8.0 years ago by 5heikki 11k

score 1 · Answer 2 · 2016-04-20

1

8.0 years ago

kapil.joshi036 ▴ 80

If you're working on Windows, you can try the BioEdit tool.

ADD COMMENT • link 8.0 years ago by kapil.joshi036 ▴ 80

score 0 · Answer 3 · 2016-04-18

0

8.0 years ago

Benn 8.3k

Did you try ClustalW? http://www.clustal.org/clustal2/

ADD COMMENT • link 8.0 years ago by Benn 8.3k

score 0 · Answer 4 · 2016-04-18

0

8.0 years ago

agata88 ▴ 870

For protein alignments they recommend Clustal Omega.

ADD COMMENT • link 8.0 years ago by agata88 ▴ 870

0

http://www.clustal.org/omega/

ADD REPLY • link 8.0 years ago by Benn 8.3k

score 0 · Answer 5 · 2016-05-13

0

7.9 years ago

Suzanne ▴ 100

Jalview (www.jalview.org.uk) is a versatile free tool for MSA that can run all the main MSA algorithms. Look at their YouTube Jalview Online Training videos for more information. It also has integrated structure, annotation, PCA and tree windows.

ADD COMMENT • link 7.9 years ago by Suzanne ▴ 100