Can't locate the output files created by the ClustalW2 interactive command line on Windows 7
9.4 years ago
Bara'a ▴ 270

Hi all ...

I'm trying to use the interactive command-line version of ClustalW2 to do some alignments (I'm using Windows 7), but I can't locate the files that are supposed to hold the output alignments and the guide tree, despite being told that they were created and seeing the alignments on screen!

I have been searching the internet for days but couldn't find any comprehensive tutorials other than those describing how to configure clustalW2 on Linux and Mac operating systems :(

Alternatively, I tried to use it from the Windows cmd prompt after defining its environment variables on the system, but it gave me this message:

'clustalW2' is not recognized as an internal or external command, operable program or batch file.

What am I doing wrong here?

How can I configure ClustalW2 on Windows 7, and where can I find the files created by the interactive mode?

Would anyone please point me in the right direction?

I would be very grateful if you helped me to solve this issue on both ways.

Thanks in advance

clustalW2 windows7 • 7.1k views
9.4 years ago
hpmcwill ★ 1.2k

You may find this easier using the graphical version of ClustalW 2, ClustalX2 (see http://clustal.org/clustal2/), rather than the command-line ClustalW 2. FWIW, the download for MS Windows is clustalx-2.1-win.msi. After installation you will have a program menu item for ClustalX2, which you can run. You then load the sequences to be aligned ("File" -> "Load Sequences") and perform the alignment ("Alignment" -> "Do Complete Alignment"). By default the guide tree (.dnd) and alignment (.aln) files will be generated in the same directory as the file of input sequences, but you can change this when prompted if you want them to go somewhere else.

If you have to use the command-line version of ClustalW 2, then you will need to know the path to your input sequences and, since the ClustalW 2 executables are not automatically added to the PATH, the directory where ClustalW 2 was installed. A typical command-line session would look something like:

> cd "C:\Users\username\Documents\My Documents\Data"
> "c:\Program Files (x86)\ClustalW2\clustalw2.exe" /INFILE=arf_seq.faa /ALIGN

This assumes that the C:\Users\username\Documents\My Documents\Data directory contains the input sequence data to be aligned, in this case the arf_seq.faa file of FASTA-formatted sequences. By default the output files are generated in the directory containing the input data and are named after the input file (in this case arf_seq.dnd and arf_seq.aln).
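For anyone driving ClustalW 2 from a script rather than typing the command by hand, the same call can be sketched in Python. This is only an illustration under stated assumptions: the install path is copied from the example above and may differ on your machine, and the helper names (`build_clustalw2_command`, `expected_outputs`, `run_alignment`) are made up for this sketch and are not part of ClustalW 2 or Biopython. The `/INFILE=` and `/ALIGN` options are the ones used in the session above.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install location, taken from the example session; adjust as needed.
CLUSTALW2_EXE = r"C:\Program Files (x86)\ClustalW2\clustalw2.exe"


def build_clustalw2_command(infile, exe=CLUSTALW2_EXE):
    """Return the argument list for a complete alignment of `infile`."""
    return [exe, f"/INFILE={infile}", "/ALIGN"]


def expected_outputs(infile):
    """Default output names: same directory and base name as the input."""
    path = PureWindowsPath(infile)
    return str(path.with_suffix(".aln")), str(path.with_suffix(".dnd"))


def run_alignment(infile):
    """Run ClustalW 2 (Windows only) and return the expected output paths."""
    # Using the full path to the executable means PATH does not matter.
    subprocess.run(build_clustalw2_command(infile), check=True)
    return expected_outputs(infile)
```

Passing the full path to the input file, as well as to the executable, avoids any dependence on the current working directory.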

The same principle applies if you are using the interactive mode of ClustalW 2, except that you will need to know the complete path to the input sequence data file (e.g. C:\Users\username\Documents\My Documents\Data\arf_seq.faa) in order to load the sequences. Again, by default the output files will be generated in the directory containing the input file and will be named after that file. However, this can be changed by using the prompts during the alignment process. Note that relative paths should be avoided if you use the program menu item to start ClustalW 2 in interactive mode, since these will be relative to the "Start in" directory specified in the shortcut (typically the installation directory).
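To make the point about relative paths concrete, here is a small Python sketch of how the same file name resolves to different locations depending on the directory the program was started in. The function name `resolve_input` and the directories are illustrative assumptions; this mimics ordinary Windows path resolution and is not ClustalW 2 code.

```python
from pathlib import PureWindowsPath


def resolve_input(path, start_in):
    """Return the absolute path a program started in `start_in` would open."""
    candidate = PureWindowsPath(path)
    if candidate.is_absolute():
        # Absolute paths are unaffected by the "Start in" directory.
        return str(candidate)
    return str(PureWindowsPath(start_in) / candidate)


# Started from the shortcut: the relative name points into the install directory.
print(resolve_input("arf_seq.faa", r"C:\Program Files (x86)\ClustalW2"))
# Started from a prompt after `cd` into the data directory: it points at the data.
print(resolve_input("arf_seq.faa", r"C:\Users\username\Documents\My Documents\Data"))
```

This is likely why output files seem to "disappear": they end up next to wherever the input path actually resolved.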

Please note that ClustalW 2 has largely been superseded by Clustal Omega; for most purposes the use of Clustal Omega is recommended.

For further assistance with the use of the Clustal series of multiple sequence alignment programs I suggest you contact the authors, see http://clustal.org/ for details.


@hpmcwill ... Thank you for your comprehensive reply, I truly appreciate it.

I already managed to fix this issue; it was something related to environment variables.

But unfortunately, I have to work with ClustalW2 so that I can run alignments from a Biopython script :(

So I have one question left to ask: why does ClustalW2 choke when handling large sequence files (up to 300 Mb)?

Such files tend to terminate the running ClustalW2 session!

How can I overcome this problem?


For large inputs, ClustalW 2 can require very large amounts of memory. Since the distributed ClustalW 2 binary for MS Windows is 32-bit, it can only use up to 2 GB of memory before being terminated. So I am guessing your problem is likely to be memory usage.

So you have a couple of options:

A. Recompile ClustalW 2 to support more memory.

According to "Memory Limits for Windows and Windows Server Releases" the 2GB limit can be increased to 4GB for a 32-bit process by linking with the /LARGEADDRESSAWARE flag enabled. Or you could try building a 64-bit binary.

If you do not have a MS Windows compiler installed, you might want to look at Visual Studio Community 2013. If you have problems try contacting the authors (see http://clustal.org/).
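Before or after rebuilding, it can help to check what a given binary actually is. The following Python sketch reads the standard PE/COFF header fields (the `Machine` field, and the `IMAGE_FILE_LARGE_ADDRESS_AWARE` bit, 0x0020, in `Characteristics`) to report whether an executable is 32-bit or 64-bit and whether the large-address-aware flag is set. The function names are made up for this illustration, and it assumes the headers lie within the first 4 KB of the file, which is typical but not guaranteed.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def inspect_pe_header(data):
    """Return (machine, large_address_aware) from the leading bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE executable (missing MZ signature)")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The COFF header follows the 4-byte signature: Machine is its first
    # field and Characteristics starts 18 bytes into it.
    coff = pe_offset + 4
    (machine,) = struct.unpack_from("<H", data, coff)
    (characteristics,) = struct.unpack_from("<H", data, coff + 18)
    return machine, bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def describe_executable(path):
    """Summarise a Windows executable, e.g. the installed clustalw2.exe."""
    with open(path, "rb") as handle:
        # Assumption: the headers sit inside the first 4 KB of the file.
        machine, large_address_aware = inspect_pe_header(handle.read(4096))
    names = {
        IMAGE_FILE_MACHINE_I386: "32-bit (x86)",
        IMAGE_FILE_MACHINE_AMD64: "64-bit (x64)",
    }
    return f"{names.get(machine, hex(machine))}, large address aware: {large_address_aware}"
```

Calling `describe_executable` on the path to the installed clustalw2.exe would show whether a rebuilt binary really picked up the flag.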

B. Reduce the size and complexity of the input.

  1. Remove duplicate sequences (see How To Remove The Same Sequences In The Fasta Files?)
  2. Ensure all sequences are of similar lengths. Avoid mixtures of short and long sequences.
  3. Screen for repeats.
  4. Check that sequences share some level of similarity (i.e. are related)
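As a rough illustration of steps 1 and 2, here is a minimal pure-Python sketch that drops exact duplicate sequences from a FASTA file and reports the spread of sequence lengths. It is not the method from the linked thread; the function names are made up for this example, duplicates are compared case-insensitively (an assumption), and it keeps everything in memory, so very large files may need a more frugal approach. Steps 3 and 4 need dedicated tools and are not covered.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


def length_range(records):
    """Return (shortest, longest) sequence length, to spot mixed lengths."""
    lengths = [len(seq) for _, seq in records]
    return min(lengths), max(lengths)


example = [">a", "MKV", "LLT", ">b", "mkvllt", ">c", "MKVLLTAAG"]
unique = drop_duplicates(read_fasta(example))
print([header for header, _ in unique], length_range(unique))
```

For a real file, pass an open file handle to `read_fasta` instead of the example list; a large gap between the shortest and longest length suggests the input should be split or filtered before aligning.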

You might also want to consider migrating to Clustal Omega, since Biopython does include support for the newer method (Bio.Align.Applications.ClustalOmegaCommandline) and Clustal Omega is much more memory efficient.


Again, thank you for your great, detailed answer @hpmcwill.

I will try these options and see how it works, but I think I'm more likely to consider Clustal Omega since it's supported by biopython.

Thanks again, I'm so grateful for your help.


So far, I've gone with the latter option and decided to switch to Clustal Omega, but I've run into many problems installing it.

First, its dependency (argtable2) was quite complicated to install, at least for me!

The command line refuses to recognize nmake, so I can't proceed with the argtable2 installation.

I used this reference: http://sourceforge.net/projects/argtable/files/. From there I explored the option of adding an environment variable pointing to where nmake is located, but this didn't work.

I hesitated to install Visual Studio (some online threads suggested this as another solution) for two reasons: I already have Visual Studio 2010 installed, and I'm worried that upgrading an existing Visual Studio installation to obtain the required compiler could put at risk projects already built on different programming languages and platforms.

My question is: does upgrading or re-installing Visual Studio affect previously created projects and their system configurations in any way?

What other possibilities can I explore to overcome this problem?

Second, how come the dependencies use the nmake utility while Clustal Omega itself uses make? Isn't make for Linux systems? That's really confusing -_-

I'm using Windows 7 64-bit ... and I'm really disappointed to see Linux somehow preferred over Windows when it comes to bioinformatics tools :(

BTW ... why is that?


First off... have you tried using the MS Windows binary as suggested by the authors? Note that the "INSTALL.txt" file is the version from the UNIX distribution (I am guessing that this was meant to be replaced), and all you should need to do is unpack the distribution in an appropriate directory and use the clustalo.exe by either specifying the full path or adding the directory to the PATH.
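The "full path or PATH" choice can be sketched from Python as follows. This is a hedged example: `find_program` is a helper name invented here, and the unpack directory is hypothetical. It relies only on the standard library's `shutil.which`, which on Windows also tries the extensions listed in PATHEXT, so searching for `clustalo` can find `clustalo.exe`.

```python
import os
import shutil


def find_program(name, extra_dirs=()):
    """Return the full path to `name`, searching `extra_dirs` before PATH."""
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


# Hypothetical directory where the Clustal Omega archive was unpacked.
exe = find_program("clustalo", [r"C:\Tools\clustal-omega"])
if exe is None:
    print("clustalo not found; check the unpack directory or PATH")
else:
    print("using", exe)
```

If this prints a path, that same path can be used directly in a command prompt or passed to a script, without touching the system environment variables.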

For help with building Clustal for MS Windows you will have to contact the Clustal authors, as detailed in the documentation and on the Clustal website. As far as I can tell, the source distributions do not contain support for MS Windows based builds, which suggests that the authors used a separate build process to create the pre-compiled binaries provided on the website. Since I have no way of knowing what this was, you will have to contact them for details.

You might find using a UNIX style environment for MS Windows such as Cygwin, MinGW or Mingw-w64 makes things easier, and current versions of Cygwin and Mingw-w64 support a 64-bit tool chain so they may be an option for building from source to get a 64-bit binary.

As for the use of Linux in bioinformatics, this is a product of a number of factors which have been discussed at length in various posts on Biostars, and so is not really worth rehashing here. Suffice it to say that the use of open source and free tools (e.g. Perl, Python, GNU, etc.), and thus operating systems (e.g. Linux), is core to modern bioinformatics and has been since the mid-1990s, which makes using MS Windows a more difficult option.


I have contacted the Clustal Omega authors, and they directed me to this site as a useful reference.

I followed the instructions, but it doesn't seem that Clustal Omega was configured correctly on my machine :(

I'm not familiar with Linux, but while reading the on-screen log I noticed that there was a problem recognizing the C compiler!

Also, I noticed that no file named libcc_sjlj-1.dll was created; instead I got a file named libgcc_s_sjlj-1.dll!

I tried to run the binary (clustalo.exe) after finishing those steps, but a cmd window just flashes and disappears!

On the other hand, I tried to define an environment variable pointing to where the 64-bit clustalo binary is located, and again I had no luck running it. I'm really frustrated.

Am I missing something here?

9.3 years ago
clustalw ▴ 20

The next Clustal Omega release (hopefully early next year) will contain updated installation information for Windows users. This version should also be available as a pre-compiled 64-bit Windows executable.
