Question

Gubbins not working

5.8 years ago

saadleeshehreen ▴ 140

Hi,

I tried to run Gubbins (https://sanger-pathogens.github.io/gubbins/). First, I aligned the sequence files with progressiveMauve. It produced .xmfa files, and I understood that I had to convert them to proper FASTA. I then followed the instructions at https://sourceforge.net/p/mauve/mailman/message/35156599/ and chose the second option:

Use this script: https://github.com/kjolley/seq_scripts/blob/master/xmfa2fasta.pl

perl xmfa2fasta.pl --file inputfile.xmfa > outputfile.fasta

I got a FASTA file, but when I tried to run Gubbins with the following commands, I got these error messages:

run_gubbins.py -o t.fasta
The following arguments are required: alignment_filename

run_gubbins.py  t.fasta
Error with the input FASTA file: It is in the wrong format so check its an alignment

How can I solve the problem?

gubbin mauve xmfa • 3.2k views

ADD COMMENT • link 5.8 years ago by saadleeshehreen ▴ 140

Hello,

What does your FASTA file look like? As the error message says, there must be something wrong with it, but without an example it will be quite hard to figure out what.

fin swimmer

ADD REPLY • link 5.8 years ago by finswimmer 16k

>1
AACCGCGCCTACCGCATGGGCCGCGGGATCAAGGCCGGTCGCGTGTGGACCAACTGCTAC
CACCTGTACCCGGCCCACGCCGCGTTCGGCGGCTACAAGAAATCCGGCGTCGGTCGCGAG

But it sometimes contains runs such as NNNNNN and ----------------, etc.

ADD REPLY • link 5.8 years ago by saadleeshehreen ▴ 140

How big is the file? We may need to see more of it since the error might be in just one or two of the sequences.

Since gubbins expects alignments, it is probably testing to see if all your sequences are the same length, which may not be the case.

Run this command on your file to find out if they're all equal length:

awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen += length($0)}END{print seqlen}' file.fasta
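The same check can be sketched in Python if awk is not convenient. This is a minimal sketch, not part of Gubbins: the function names `fasta_lengths` and `is_alignment` are made up for illustration, and it assumes a plain FASTA file in which every record name is unique.

```python
from io import StringIO


def fasta_lengths(handle):
    """Return {record_name: total_sequence_length} for an open FASTA handle."""
    lengths = {}
    name = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after '>' as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Gap characters ('-') and N count as alignment columns too.
            lengths[name] += len(line)
    return lengths


def is_alignment(lengths):
    """True if there is at least one record and all records share one length."""
    return len(set(lengths.values())) == 1


# Small in-memory example; for a real file use: with open("file.fasta") as fh: ...
demo = StringIO(">1\nAACC\nGG\n>2\nAAC-\nG\n")
lengths = fasta_lengths(demo)
print(lengths)                # {'1': 6, '2': 5}
print(is_alignment(lengths))  # False
```

If `is_alignment` returns False, the per-record lengths show which sequences differ, which is the same information the awk one-liner prints.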

ADD REPLY • link 5.8 years ago by Joe 21k

The full error message is following:

-bash-4.2$ run_gubbins.py t.fasta
Unexpected error: <class 'ValueError'>
Error with the input FASTA file: It is in the wrong format so check its an alignment
Each sequence must be the same length
There is a problem with your input fasta file so nothing can be done until you fix it
-bash-4.2$ awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen += length($0)}END{print seqlen}' t.fasta
>1
7287934
>2
7287685

ADD REPLY • link 5.8 years ago by saadleeshehreen ▴ 140

As I suspected. There is a problem with converting from XMFA to Fasta.

You can represent the data in either format, but FASTA is 'dumb' in comparison, so the converted file will just have all of the sequences that the XMFA contained stuck together.

I would re-align with a tool that doesn't require these conversions. I've tried to do similar things in the past and gotten stuck along the road somewhere, though I can't recall exactly where now.

If your sequences are closely related try: https://omictools.com/multiple-genome-aligner-tool

If they aren't, it's going to be difficult. Multiple sequence alignment of large sequences is something of an unsolved problem in bioinformatics.

If you can tell us what exactly you want to do/show, maybe there are more efficient ways.

Also, please post errors in full in future. That error tells you exactly what the problem is, so all the effort in this thread so far could have been avoided.

ADD REPLY • link 5.8 years ago by Joe 21k

I tried to run Gubbins again after the conversion. It behaved OK initially and generated some files, but then it stopped and gave this error message:

" Failed while running gubbins. Please ensure you have enough free memory"

It was running on the server, and I tried it with just 4 genomes, because when I tried with 2 it gave an error message saying that I have to provide 3 or more genomes for the analysis.

How do I ensure there is enough free memory on the server? Any suggestions?

ADD REPLY • link 5.8 years ago by saadleeshehreen ▴ 140

How much memory do you have available? I feel like I've seen that error before, but I can't remember what the solutions were. I can speak to the authors and ask them, though.
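One way to answer that on a Linux server is to read `/proc/meminfo`. The sketch below assumes Linux and a kernel that reports the `MemAvailable` field (it falls back to `MemFree` otherwise). The helper names `parse_meminfo` and `available_gib` are hypothetical, and it says nothing about how much memory Gubbins itself needs.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_gib(info):
    """Return available memory in GiB, preferring MemAvailable over MemFree."""
    kib = info.get("MemAvailable", info.get("MemFree", 0))
    return kib / (1024 ** 2)


# Only attempt the read where the file exists (i.e. on Linux).
meminfo = Path("/proc/meminfo")
if meminfo.exists():
    gib = available_gib(parse_meminfo(meminfo.read_text()))
    print(f"{gib:.1f} GiB available")
```

On a shared server the number can change between checking and running the job, so it is only a rough guide.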

ADD REPLY • link 5.8 years ago by Joe 21k

I would probably use a tool other than progressiveMauve, since the XMFA format is a little peculiar anyway, if I recall. I think it just gives you aligned blocks, which you would then have to concatenate together. I might be wrong on this though, as it's been a while since I looked at it.
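To illustrate what concatenating the blocks would involve, here is a rough sketch under stated assumptions, not a tested converter. It assumes the usual XMFA layout: comment lines start with `#`, each entry header looks like `> 1:1-100 + file.fasta`, and each block ends with a line starting with `=`. The function name `xmfa_to_aligned` is made up. Genomes missing from a block are padded with gaps so every output sequence has the same length; block order and strand are taken as they appear in the file, which may not be what a downstream tool expects.

```python
from io import StringIO


def xmfa_to_aligned(handle):
    """Concatenate XMFA blocks into {genome_id: gap-padded aligned sequence}."""
    blocks = []      # one {genome_id: [sequence lines]} dict per block
    current = {}
    seq_id = None
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("="):
            # End of block: store it and start a new one.
            if current:
                blocks.append(current)
            current, seq_id = {}, None
        elif line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("XMFA entry header without coordinates")
            # '1:1-100' -> genome id '1'
            seq_id = fields[0].split(":")[0]
            current[seq_id] = []
        elif seq_id is not None:
            current[seq_id].append(line)
    if current:
        blocks.append(current)

    # Sort numeric ids numerically so that '10' comes after '2'.
    genomes = sorted(
        {g for block in blocks for g in block},
        key=lambda g: (not g.isdigit(), int(g) if g.isdigit() else 0, g),
    )
    aligned = {g: [] for g in genomes}
    for block in blocks:
        seqs = {g: "".join(parts) for g, parts in block.items()}
        width = max(len(s) for s in seqs.values())
        for g in genomes:
            # Genomes absent from this block become a run of gaps.
            aligned[g].append(seqs.get(g, "").ljust(width, "-"))
    return {g: "".join(parts) for g, parts in aligned.items()}


demo = StringIO(
    "#FormatVersion Mauve1\n"
    "> 1:1-4 + a.fasta\nACGT\n"
    "> 2:1-4 + b.fasta\nAC-T\n"
    "=\n"
    "> 1:5-7 + a.fasta\nGGA\n"
    "=\n"
)
for name, seq in xmfa_to_aligned(demo).items():
    print(f">{name}\n{seq}")
```

In the demo, genome 2 is absent from the second block, so it receives three gap characters and both output sequences end up 7 columns long.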

ADD REPLY • link 5.8 years ago by Joe 21k

Please let me know the name of that tool, if you recall :)

ADD REPLY • link 5.8 years ago by saadleeshehreen ▴ 140

What are you aligning? Is it whole genomes? And how many?

ADD REPLY • link 5.8 years ago by Joe 21k

For my work, I need to align 100 whole genomes of different bacteria, but this time I just tested with two of them. I downloaded the sequences from NCBI and aligned them with progressiveMauve. Then I plan to run Gubbins.

ADD REPLY • link 5.8 years ago by saadleeshehreen ▴ 140