Roary MSG: Got a sequence without letters. Could not guess alphabet
12 months ago
c.e.chong • 10


I am trying to run Roary on a set of skin isolate sequences to create a pan genome to use in further comparative analyses.

I have QC'd, assembled (Unicycler) and annotated (Prokka) my sequences, which are all Staphylococcus capitis. To confirm this, I pulled the 16S rRNA, rpoB and gap genes out of the gbk files and ran them through BLAST.

To run Roary, I use the command:

roary -p 12 -f epi_95_mafft -e --mafft -n -v -r -i 95 ~/comp_genomics/prokka_annotation/epi_annotation/epi_gffs/*.gff

The command appears to run fine, but this error message pops up at the end:

2019/02/25 14:31:10 Running command: mafft --auto --quiet pan_genome_sequences/group_5608.fa > pan_genome_sequences/group_5608.fa.aln
All arguments to easy_init should be either an integer log level or a hash reference. at /usr/local/share/sanger-pathogens-Roary-459fd8e/lib/Bio/Roary/CommandLine/ line 22
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

After searching for this error message, I read that it can mean the sequences are not very closely related, so to look into this I ran mash, comparing each of my sequences with the S. capitis type strain from NCBI. For all sequences, the output was a mash distance of around 0.02, a p-value of 0 and a matching-hashes score of ~200-600/1000.
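For reference, the distance and matching-hashes columns in the mash output are tied together by the Mash distance formula, D = -(1/k) * ln(2j / (1 + j)), where j is the fraction of shared hashes. The sketch below is only an illustration of that arithmetic: it assumes the mash defaults of k = 21 and a sketch size of 1000 (the k-mer size used here was not stated), and the function name `mash_distance` is made up for this example.

```python
import math


def mash_distance(shared_hashes, sketch_size=1000, k=21):
    """Estimate the Mash distance from a matching-hashes count.

    k=21 and sketch_size=1000 are assumed defaults, not values
    confirmed for the runs described in this thread.
    """
    if sketch_size <= 0 or not 0 <= shared_hashes <= sketch_size:
        raise ValueError("shared_hashes must be between 0 and sketch_size")
    j = shared_hashes / sketch_size  # Jaccard index estimate
    if j == 0:
        return 1.0  # no shared hashes: distance is capped at 1
    # abs() avoids returning -0.0 when all hashes match
    return abs(-math.log(2 * j / (1 + j)) / k)


if __name__ == "__main__":
    for shared in (200, 500, 600):
        print(shared, round(mash_distance(shared), 4))
```

Under those assumptions, 500/1000 matching hashes corresponds to a distance of roughly 0.019, which is in line with the ~0.02 reported for genomes of the same species.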

If anyone knows why this error message pops up, or a way to fix it, I'd be grateful for your help!

Thanks in advance!


Just a guess, but maybe your assembly includes some very short sequence (just a few bp long, say) for which it's not possible to determine whether it's DNA or protein.
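One way to test that guess is to scan the FASTA files for records with no letters or only a handful of them. This is a minimal sketch, not anything Roary itself provides: it assumes the per-group files named in the log (pan_genome_sequences/*.fa) are still on disk after the run, and the function names and the 10-letter threshold are arbitrary choices for illustration. It can equally be pointed at the assembly or Prokka FASTA files.

```python
import glob
import sys


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def suspicious_records(path, min_length=10):
    """Return (header, n_letters) for records with fewer than min_length letters.

    min_length=10 is an arbitrary illustrative threshold.
    """
    hits = []
    for header, seq in read_fasta(path):
        # Count letters only, so gap-only or empty records report 0
        n_letters = sum(ch.isalpha() for ch in seq)
        if n_letters < min_length:
            hits.append((header, n_letters))
    return hits


if __name__ == "__main__":
    # Default pattern is taken from the log above; pass another glob to override
    pattern = sys.argv[1] if len(sys.argv) > 1 else "pan_genome_sequences/*.fa"
    for fasta in sorted(glob.glob(pattern)):
        for header, n_letters in suspicious_records(fasta):
            print(f"{fasta}\t{header}\t{n_letters}")
```

Any record reported with 0 letters would be a candidate for the "sequence without letters" warning, and the header should point back to the isolate and gene it came from.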

