Roary MSG: Got a sequence without letters. Could not guess alphabet
12 months ago
c.e.chong • 10

Hi,

I am trying to run Roary on a set of skin isolate sequences to create a pan genome to use in further comparative analyses.

I have QC'd, assembled (Unicycler) and annotated (Prokka) my sequences, which are all Staphylococcus capitis. To confirm this, I pulled the 16S rRNA, rpoB and gap genes out of the gbk files and ran them through BLAST.

I run Roary with the command: roary -p 12 -f epi_95_mafft -e --mafft -n -v -r -i 95 ~/comp_genomics/prokka_annotation/epi_annotation/epi_gffs/*.gff

The command appears to run fine, but this error message pops up at the end:

2019/02/25 14:31:10 Running command: mafft --auto --quiet pan_genome_sequences/group_5608.fa > pan_genome_sequences/group_5608.fa.aln
All arguments to easy_init should be either an integer log level or a hash reference. at /usr/local/share/sanger-pathogens-Roary-459fd8e/lib/Bio/Roary/CommandLine/Common.pm line 22
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

After searching for this error message, I read that it can mean the sequences are not very closely related, so to check this I ran mash, comparing each of my sequences with the S. capitis type strain from NCBI. For all sequences the output was a mash distance of around 0.02, a p-value of 0 and ~200-600/1000 matching hashes.
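
For anyone wanting to sanity-check those numbers, here is a rough sketch of how a Mash distance relates to the matching-hashes count. It uses the published Mash formula D = -1/k * ln(2j / (1 + j)), where j is the fraction of shared hashes. The k-mer size of 21 and sketch size of 1000 are assumptions (the Mash defaults); the post does not say which settings were used, and the function name is just for illustration.

```python
import math


def mash_distance(shared_hashes, sketch_size=1000, k=21):
    """Estimate the Mash distance from the number of matching hashes.

    Assumes k=21 and sketch_size=1000 (Mash defaults); adjust both if
    the sketches were built with other settings.
    """
    if shared_hashes <= 0:
        # No shared hashes: Mash reports the maximum distance of 1.
        return 1.0
    j = shared_hashes / sketch_size  # Jaccard estimate from the sketch
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


if __name__ == "__main__":
    # The range of matching-hash counts quoted in the post.
    for shared in (200, 400, 600):
        print(shared, round(mash_distance(shared), 4))
```

Under those assumed settings, running this over the quoted range of matching hashes shows how far the distance moves between 200 and 600 shared hashes, which may help when comparing individual isolates rather than the rough average.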

If anyone knows of any reason why this error message pops up or a way to fix it I'd be grateful for your help!

Thanks in advance!


Just a guess, but maybe your assembly includes some very short sequences (just a few bp long) for which it's not possible to determine whether they are DNA or protein.
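
One way to test that guess is to scan the FASTA files for records that contain no letters at all, or only a handful. The sketch below is a minimal, dependency-free checker. The function names and the `min_length` threshold are made up for illustration, and it assumes the per-group files under pan_genome_sequences/ (the directory named in the log) are still on disk. It is not part of Roary.

```python
import sys
from pathlib import Path


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def suspicious_records(text, min_length=1):
    """Return (header, letter_count) for records with fewer than min_length letters."""
    flagged = []
    for header, seq in read_fasta(text):
        # Count only alphabetic characters, so gap-only records count as empty.
        letters = sum(ch.isalpha() for ch in seq)
        if letters < min_length:
            flagged.append((header, letters))
    return flagged


if __name__ == "__main__":
    # Each command-line argument is treated as a path to a FASTA file.
    for path in sys.argv[1:]:
        for header, count in suspicious_records(Path(path).read_text()):
            print(f"{path}\t{header}\t{count}")
```

Saved under a hypothetical name such as find_empty_seqs.py, it could be run as `python find_empty_seqs.py pan_genome_sequences/*.fa`; any line it prints points at a record that has a header but no usable sequence. Raising `min_length` would also flag the very short sequences mentioned above.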
