I'm trying to make a blastdb out of fungal ITS sequences from UNITE (https://unite.ut.ee/index.php). I downloaded the general fasta file and edited the fasta headers so they look like this:
GQ280590|Calonectria_leucothoës
DQ675574|Epichloë_sibirici
etc.
However, when I run this makeblastdb command:
./makeblastdb -in unite_fungi.fasta -out UNITE_ITS.fasta -dbtype nucl -parse_seqids
I am getting the following error for all entries:
Error: (803.7) [makeblastdb] Blast-def-line-set.E.seqid.E.local.str
Bad char [0xAB] in string at byte 46.
Initially I thought it could be something to do with tabs/spaces at the end of the header, so I tried removing them using (please correct me if I'm wrong):
sed 's/[[:blank:]]*$//'
However, this did not work either.
Does anyone know why this might be happening?
Thanks in advance.
I wonder if the issue is the use of a non-US (non-ASCII) character set.
Could you please elaborate?
Do you know which Unicode character set (encoding) you are using on this machine? The error you posted above seems to refer to this character.
No, but that link helped solve my problem! Turns out it was the 'ë' characters in the fasta headers that were causing the problem. Thanks for the help!
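For anyone who lands here with the same error: in UTF-8, 'ë' is stored as the two bytes 0xC3 0xAB, which matches the "Bad char [0xAB]" in the message. Below is a minimal Python sketch of one way to find and replace such characters in the headers before running makeblastdb. The function names and the output file name are made up for illustration, and it assumes the FASTA file is UTF-8 encoded and that header lines start with '>'. It strips diacritics (ë becomes e); characters with no ASCII decomposition (for example 'ø') are silently dropped, so check the result.

```python
import unicodedata


def non_ascii_chars(header):
    """Return (position, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(header) if ord(ch) > 127]


def to_ascii(header):
    """Strip diacritics: decompose (NFKD), then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", header)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def clean_fasta(in_path, out_path):
    """Copy a FASTA file, converting only the header lines to plain ASCII.

    Assumes the input is UTF-8 and that headers begin with '>'.
    Sequence lines are written unchanged.
    """
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(to_ascii(line) if line.startswith(">") else line)


if __name__ == "__main__":
    example = ">DQ675574|Epichlo\u00eb_sibirici"
    print(non_ascii_chars(example))  # where the offending characters are
    print(to_ascii(example))         # header with diacritics stripped
    # Hypothetical file names; adjust to your own paths:
    # clean_fasta("unite_fungi.fasta", "unite_fungi.ascii.fasta")
```

After that, the same makeblastdb command can be pointed at the cleaned file with -in. Since each header starts with the accession, stripping the diacritics from the species name should not create duplicate IDs, but that is worth confirming on your own data.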