Hi all,
I'm new to the OBITools suite and am trying to use ecoPCR to design bat-specific COI primers, which I plan to modify into blocking primers for metabarcoding.
I have downloaded bat COI sequences, and the FASTA file looks like this:
> MH234219 organism=Hypsugo dolichodon; taxid=1897726; Hypsugo dolichodon voucher CBC02156 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
accctttatcttttatttggtgcttgagccggtatagtgggcaccgcattaagtctttta
attcgcgctgaattaggtcaaccaggagccctacttggagatgaccagatttataatgta
atcgtaactgctcatgcttttgtgataattttctttatagtcatacccattataattgga
ggcttcggaaattgacttgtcccattaataattggggctcctgatatagcattcccgcga
ataaataatataagcttttgacttcttcctccttctttcttactacttctggcatcatct
atagtagaagcgggcgcgggaacaggctgaacagtttatccccccttagcgggaaattta
gcccatgcaggagcctccgtggacttaacaattttttctctacacttagcaggtgtctca
tcaatcttaggagcaattaactttattactacaattattaatataaaacctcccgctctt
tcccaatatcaaacaccattatttgtatgatctgttctaatcacagctgtacttcttcta
ttatcccttcctgtattagctgctggtattacaatactattgacagaccgaaacctaaac
acgaccttttttgacccagctggcggaggagatcctattctataccaacatctattt
When I try to convert this file to ecoPCR format with the following command, it skips every entry and produces an empty ecoPCR database. Without the --skip-on-error flag, it reports that the sequences have no taxid, even though the taxid is in each header. Does anyone know why this is happening? Thanks in advance.
> obiconvert --fasta --ecopcrdb-output=ECOPCROUTPUT / newsequences.fasta > 'my_bat_COI_database' --skip-on-error
Doing that still results in the same output.
The same thing is happening to me after running obiaddtaxids with an NCBI taxdump. Did you ever find a resolution?
I had a similar issue myself: I was trying to convert a "homemade" FASTA file to ecoPCR format and kept getting a 'sequence has no taxid' error. My FASTA headers contain only a sequence name followed by the taxid, without any of the other fields shown above, e.g. >Species name (sampleXYZ) taxid=12345
I tried several things: removing parentheses from sequence names, replacing spaces in sequence names with underscores, matching my header whitespace to the format of an old OBITools output FASTA file and, lastly, adding a semicolon (;) after each taxid (i.e. >Species_name_sampleXYZ taxid=12345; ). Adding the semicolon is what finally got obiconvert working for me.
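If you have many headers to fix, the semicolon change can be applied in bulk. This is a minimal sketch, assuming each header carries its tag as taxid=<digits> somewhere after the '>'; the function name add_taxid_semicolon is made up for illustration. It only touches header lines and leaves tags that already end in ';' alone.

```shell
# Hypothetical helper (not part of OBITools): append ';' after any
# "taxid=<digits>" tag in a FASTA header that is not already followed by one.
# Sequence lines are passed through unchanged.
# $1: input FASTA file; the result is written to stdout.
add_taxid_semicolon() {
    # Only act on lines starting with '>'; the tag must be followed by a
    # non-digit, non-';' character or by the end of the line to be changed.
    sed -E '/^>/ s/(taxid=[0-9]+)([^0-9;]|$)/\1;\2/' "$1"
}
```

Usage would be something like: add_taxid_semicolon homemade.fasta > homemade_fixed.fasta, then run obiconvert on the fixed file.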
I'm not sure why mingala's example file above isn't working, since it already has a semicolon after the taxid. Maybe editing the FASTA so that the 'taxid' field immediately follows the accession number (instead of the 'organism' field) would help? I suspect obiconvert expects to see the taxid straight after the sequence name/accession, but I'm not certain. I looked at the .py scripts referenced in my error messages to work out the formatting requirements, but my coding knowledge is pretty basic and I had trouble following them.
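If you want to test the reordering idea, here is a minimal sketch that moves the taxid tag so it directly follows the sequence ID. Keep in mind that whether tag order matters to obiconvert is only a guess in this thread. The sketch assumes the tag is written as taxid=<digits>; and the function name taxid_first is hypothetical.

```shell
# Hypothetical helper (not part of OBITools): move a "taxid=<digits>;" tag so
# it directly follows the sequence ID in each FASTA header, e.g.
#   >MH234219 organism=X; taxid=1; desc  ->  >MH234219 taxid=1; organism=X; desc
# $1: input FASTA file; the result is written to stdout.
taxid_first() {
    awk '
    /^>/ && match($0, /taxid=[0-9]+; */) {
        tag = substr($0, RSTART, RLENGTH)
        sub(/ +$/, "", tag)                  # drop spaces captured after the ";"
        # header with the tag (and its trailing spaces) cut out
        rest = substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH)
        sub(/ +$/, "", rest)
        i = index(rest, " ")                 # end of the sequence ID
        if (i == 0) { print rest " " tag; next }
        print substr(rest, 1, i - 1) " " tag " " substr(rest, i + 1)
        next
    }
    { print }                                # sequence lines and other headers
    ' "$1"
}
```

Applied to the header in the first post, this should give >MH234219 taxid=1897726; organism=Hypsugo dolichodon; Hypsugo dolichodon voucher ... with the rest of the line unchanged.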
P.S. After solving this, I got another error: 'KeyError: 12345'. These errors seem to be caused by an outdated taxonomy dump: they arise when your FASTA contains a recently created taxid that isn't present in your taxdump. Downloading the most recent taxdump files from NCBI fixed this for me.
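To check for this before converting, you could list the taxids in your FASTA that are missing from the taxdump. This is a minimal sketch, assuming headers use taxid=<digits> and that you have unpacked NCBI's taxdump.tar.gz, whose nodes.dmp has the tax_id as its first tab-separated field. The function name missing_taxids is hypothetical, and it only checks nodes.dmp (it does not consult merged.dmp).

```shell
# Hypothetical helper (not part of OBITools): print each taxid used in the
# FASTA headers that does not appear as a tax_id in nodes.dmp.
# $1: FASTA file   $2: path to nodes.dmp from an NCBI taxdump
missing_taxids() {
    # nodes.dmp rows look like "9606<TAB>|<TAB>9605<TAB>|<TAB>species..."
    # so field 1 (tab-separated) is the tax_id.
    awk -F '\t' '
    NR == FNR { known[$1] = 1; next }        # first file: collect known taxids
    /^>/ && match($0, /taxid=[0-9]+/) {
        id = substr($0, RSTART + 6, RLENGTH - 6)   # strip the "taxid=" prefix
        if (!(id in known) && !(id in seen)) { seen[id] = 1; print id }
    }
    ' "$2" "$1"
}
```

For example, missing_taxids homemade.fasta taxdump/nodes.dmp should print nothing if every taxid is known to that taxdump; any number it prints is a candidate for the KeyError above.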