Question

Obiconvert produces empty EcoPCR database...why?

5.8 years ago

mingala • 0

Hi all,

I'm new to the OBITools suite and am trying to use ecoPCR to develop bat-specific COI primers that I can modify into blocking primers for metabarcoding.

I have downloaded bat COI sequences, and the FASTA file looks like this:

> MH234219 organism=Hypsugo dolichodon; taxid=1897726; Hypsugo dolichodon voucher CBC02156 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
accctttatcttttatttggtgcttgagccggtatagtgggcaccgcattaagtctttta
attcgcgctgaattaggtcaaccaggagccctacttggagatgaccagatttataatgta
atcgtaactgctcatgcttttgtgataattttctttatagtcatacccattataattgga
ggcttcggaaattgacttgtcccattaataattggggctcctgatatagcattcccgcga
ataaataatataagcttttgacttcttcctccttctttcttactacttctggcatcatct
atagtagaagcgggcgcgggaacaggctgaacagtttatccccccttagcgggaaattta
gcccatgcaggagcctccgtggacttaacaattttttctctacacttagcaggtgtctca
tcaatcttaggagcaattaactttattactacaattattaatataaaacctcccgctctt
tcccaatatcaaacaccattatttgtatgatctgttctaatcacagctgtacttcttcta
ttatcccttcctgtattagctgctggtattacaatactattgacagaccgaaacctaaac
acgaccttttttgacccagctggcggaggagatcctattctataccaacatctattt

When I try to convert this file to ecoPCR format using the following command, it skips all the entries and produces an empty ecoPCR database. Without the --skip-on-error flag, it says the sequences do not have taxids (which they do, in the header). Does anyone know why this is happening? Thanks in advance.

> obiconvert --fasta --ecopcrdb-output=ECOPCROUTPUT  / newsequences.fasta > 'my_bat_COI_database' --skip-on-error

obitools obiconvert fasta ecopcr • 4.1k views

updated 5.0 years ago by steffie11 ▴ 20 • written 5.8 years ago by mingala • 0

Doing that still results in the same output.

5.8 years ago by mingala • 0

The same thing is happening to me after applying obiaddtaxids with an NCBI taxdump. Did you ever resolve this?

5.3 years ago by patrick_freeman ▴ 20

I had a similar issue myself: I was trying to convert a "homemade" FASTA file to ecoPCR format and kept getting a 'sequence has no taxid' error. My FASTA headers contain only a sequence name followed by the taxid, with none of the other attributes shown above, e.g. >Species name (sampleXYZ) taxid=12345

I tried several things: removing parentheses from sequence names, replacing spaces in sequence names with underscores, matching my header whitespace to that of an old OBITools output FASTA file and, lastly, adding a semicolon (;) after my taxid values (i.e. >Species_name_sampleXYZ taxid=12345; ). Adding the semicolon is what finally got obiconvert working for me.

I'm not sure why mingala's example file above isn't working, as there is already a semicolon after the taxid, but maybe editing the FASTA so that the 'taxid' field immediately follows the accession number (instead of the 'organism' field) would help? I suspect that obiconvert expects the taxid straight after the sequence name/accession, although I'm not certain. I looked at the .py scripts referenced in my error messages to try to work out the formatting requirements, but my coding knowledge is pretty basic and I had trouble understanding them.

5.2 years ago by klrdna • 0
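One way to check the header format discussed here before running obiconvert is to list the headers that lack a `taxid=<digits>;` attribute. The sketch below is only an illustration: the function name `headers_missing_taxid` and the demo file are hypothetical, and the pattern encodes the observation from this thread (a semicolon-terminated taxid), not a documented OBITools rule.

```shell
# Hypothetical helper: print FASTA headers that lack a "taxid=<digits>;" attribute.
headers_missing_taxid() {
    # $1: path to a FASTA file
    # "|| true" keeps the exit status at 0 when every header passes.
    grep '^>' "$1" | grep -Ev '[[:space:]]taxid=[0-9]+;' || true
}

# Small demo file; the names, sequences and taxids are made up for illustration.
demo_fasta=$(mktemp)
cat > "$demo_fasta" <<'EOF'
>seq1 taxid=12345; Species one sampleA
acgtacgt
>seq2 taxid=12345 Species one sampleB
acgtacgt
>seq3 Species two sampleC
acgtacgt
EOF

# Lists the seq2 header (no semicolon) and the seq3 header (no taxid at all).
headers_missing_taxid "$demo_fasta"
```

An empty result means every header carries a semicolon-terminated taxid; it does not by itself guarantee that obiconvert will accept the file.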

P.S. After I solved this issue, I got another error: 'KeyError: 12345'. This error seems to be caused by an outdated taxonomy dump: it arises when a recently created taxid in your FASTA isn't present in your tax dump. Downloading the most recent tax dump files from NCBI fixed this for me.

5.2 years ago by klrdna • 0
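A quick way to spot taxids that a given dump does not know about is to compare the taxids in the FASTA headers against the first column of `nodes.dmp` from the extracted taxdump. This is a minimal sketch under stated assumptions: the helper name `taxids_missing_from_dump` is hypothetical, the demo rows are truncated stand-ins for real `nodes.dmp` lines, and the taxids are made up.

```shell
# Hypothetical helper: print taxids used in a FASTA file that are absent from nodes.dmp.
taxids_missing_from_dump() {
    # $1: FASTA file, $2: nodes.dmp from the extracted NCBI taxdump
    awk -F '\t' '
        NR == FNR { known[$1] = 1; next }    # first file (nodes.dmp): tax_id is column 1
        /^>/ && match($0, /taxid=[0-9]+/) {  # second file: FASTA header lines only
            id = substr($0, RSTART + 6, RLENGTH - 6)
            if (!(id in known) && !(id in seen)) { seen[id] = 1; print id }
        }
    ' "$2" "$1"
}

# Demo data: two shortened nodes.dmp rows and two sequences (illustrative only).
demo_dir=$(mktemp -d)
printf '1\t|\t1\t|\tno rank\t|\n9606\t|\t9605\t|\tspecies\t|\n' > "$demo_dir/nodes.dmp"
cat > "$demo_dir/demo.fasta" <<'EOF'
>seqA taxid=9606; known taxid
acgt
>seqB taxid=999999999; taxid not in the demo dump
acgt
EOF

# Reports only the taxid that nodes.dmp does not contain.
taxids_missing_from_dump "$demo_dir/demo.fasta" "$demo_dir/nodes.dmp"
```

Any taxid reported this way is a candidate for the KeyError described above; refreshing the dump, as klrdna did, is the likely remedy.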

score 2 · Answer 1 · 2019-05-20

Hi Mingala, you need to specify the taxonomy database with -d when running obiconvert.

Get the dump taxonomy

mkdir TAXO
cd TAXO
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz
cd ..
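Before formatting the taxonomy, it can be worth confirming that the archive actually unpacked. The sketch below checks that the two core taxdump tables, `nodes.dmp` and `names.dmp`, exist and are non-empty; the helper name `check_taxdump_dir` and the demo directory are hypothetical.

```shell
# Hypothetical helper: confirm a directory holds the core NCBI taxdump tables.
check_taxdump_dir() {
    # $1: directory where taxdump.tar.gz was extracted
    status=0
    for f in nodes.dmp names.dmp; do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $1/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Demo: a directory that holds only a stand-in nodes.dmp, so the check fails.
demo_taxo=$(mktemp -d)
printf '1\t|\t1\t|\tno rank\t|\n' > "$demo_taxo/nodes.dmp"
if check_taxdump_dir "$demo_taxo"; then
    echo "taxdump looks complete"
else
    echo "taxdump incomplete"
fi
```

In a real run the argument would be the TAXO directory created above.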

Format the taxonomy for OBITools

obitaxonomy -t TAXO -d TAXO

Attribute taxonomic ids to the sequences

obiaddtaxids -d ~/TAXO  ~/sequences.fasta > sequences.taxid.fasta

Formatting the database

obiconvert -d ./TAXO --fasta --ecopcrdb-output=sequencesdb sequences.taxid.fasta

I downloaded my green algae barcodes from BOLD. Some of the BOLD sequences have not been published to NCBI, so their taxids are assigned at higher taxonomic levels.

score 0 · Answer 2 · 2018-07-19

5.8 years ago

h.mon 35k

ECOPCROUTPUT  / newsequences.fasta

Remove the / from your command-line:

ECOPCROUTPUT newsequences.fasta
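To see why the stray `/` matters: the shell splits the command line on whitespace, so `/` reaches the program as an argument of its own, separate from both the output name and the input file. The sketch below only demonstrates that splitting; `show_args` is a hypothetical stand-in and does not run obiconvert.

```shell
# Hypothetical stand-in for a program: print each received argument in brackets.
show_args() {
    for a in "$@"; do
        printf '[%s]\n' "$a"
    done
}

# Same argument layout as the original command: "/" arrives as its own argument.
show_args --ecopcrdb-output=ECOPCROUTPUT  / newsequences.fasta
```

The second bracketed line is the lone `/`, which the program would presumably treat as an extra input path.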

5.3 years ago by h.mon 35k