Add a genome to a kraken library
11 months ago

Hi, if I understand correctly, a Kraken library only covers human, bacterial and viral taxa.

I noticed that it was possible to add another genome, as follows:

kraken-build --add-to-library chr1.fa --db $DBNAME

So I downloaded a genome and ran the following command:

kraken2-build --add-to-library Culicoides_sonorensis.Cson1.dna_rm.toplevel.fa --db Kraken2_Standard_Fev2019

Here is the error message:

scan_fasta_file.pl: unable to determine taxonomy ID for sequence scaffold40

Indeed, there is no taxonomy information in the FASTA file. Example header:

>scaffold40 dna:supercontig supercontig:Cson1:scaffold40:1:766034:1 REF

So how does Kraken retrieve taxonomy information from a FASTA file? Is there a specific FASTA format I should download?

11 months ago

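Check the manual. If a sequence ID does not contain an NCBI accession number, the taxonomy ID has to be given explicitly in the header, like this: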

>sequence16|kraken:taxid|32630  Adapter sequence
CAAGCAGAAGACGGCATACGAGATCTTCGAGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA
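
For a whole genome file, the headers can be rewritten in one pass rather than by hand. The following is a minimal sketch, not something taken from the Kraken manual: `add_kraken_taxid` is a hypothetical helper name, and the taxonomy ID passed to it is a placeholder that has to be looked up in NCBI Taxonomy for the species in question. It appends `|kraken:taxid|<taxid>` to the sequence ID (the first word of each header) and keeps the rest of the header as the description.

```shell
# add_kraken_taxid (hypothetical helper): read FASTA on stdin, write FASTA
# on stdout with "|kraken:taxid|<taxid>" appended to every sequence ID.
# $1 is the NCBI taxonomy ID; it must be looked up by the user.
add_kraken_taxid() {
    awk -v taxid="$1" '
        /^>/ {
            # The ID is the first whitespace-delimited word of the header;
            # everything after it is kept unchanged as the description.
            id = $1
            rest = substr($0, length($1) + 1)
            print id "|kraken:taxid|" taxid rest
            next
        }
        { print }   # sequence lines pass through untouched
    '
}
```

Usage would look like `add_kraken_taxid TAXID < Culicoides_sonorensis.Cson1.dna_rm.toplevel.fa > Culicoides_sonorensis.fa`, with `TAXID` replaced by the real taxonomy ID. The sketch assumes there is no space between `>` and the sequence ID.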

I reformatted all my sequence IDs according to the manual. kraken2-build accepted them ("Culicoides_sonorensis.fa" was added to the Kraken library "Kraken2_Standard_Fev2019").

But there is no trace of "Culicoides_sonorensis" in the report after classifying a FASTQ file of Culicoides RNA-Seq reads.

It is not clear from the manual whether a description must follow the "sequence16|kraken:taxid|32630" ID.

It is also not clear whether sequences must be added one file at a time, as in the manual (chr1.fa, chr2.fa).
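
One way to narrow this down is to confirm that every header in the file really carries a taxonomy ID before looking elsewhere. The sketch below is only a sanity check under stated assumptions: `count_tagged_headers` is a hypothetical helper name, and the pattern it looks for is simply the `kraken:taxid|<digits>` form shown in the manual example. The closing comment reflects the Kraken 2 manual as I read it, where adding files to the library and building the database are separate steps; whether a missing build step is the cause here is a guess, not a confirmed diagnosis.

```shell
# count_tagged_headers (hypothetical helper): read FASTA on stdin and print
# "<headers with kraken:taxid> <total headers>".
count_tagged_headers() {
    awk '
        /^>/ {
            total++
            # Only the sequence ID (first word of the header) is inspected.
            if ($1 ~ /kraken:taxid\|[0-9]+/) tagged++
        }
        END { print tagged + 0, total + 0 }
    '
}

# If both numbers match, the headers are fine. The library then still has
# to be compiled into the database before classification can use it:
#   kraken2-build --build --db "$DBNAME"
```

For example, `count_tagged_headers < Culicoides_sonorensis.fa` should print the same number twice. As far as the manual describes it, a single multi-FASTA file is accepted by `--add-to-library`, so the sequences do not need to be split into one file each, and the description after the ID appears to be optional.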
