Kraken2 database curation
5.5 years ago
Asaf 10k

Hi all, I'm working on mouse gut microbiome samples and want to use Kraken2 to get their taxonomic profiles. I'm using Kraken2 with the nt and bacteria databases (plus some others). The problem is that some genomes in nt contain integrated bacterial sequences. Some of these are easy to spot, such as hits to unlikely mammals (I assume no bat entered the lab), but parasites or fungi may also carry bacterial DNA, and those hits might be relevant. My question: is there a neat way to remove these pseudo-bacterial sequences from the database, or to do some post-analysis that removes these nonspecific mappings?

Thanks

metagenomics kraken • 5.3k views
5.0 years ago
Asaf 10k

So, five months later I'm happy to introduce domain_classifier, a fairly simple naive Bayes classifier that tells whether a sequence is prokaryotic or eukaryotic. To build the Kraken2 database I wrote a civet pipeline; civet is a pipeline management system internal to the Jackson Laboratory but also available on GitHub.

The package first predicts Pfam domains on predicted ORFs and then uses these domains to classify each sequence into a taxonomic domain. To filter the Kraken DB I simply remove DNA sequences whose predicted domain strongly disagrees with their reported taxonomy. This also removes mitochondrial and chloroplast genomes.
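
For readers who want to see the general idea, here is a minimal sketch of a naive Bayes classifier over Pfam domain hits followed by a "strong disagreement" filter. It is not the code of domain_classifier: the function names, the Laplace smoothing, the log-odds threshold and the placeholder domain IDs (`D1`, `D2`, ...) are all assumptions made for illustration, and ORF calling and Pfam annotation are assumed to have happened upstream.

```python
import math
from collections import Counter


def train(examples, alpha=1.0):
    """Fit a multinomial naive Bayes model.

    examples: iterable of (label, [domain_ids]); alpha is Laplace smoothing.
    All names here are hypothetical, not taken from domain_classifier.
    """
    counts = {}         # label -> Counter of domain IDs seen for that label
    n_seqs = Counter()  # label -> number of training sequences
    vocab = set()       # every domain ID seen in training
    for label, domains in examples:
        counts.setdefault(label, Counter()).update(domains)
        n_seqs[label] += 1
        vocab.update(domains)
    return {"counts": counts, "n_seqs": n_seqs, "vocab": vocab, "alpha": alpha}


def log_scores(model, domains):
    """Return label -> unnormalised log posterior for one sequence."""
    total = sum(model["n_seqs"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for label, c in model["counts"].items():
        denom = sum(c.values()) + alpha * vocab_size
        score = math.log(model["n_seqs"][label] / total)  # class prior
        for d in domains:
            if d in model["vocab"]:  # domains never seen in training are skipped
                score += math.log((c[d] + alpha) / denom)
        scores[label] = score
    return scores


def strongly_disagrees(model, domains, reported, min_log_odds=2.0):
    """True if the best-scoring label beats the reported one by min_log_odds."""
    scores = log_scores(model, domains)
    if reported not in scores:
        return False  # cannot judge a label the model was never trained on
    best = max(scores, key=scores.get)
    return best != reported and scores[best] - scores[reported] >= min_log_odds


# Toy training data with made-up domain IDs.
model = train([
    ("prokaryote", ["D1", "D2"]),
    ("prokaryote", ["D1", "D3"]),
    ("eukaryote", ["D4", "D5"]),
    ("eukaryote", ["D4", "D6"]),
])

# A sequence reported as eukaryotic but carrying only prokaryote-like domains
# would be flagged for removal from the database.
flag = strongly_disagrees(model, ["D1", "D2", "D3"], reported="eukaryote")
```

Requiring a log-odds margin rather than any disagreement keeps sequences with few or ambiguous domain hits in the database; the right threshold would have to be tuned on real data.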
