6.0 years ago
lp.vandergouw
Hello all,
Using BLAST, I am trying to identify thousands of sequences. To make my job a bit easier, I would like to classify these sequences by phylum first, e.g. only keep BLAST hits from a certain phylum. To do this, I need to retrieve the complete lineage of each hit from the NCBI Taxonomy database. I know you can query the database over the internet with Biopython, but the machine I am working on has no internet access.
Could anyone give me some insight into this?
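One offline approach, sketched here under assumptions rather than as a tested pipeline: download `taxdump.tar.gz` from the NCBI taxonomy FTP site (`ftp.ncbi.nlm.nih.gov/pub/taxonomy/`) on a machine that does have internet access, copy it over, and unpack `nodes.dmp` (taxid, parent taxid, rank, ...) and `names.dmp` (taxid, name, unique name, name class). Walking from a taxid up the parent links to the root gives the full lineage. The sketch also assumes BLAST was run with `-outfmt "6 qseqid sseqid staxids"`, so the third column holds subject taxids (this only works if your BLAST database carries taxonomy information). All function and file names below are hypothetical.

```python
def read_dmp(path):
    """Yield the fields of each row of an NCBI taxdump .dmp file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            # Rows look like "562\t|\t561\t|\tspecies\t|\t...\t|\n"
            yield line.rstrip("\t|\n").split("\t|\t")


def load_taxonomy(nodes_path, names_path):
    """Build parent, rank and scientific-name lookups keyed by taxid."""
    parent, rank, name = {}, {}, {}
    for fields in read_dmp(nodes_path):
        taxid = int(fields[0])
        parent[taxid] = int(fields[1])
        rank[taxid] = fields[2]
    for fields in read_dmp(names_path):
        # names.dmp has many names per taxid; keep only the scientific one
        if len(fields) > 3 and fields[3] == "scientific name":
            name[int(fields[0])] = fields[1]
    return parent, rank, name


def lineage(taxid, parent, rank, name):
    """Return [(rank, name), ...] ordered from the root down to taxid."""
    path = []
    while True:
        path.append((rank.get(taxid, "no rank"), name.get(taxid, str(taxid))))
        up = parent.get(taxid)
        if up is None or up == taxid:  # root (taxid 1) is its own parent
            break
        taxid = up
    return list(reversed(path))


def phylum_of(taxid, parent, rank, name):
    """Return the phylum name in the lineage of taxid, or None."""
    for r, n in lineage(taxid, parent, rank, name):
        if r == "phylum":
            return n
    return None


def filter_hits(blast_path, wanted_phylum, parent, rank, name):
    """Yield (qseqid, sseqid) for BLAST hits whose subject is in wanted_phylum.

    Assumes tabular output whose first three columns are qseqid, sseqid, staxids.
    """
    with open(blast_path) as handle:
        for line in handle:
            qseqid, sseqid, staxids = line.rstrip("\n").split("\t")[:3]
            taxid = staxids.split(";")[0]  # staxids may list several, ';'-separated
            if taxid.isdigit() and phylum_of(int(taxid), parent, rank, name) == wanted_phylum:
                yield qseqid, sseqid
```

Usage would be something like `parent, rank, name = load_taxonomy("nodes.dmp", "names.dmp")` followed by iterating over `filter_hits("hits.tsv", "Pseudomonadota", parent, rank, name)`. The phylum has to match the scientific name used in your copy of `names.dmp` (NCBI has renamed several phyla, e.g. Proteobacteria is now Pseudomonadota). Loading the full dump into plain dictionaries takes some memory, but afterwards each lookup is just a short walk up the tree, with no network access needed.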
How big/long are the sequences?
One very hacky solution might be to run your sequences through Kraken, which identifies species and is typically used on short-read metagenomic data, though you could potentially massage your input data to fit. It may also work on pre-assembled contigs, in which case you should be fine.
You'll still need to be able to download the software and download/make a Kraken database though, so it's going to be tricky if you have no internet access at all.
NB: this may also only work for microbial data; I'm not 100% sure what Kraken can handle.
I did check out Kraken, but it doesn't really do everything I need.