Biostar Beta. Not for public use.
Blasting against all assemblies on NCBI without downloading the genomes
12 months ago

Hey guys,

I would like to blastx all bacterial assemblies available on NCBI against a database of several proteins without downloading the genomes, to save disk space. I'm working on a supercomputer with no cloud storage available, so I'm relying entirely on its physical disk space. The only output I want stored locally is the list of genomes that contain hits to the proteins in my local database. Do you have a solution? Thanks!

Tags: Assembly, genome

All bacterial assemblies take less than 600 GB of space; surely a supercomputer has that much to spare. All bacterial proteomes would be around 100 GB, if even that.
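Before deciding between a remote and a local search, it may be worth checking whether the filesystem you would download into actually has that much room. The sketch below uses Python's standard `shutil.disk_usage`; the 600 GB and 100 GB figures are only the rough estimates from the comment, not measured sizes, and the target directory is a hypothetical placeholder for your own scratch or work directory.

```python
import shutil

# Rough sizes quoted in the thread (estimates, not measured here).
ASSEMBLIES_GB = 600
PROTEOMES_GB = 100


def has_room_for(path: str, needed_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `needed_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 1e9  # decimal GB, matching the estimates above


if __name__ == "__main__":
    target = "."  # hypothetical: replace with your scratch/work directory
    total = ASSEMBLIES_GB + PROTEOMES_GB
    print(f"Need ~{total} GB; enough free space: {has_room_for(target, total)}")
```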


Use the -remote option of command-line BLAST?


Will try to have a look, thanks!


If you use the -remote option, you may want to run tblastn with your proteins as queries (rather than blastx the other way around), since you don't want to download the genomes. I'm not sure how much time NCBI allows per query, but searching all bacterial assemblies/genomes may run up against the limit. Start with a single protein and an "Entrez query" (the -entrez_query option) restricting BLAST to one genus before expanding the search.
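A minimal sketch of that first, restricted search, assuming BLAST+ `tblastn` is on your PATH. It only assembles and (optionally) runs the command. The database name `nt`, the genus, the e-value cutoff, and the file names `my_protein.faa` / `hits.tsv` are illustrative assumptions, not values from the thread; check which remote nucleotide database best covers the assemblies you care about.

```python
import os
import shutil
import subprocess


def build_remote_tblastn_cmd(query_fasta, genus, db="nt", evalue=1e-5, out="hits.tsv"):
    """Build a tblastn command that runs on NCBI's servers, restricted to one genus."""
    return [
        "tblastn",
        "-query", query_fasta,                  # protein FASTA; start with a single protein
        "-db", db,                              # remote nucleotide database (assumed: nt)
        "-remote",                              # search runs at NCBI; no local genomes needed
        "-entrez_query", f"{genus}[Organism]",  # restrict the search to one genus
        "-evalue", str(evalue),
        "-outfmt", "6",                         # tabular output, one hit per line
        "-out", out,
    ]


if __name__ == "__main__":
    query = "my_protein.faa"  # hypothetical input file
    cmd = build_remote_tblastn_cmd(query, "Escherichia")
    if shutil.which("tblastn") and os.path.exists(query):
        subprocess.run(cmd, check=True)
    else:
        print("Would run:", " ".join(cmd))
```

Once a single genus completes in reasonable time, the same call can be repeated with other genera rather than submitting one search across all bacteria.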

As @5heikki points out, as long as you have space available on the supercomputer, this search is best done locally after downloading the genomes/proteomes.
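Whether the search runs remotely or locally, the question only asks for the hits, so the tabular (`-outfmt 6`) output can be reduced to the set of subject sequences each query protein hit. This sketch assumes the default 12-column outfmt 6 layout, where the e-value is the 11th column; the 1e-5 cutoff is an arbitrary example. Subject IDs are sequence accessions (contigs or chromosomes), so mapping them back to assemblies would need an extra lookup that is not shown here.

```python
import csv
from collections import defaultdict


def hits_by_query(lines, max_evalue=1e-5):
    """Map each query ID to the set of subject IDs it hit at or below `max_evalue`.

    `lines` is any iterable of text lines in default BLAST outfmt 6, e.g. an open file.
    """
    hits = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid = row[0], row[1]
        evalue = float(row[10])  # 11th column of the default outfmt 6 layout
        if evalue <= max_evalue:
            hits[qseqid].add(sseqid)
    return dict(hits)
```

Usage: `with open("hits.tsv") as fh: print(hits_by_query(fh))`, where `hits.tsv` is whatever file was passed to `-out`.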

