Blasting against all assemblies on NCBI without downloading the genomes
5.1 years ago

Hey guys,

I would like to blastx all bacterial assemblies available on NCBI against a database of several proteins without downloading the genomes, to save disk space. I'm working on a supercomputer with no cloud storage available, so I'm relying entirely on the machine's physical disk space. The only output I would like to keep on my side is the hit genomes, i.e. those containing proteins from my local database. Do you have a solution? Thanks!

Assembly genome • 1.1k views

All bacterial assemblies take less than 600 GB of space. Surely a supercomputer has that much space to spare. All bacterial proteomes would be around 100 GB, if even that.

Use the -remote option of command-line BLAST?

Will try to have a look, thanks!

You may want to run tblastn with your proteins as the queries (rather than blastx with the genomes as the queries) if you use the -remote option, since you don't want to download the genomes. I'm not sure how much time NCBI allows per query, but choosing all bacterial assemblies/genomes may run up against the limit. Start with a single protein and an "Entrez query" restricting BLAST to a genus before expanding the search.
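
A minimal sketch of that first test run, written as a small Python helper that assembles the tblastn command line. The file names, the genus, the choice of `nt` as the remote database, and the E-value cutoff are all placeholders that do not come from this thread; adjust them to your data. The sketch only builds and prints the command, so it can be inspected before anything is sent to NCBI.

```python
import shlex


def build_tblastn_command(query_fasta, genus, out_path, db="nt", evalue="1e-10"):
    """Build a remote tblastn call restricted to one genus.

    `db` and `evalue` are assumed defaults, not values from the thread.
    """
    return [
        "tblastn",
        "-query", query_fasta,                   # protein FASTA with the query
        "-db", db,                               # NCBI-hosted nucleotide database
        "-remote",                               # run the search on NCBI servers
        "-entrez_query", f"{genus}[Organism]",   # restrict the search to one genus
        "-evalue", evalue,
        "-outfmt", "6",                          # tab-separated hit table
        "-out", out_path,
    ]


# Hypothetical file names, for illustration only.
cmd = build_tblastn_command("one_protein.faa", "Escherichia", "hits.tsv")
print(shlex.join(cmd))
# To run it (needs BLAST+ installed and network access):
#   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the genus as a parameter makes it easy to widen the search step by step once the single-protein test finishes within NCBI's limits.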

As @5heikki says below, as long as you have space available on the supercomputer this search would be best done locally by downloading the genomes/proteomes.
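
Whether the search runs remotely or locally, the result can be reduced to just the list of hit sequences, which is all the original question asks to keep. The sketch below assumes the default 12-column tabular output (`-outfmt 6`), where the subject ID is the second column and the E-value the eleventh; the cutoff and the sample rows are made up for illustration.

```python
def hit_subjects(tabular_text, max_evalue=1e-10):
    """Return sorted unique subject IDs from default BLAST -outfmt 6 text."""
    subjects = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Default columns: fields[1] is sseqid, fields[10] is evalue.
        if float(fields[10]) <= max_evalue:
            subjects.add(fields[1])
    return sorted(subjects)


# Made-up rows in the default 12-column layout, for illustration only.
sample = "\n".join([
    "prot1\tNZ_AAA.1\t98.5\t200\t3\t0\t1\t200\t1000\t1600\t1e-120\t400",
    "prot1\tNZ_BBB.1\t35.0\t80\t50\t2\t5\t84\t10\t250\t0.5\t30",
    "prot2\tNZ_AAA.1\t90.0\t150\t15\t0\t1\t150\t5000\t5450\t2e-80\t280",
])
print(hit_subjects(sample))  # prints ['NZ_AAA.1']
```

Only the sequences in that list would then need to be fetched and stored.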
