Biostar Beta. Not for public use.
Machine Spec For Running A Blast Service For ~50 Users
9.6 years ago
Lyco ♦ 2.3k
@Lyco1881

I have been asked this question by a friend, but I didn't feel I could give a satisfactory answer. Maybe someone here at BioStar can share some experience.

The question is: what kind of hardware would you need to offer a BLAST service for about 50 different users, with an expected average of 5-10 concurrent requests, which might be higher at peak times. A wide range of different databases should be offered, ranging from single bacterial genomes to NCBI-style 'nr' databases.

My initial answer was that the number of concurrent BLAST runs should be kept to a minimum by using a queuing system (similar to what is done at NCBI, EBI and the other big centers). The main problem in this particular setting is that the majority of BLAST searches are not via a fancy Web interface but rather command-line requests, generating tabular or XML output.

Maybe I should ask two questions: 1) what kind of hardware would you need (# of CPUs, how much RAM, RAID?), and 2) has someone experience with a queuing system for command-line blast?

blast hardware server webservice • 3.3k views
9.6 years ago
@Alastair Kerr676

I do not think that 50 users is that many, and a cluster would be overkill, especially for typical BLAST usage, i.e. sporadic use per person, with nobody constantly using BLAST as a short-read mapper or constantly running multiple large genomes against nr.

A single dual-processor server with about 24 threads and about 64GB RAM would cope easily in the above scenario (these are the specs of one of my servers, which serves a similar number of users and is used for a lot more than just BLAST).

That would handle 10 concurrent BLAST jobs without the need for a queuing system. If it became a problem, you could always wrap the blast process and submit it to a queuing system such as Torque (which uses the qsub command).
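Before reaching for a full queuing system like Torque, the "wrap the blast process" idea can be sketched in-process: a thread pool acts as a simple queue that caps how many BLAST jobs run at once. This is a hypothetical illustration, not from the thread; the command lines are placeholders.

```python
# Minimal in-process queue that caps concurrent BLAST runs.
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # matches the "10 concurrent jobs" estimate above

def run_blast(cmd):
    """Run one BLAST command line and return its stdout."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def submit_all(commands):
    """Queue up command lines; at most MAX_CONCURRENT run at any time."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        return list(pool.map(run_blast, commands))

# Stand-in jobs; a real entry would look like
# ["blastn", "-query", "q.fa", "-db", "nr", "-outfmt", "6"]
results = submit_all([["echo", f"job {i}"] for i in range(25)])
```

The 25 stand-in jobs above never run more than 10 at a time; swapping the `echo` commands for real BLAST command lines gives the same throttling behaviour.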


@Alastair, thanks for this advice. Not being a hardware person, I am a bit puzzled by your '24 threads on a dual-processor machine'. Does it make sense to run BLAST with more threads than there are processor cores? Or do you have 12-core processors?


Sounds like a Westmere system with 24 execution threads. I agree that 50 is not that many users. How many concurrent processes do you tend to have?


Deepak, could you please offer some explanation for a biologist-bioinformatician with limited hardware expertise? I looked up Westmere on Wikipedia but only found references to 2-8 core processors. Can you run more than one thread per core, and does it make sense?


Each processor has 6 physical cores = 12 logical cores per processor. The technology behind this is Intel's "hyper-threading", and AMD has something similar (found on its latest chips). The operating system sees 24 CPUs and schedules jobs on each. It is much faster than an equivalent 12-node cluster from 6 years ago.


The most important question really is the type of data that is going to be run through BLAST. People running a handful of queries sporadically will be very different from people running large datasets.

9.6 years ago
Mza • 30
@Mza2179

As the other contributors mentioned, you can get quite a long way with high-memory, multi-core hardware at ~5 to 10 concurrent searches, depending on the range and size of the sequence databases. Adding more memory or cores will help (vertical scaling), but you'll see diminishing returns. For spiky or heavier usage you're going to need to start distributing the load across multiple boxes (horizontal scaling).

For a more scalable system, consider provisioning a collection of servers as a processing cluster. Best practices for batch processing apply here: the nodes should be 'share nothing' with tasks distributed via message queues.

For BLAST, it can be more cost effective to run a larger number of less powerful servers.

A few other topics to consider to optimise such a system for throughput and cost:

Shard by database and usage

You can provide different queues to route searches to specific groups of servers. High use or large datasets can occupy their own dedicated, heavyweight infrastructure, whilst lower usage or smaller datasets can happily coexist on smaller, cheaper hardware. Monitor usage, response times and latency to gauge the best bang for buck.
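As a sketch of the routing idea above (the queue names and the heavy/light database split are invented for illustration, not from the answer):

```python
# Route each search to a worker queue based on its target database,
# so large/busy databases get dedicated heavyweight infrastructure.
HEAVY_DBS = {"nr", "nt", "refseq_protein"}  # example "large" databases

def route(database):
    """Pick a worker queue name for a search against `database`."""
    return "blast-heavy" if database in HEAVY_DBS else "blast-light"
```

A real deployment would consult monitored usage and response-time data rather than a hard-coded set, but the dispatch logic stays this simple.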

Queue-aware compute

You could investigate the possibility of running the searches against elastic compute with services such as EC2 (*). With message queues and horizontal scaling, running on utility computing can allow you to increase your capacity under increased demand, and reduce it as demand subsides (evenings, weekends etc).

Caching

Reduce the overhead of repeat submission (very common with BLAST!), by caching the input parameters and search results in a database. If a user repeats a search, just return the result immediately.
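A minimal cache can key on a hash of the query sequence plus all search parameters. This sketch uses an in-memory dict, whereas the answer suggests a database; the function names are illustrative.

```python
# Cache BLAST results keyed on a stable hash of all input parameters.
import hashlib
import json

_cache = {}

def cache_key(query, database, **params):
    """Stable hash over query sequence, database name and all flags."""
    blob = json.dumps([query, database, params], sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def cached_search(query, database, search_fn, **params):
    """Return a cached result, running `search_fn` only on a miss."""
    key = cache_key(query, database, **params)
    if key not in _cache:               # miss: actually run the search
        _cache[key] = search_fn(query, database, **params)
    return _cache[key]                  # hit: return immediately
```

A repeated submission with identical parameters then returns without touching the compute nodes at all.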

Friendly wrappers

A bit OT, but important for uptake of a distributed system: make it easy for your users to submit their searches. Depending on the technical knowledge of your users, grid-engine-style tools can help. However for short tasks which are submitted often (such as BLAST, format exchange, Radar, Needle, etc), some users may find them heavyweight. Instead, you can hide a lot of this complexity by providing a thin wrapper to your users that submits their task to a queue, and polls or awaits notification that the task has completed before returning results locally.
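Such a thin wrapper could look like the sketch below; the `queue` object and its `submit`/`is_done`/`fetch_result` methods are assumptions for illustration, not a real API.

```python
# Hypothetical client-side wrapper: submit a search to a remote queue,
# then poll until it completes, hiding the queuing machinery from users.
import time

def blast_and_wait(queue, query, database, poll_interval=2.0):
    """Submit a search and block until its result is ready."""
    task_id = queue.submit(query, database)
    while not queue.is_done(task_id):
        time.sleep(poll_interval)
    return queue.fetch_result(task_id)
```

From the user's point of view this behaves exactly like running BLAST locally, which is what makes uptake of the distributed system painless.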

So - in answer to your question, the horsepower of the physical hardware is only one factor in determining throughput and concurrency. There are a number of architectural factors that can help you scale up too.

(*) = heads up, I work at Amazon.

Thanks for your thoughtful comments. Some won't work for my friend's problem (EC2, caching) - with BLAST caching I'm not sure it really makes sense at all. Are there really that many repeated searches between database updates? In my own (small) group, we are running BLAST on two 4-core machines with 10GB RAM each, which works for our purposes but certainly not for multiple concurrent searches.


At a large enough scale, caching will save you compute effort: people tend to resubmit queries rather than save them locally. For large scale, concurrent searches, distribution is definitely the way to go.

9.6 years ago
Chris ♦ 1.6k
@Chris47

Regarding 2): we use SGE [1] here on our group-internal compute cluster, which currently consists of 600 CPUs. Of course it is generic cluster queueing software and its usage isn't BLAST-specific, but it is easy to conduct parallel BLAST runs on that platform. In fact, this is one of the major use cases in our group.

Chris

[1] http://en.wikipedia.org/wiki/Oracle_Grid_Engine


I should add that a scenario like this comes with some issues. One major problem we encountered was that a large number of concurrent BLAST accesses to the underlying sequence database produced network and I/O load at a scale that a single NFS server could no longer handle. The solution was to maintain node-local copies of the database and restrict access to those.
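The node-local-copy idea can be sketched like this (paths and file names are hypothetical; real BLAST databases consist of several files, so in practice you would rsync the whole database directory):

```python
# Keep a node-local database copy in sync with a master copy,
# refreshing it only when the master copy is newer.
import os
import shutil

def ensure_local_copy(master, local):
    """Copy `master` to `local` if the local copy is missing or stale."""
    if (not os.path.exists(local)
            or os.path.getmtime(local) < os.path.getmtime(master)):
        shutil.copy2(master, local)   # copy2 preserves the mtime
    return local
```

Pointing each BLAST job at the returned local path keeps all read traffic off the NFS server during searches; only the occasional refresh touches the network.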


thanks for your answer. I will forward the SGE idea to my friend. With regard to your comment: this is something that even we see here in my small group. Running BLAST via NFS is a bad idea, so we keep node-local copies of all databases.


Do you have any experience with how well BLAST scales beyond 4 threads? I am currently having issues running BLAST+ in multi-threaded mode (this has been reported as fixed, but it is not), so I keep using blastall.


Indeed, we once investigated exactly that question. The result was that BLAST runtime does not decrease linearly with the number of threads. If I remember correctly, beyond 3-4 threads there was no significant further gain in speed.


It's my understanding that how well BLAST scales with the number of threads depends heavily on both the database being searched and the input queries. If you have a large number of queries (whole-genome data), scaling to multiple threads makes a big difference; if you only have one or a handful of queries it won't, as they parallelize in different ways. I usually run hundreds to thousands of queries at a time and BLAST+ seems to scale above 4 threads quite well, but I haven't bothered to actually benchmark it.
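One way to settle the scaling question for a particular workload is to time the same search at several `-num_threads` settings. `-num_threads` and `-outfmt 6` are real BLAST+ options; the query and database names below are placeholders.

```python
# Build BLAST+ command lines for a simple thread-scaling benchmark.
def scaling_commands(query="queries.fa", db="nr", threads=(1, 2, 4, 8)):
    """One tabular-output blastp command per thread count to time."""
    return [
        ["blastp", "-query", query, "-db", db,
         "-outfmt", "6", "-num_threads", str(n)]
        for n in threads
    ]
```

Timing each command (e.g. with `time` or `subprocess`) on your own queries and databases gives the actual scaling curve, rather than relying on anecdotes.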

