Biostar Beta. Not for public use.
Question: Need suggestions on increasing computational power for bioinformatics analyses (DNA-seq, RNA-seq, ChIP-seq pipelines)

Hello friends,

I am looking for some advice. We are planning to expand our IT infrastructure for bioinformatics analyses.

Currently we have 3 nodes (head node, compute node 1, compute node 2), all PowerEdge R610 servers with identical specs: Intel Xeon X5650 @ 2.67 GHz, 24 cores, 48 GB RAM per node.


Case 1 - Increase the RAM in the existing PowerEdge R610 servers. The maximum I can go up to is 192 GB per node.

Case 2 - Purchase a new R610 server and add it as a third compute node (c3; currently we have a head node plus c1 and c2).


I know the first case will be cheaper than the second, but which one is the better long-term investment?

asked 3.3 years ago by bioinforesearchquestions • 230 • updated 3.3 years ago by dvanic • 240

There is no substitute for RAM when you need more of it. As part of this upgrade, you could at least add memory to the worker nodes (the head node can get by with less than 48 GB if it is not participating in actual jobs).

That said, purchasing memory after the fact is always an expensive proposition (you need the right kind of memory: ECC, matching speed/type, etc.), so weigh your case 1 upgrade against the other options.

replied 3.3 years ago by genomax • 68k

Thanks, @genomax2. We only have a mini cluster computing infrastructure. As you mentioned, my head node does not participate in actual jobs. With my current resources, I can parallelize only 12 jobs across c1 and c2 if I request the following hardware resources: slots_limit=5, mem_limit=8G. Nowadays, most bioinformatics tools require a minimum of 8 GB of RAM.

I have encountered a tool called Kraken, a taxonomic sequence classification system. From the Kraken website: "The default database size is 75 GB (as of Feb. 2015), and so you will need at least that much RAM if you want to build or run with the default database". With my current resources, I cannot use Kraken.

I will try to convince my supervisor to invest in the infrastructure.

1) How much RAM do you recommend for each worker node (c1 and c2)?

replied 3.3 years ago by bioinforesearchquestions • 230

Some of the following suggestions depend on your budget for all this.

You could get a simple new single-CPU server as the head node and make the current head node another worker, so you would have 3 good workers.

I don't recall whether you need to worry about UDIMMs vs. RDIMMs with the R610 (the current R630 seems to use only RDIMMs). You can't mix and match memory, so you may not be able to reach the full 192 GB unless the current servers already have the right kind of memory (or you dump all the existing memory, which you probably do not want to do). So get as much as you can to max out the 3 workers; that may be 64 or 128 GB per node depending on how many slots are still open. Carefully check the current hardware, or consult a local expert if you are not a sysadmin, before you order new memory.

replied 3.3 years ago by genomax • 68k

Thanks for your valuable information. I received the following from a Dell expert:

  • The highest-capacity memory DIMM is the 16 GB module.

  • With 12 DIMM slots on the server, the highest amount of RAM each of these servers could achieve is 192 GB.

  • One Dell 16 GB module (2Rx4 DDR3 RDIMM 1600MHz LV) costs around $150.
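For a ballpark, those numbers can be multiplied out. A back-of-the-envelope sketch using only the quoted $150/module figure; actual pricing, and any value recovered from the displaced existing DIMMs, would shift this:

```shell
# Rough cost sketch based on the Dell quote above ($150 per 16 GB RDIMM).
dimm_price=150      # $ per 16 GB module (quoted figure, not current pricing)
dimm_slots=12       # DIMM slots per R610
nodes=3

per_node=$(( dimm_price * dimm_slots ))  # fully populating one node: 12 x $150
total=$(( per_node * nodes ))            # maxing out all three servers
echo "per node: \$$per_node, all nodes: \$$total"
```

So roughly $1800 per server to reach 192 GB, before considering a cheaper partial fill of only the open slots.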

Do I need to increase the number of cores, or are 24 cores good enough?

replied 3.3 years ago by bioinforesearchquestions • 230

So you are planning to replace all the memory that is currently there?

I would consolidate memory (as much as the open slots allow) into one of the nodes and save some money (use it to get a cheap one-socket head node). One node would have less RAM, but you could plan your job slots accordingly. There is no point in trying to upgrade the existing CPUs if both sockets are already populated.

replied 3.3 years ago by genomax • 68k

If you tend to run many small jobs, get another node. If you need to run single colossal jobs (e.g. Trinity), you need more RAM.

From experience, 48 GB isn't much RAM for that many cores (only 2 GB/core). It stinks to have idle cores because you ran out of RAM first.

commented 3.3 years ago by pld • 4.8k

Thanks @Joe and @genomax2 for your suggestions. To be frank, I don't have much background in server management.

Generally, I work on variant calling, differential gene expression, differential splicing, and RNA-seq variant calling pipelines.

For your reference, here is some output from the lscpu command. It reports CPUs as threads/core × cores/socket × sockets = 2 × 6 × 2 = 24.

CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
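One thing worth noting from that output: the 24 "CPUs" are hyperthreads, not physical cores, which matters when deciding how many slots to advertise per node. Deriving the counts from the quoted lscpu fields:

```shell
# Derive core counts from the lscpu fields quoted above.
threads_per_core=2
cores_per_socket=6
sockets=2

physical_cores=$(( cores_per_socket * sockets ))        # 12 real cores
logical_cpus=$(( physical_cores * threads_per_core ))   # 24 hyperthreads seen by the OS
echo "$physical_cores physical cores, $logical_cpus logical CPUs"
```

For CPU-bound bioinformatics jobs, scheduling against the 12 physical cores is often the more realistic capacity figure.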

For instance, when submitting jobs using Sun Grid Engine:

1) If I request the following hardware resources (slots_limit=5, mem_limit=16G), I can usually parallelize only 2 jobs on compute node 1 and 2 jobs on compute node 2.

2) If I request the following hardware resources (slots_limit=5, mem_limit=12G), I can usually parallelize only 4 jobs on compute node 1 and 4 jobs on compute node 2.
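Those parallelism limits follow from whichever resource runs out first, slots or memory. A minimal sketch, assuming 24 slots and 48 GB usable per node and that the scheduler packs jobs until either limit is hit (observed counts can run slightly lower once system overhead is reserved):

```shell
# Estimate concurrent jobs per node for a given request (illustrative
# arithmetic, not an SGE command): a node fits
# min(cores / slots_per_job, ram / mem_per_job) jobs.
node_cores=24
node_ram_gb=48
slots_per_job=5
mem_per_job_gb=12

by_cores=$(( node_cores / slots_per_job ))   # jobs before slots run out
by_ram=$(( node_ram_gb / mem_per_job_gb ))   # jobs before RAM runs out
jobs=$(( by_cores < by_ram ? by_cores : by_ram ))
echo "$jobs jobs per node"
```

With mem_limit=12G this gives 4 jobs per node, matching scenario 2; raising node RAM shifts the bottleneck back to slots.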

Note: the head node will not be used for the analyses.

As you suggested, getting a new node would be good. How much RAM would you recommend for each of the 3 nodes? I checked with the service provider; the maximum I can increase to is 192 GB per node.

Do I need to increase the number of cores, or are 24 cores good enough?

replied 3.3 years ago by bioinforesearchquestions • 230

Alternative suggestion - would using Amazon AWS be more cost-effective for your research needs? You can fire up nodes of whatever size/configuration your project requires, and shut them down when you are not using them...

I'm not in any way affiliated with Amazon, but I've had very good success with using their resources, and if _I_ were making any kind of purchasing decision, I'd be much keener to spend the money on AWS than to maintain a tiny in-house server.

commented 3.3 years ago by dvanic • 240
