Hardware requirements for bowtie2/STAR RNA-seq alignment
3.3 years ago
MatStat ▴ 160

Hi all,

I'm trying to understand what the hardware requirements are for aligning bulk RNA-seq data with bowtie2/STAR, in terms of:

  1. Processor and cores
  2. RAM
  3. SSD hard drive space
  4. Computing clusters
  5. Server

The data:

  1. Seq method: Illumina HiSeq High Output V4
  2. Single-end (i.e. single-read)
  3. 100 human tissue samples
  4. Each sample yielded 21 million reads.

All the best.

RNA-Seq Illumina bowtie bowtie2 STAR • 6.1k views
3.3 years ago
GenoMax 141k

This question has been asked many times before: the thread "Hardware requirement for RNA-seq" has several links to older threads. Requirements have not changed much over the years. Human and mouse genomes are similar in size and thus have similar hardware requirements.

Prioritize memory, then CPU, then storage, in that order (i.e. get the most you can of each).


Hi GenoMax,

Thank you for the prompt reply. I've read the answers in the link (and sub-links) you added, but I still don't have an idea of the minimal-to-optimal settings, for example when using cluster computing servers.

Just as an example, I tried to run 1 fastq sample on my Mac (i5, 16 GB RAM, 500 GB SSD); it was extremely strenuous and took more than 20 hrs.

Thanks.


There is no way around a lack of memory or compute power. With most aligners you are going to need 30+ GB of free RAM for human/mouse genomes. If you start using more than a few threads (say 6-8), that requirement will start going up. Just throwing tons of cores at the problem does not solve it either, since the efficiency of the software becomes important at that stage. Unless you are working with server hardware, I/O on a local machine (even with SSDs) is going to limit the speed at which data can be aligned. It is not uncommon for it to take a few hours to align 20-50M reads.
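To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes each alignment job needs about 30 GB of RAM (the figure quoted above), that one sample takes 2 hours (an assumed value within "a few hours"), and that jobs run in RAM-limited waves. The function name and the 128 GB machine are hypothetical, and the sketch ignores I/O limits and per-thread memory growth.

```python
import math


def estimate_wall_time(n_samples, hours_per_sample, free_ram_gb, ram_per_job_gb=30):
    """Return (concurrent_jobs, total_hours) for a RAM-limited batch of alignments.

    All inputs are rough assumptions, not measurements.
    """
    # How many jobs fit in memory at the same time.
    concurrent_jobs = int(free_ram_gb // ram_per_job_gb)
    if concurrent_jobs == 0:
        # Not even one job fits without swapping, so no sensible estimate exists.
        return 0, math.inf
    # Jobs run in "waves" of concurrent_jobs samples each.
    waves = math.ceil(n_samples / concurrent_jobs)
    return concurrent_jobs, waves * hours_per_sample


# Hypothetical server with 128 GB of free RAM, 100 samples.
print(estimate_wall_time(100, 2, 128))  # (4, 50)

# A 16 GB laptop cannot hold a single 30 GB job in memory.
print(estimate_wall_time(100, 2, 16))   # (0, inf)
```

The second call illustrates why a 16 GB machine struggles: the job does not fit in RAM at all, so the run is dominated by swapping rather than by alignment.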

"But still didn't get an idea of a minimal to optimal settings using cluster computing servers for example."

Any good two-socket server (not a desktop) is going to provide anywhere between 8 and 64+ cores (depending on the CPUs chosen). You would want at least 128 GB of RAM to have comfortable headroom for other tasks. Storage is really up to you. Ideally you will need performant network block storage mounted on this server via 10 Gb Ethernet, InfiniBand, etc. to provide the fastest possible read/write speeds. If that is not available, you will need to resort to local SSDs. Keep in mind that SSDs wear out and have a finite life if continuously written to.
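For sizing storage, a rough estimate of the raw input alone can be sketched as follows. The read length (100 bp) and header length (60 characters) are assumptions, since the thread does not state them; only the 21 million reads per sample and the 100 samples come from the question. The function name is hypothetical.

```python
def fastq_bytes(n_reads, read_len, header_len=60):
    """Approximate size in bytes of an uncompressed single-end FASTQ file.

    Each record has 4 lines: header, sequence, '+', qualities,
    and each line ends with a newline character.
    """
    per_record = (header_len + 1) + (read_len + 1) + 2 + (read_len + 1)
    return n_reads * per_record


READS_PER_SAMPLE = 21_000_000  # from the question
N_SAMPLES = 100                # from the question
READ_LEN = 100                 # ASSUMPTION: not stated in the thread

per_sample = fastq_bytes(READS_PER_SAMPLE, READ_LEN)
total = per_sample * N_SAMPLES
print(f"~{per_sample / 1e9:.2f} GB per sample, ~{total / 1e9:.1f} GB for all samples")
```

Under these assumptions the uncompressed input comes to several hundred gigabytes. Gzip-compressed FASTQ is usually considerably smaller, while alignment output (SAM/BAM) and temporary files need additional space on top of this.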


OK, great. Thanks a lot for the answer.


Do you really need to use alignment for bulk RNA-seq? Why not use pseudoalignment? It has lower memory and computing requirements.


Hi dsull, I am reproducing results following a workflow protocol from GitHub, which means I need to do what they did. In addition, I assume they don't use pseudoalignment because the method needs to be sensitive enough to recover unmapped reads, which can then be used further.

Best
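For reference, both aligners can write unmapped reads to a separate file. The Python sketch below only builds the command lines and does not run them. All paths and the helper names are hypothetical, and this is not necessarily what the GitHub workflow does; it only uses the documented `--un` option of bowtie2 and `--outReadsUnmapped Fastx` option of STAR for single-end input.

```python
import shlex


def bowtie2_cmd(index_prefix, fastq, out_sam, unmapped_fastq, threads=8):
    """Build a bowtie2 command for single-end reads that keeps unaligned reads."""
    return [
        "bowtie2",
        "-p", str(threads),      # number of alignment threads
        "-x", index_prefix,      # prefix of the bowtie2 index
        "-U", fastq,             # unpaired (single-end) input reads
        "-S", out_sam,           # SAM output file
        "--un", unmapped_fastq,  # write reads that failed to align here
    ]


def star_cmd(genome_dir, fastq, out_prefix, threads=8):
    """Build a STAR command for single-end reads that keeps unmapped reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,          # directory holding the STAR index
        "--readFilesIn", fastq,
        "--outFileNamePrefix", out_prefix,
        "--outReadsUnmapped", "Fastx",      # unmapped reads go to a separate file
    ]


# Hypothetical paths, for illustration only.
print(shlex.join(bowtie2_cmd("idx/GRCh38", "s1.fastq", "s1.sam", "s1.unmapped.fastq")))
print(shlex.join(star_cmd("star_index", "s1.fastq", "s1_")))
```

Building the commands as lists makes it easy to pass them to `subprocess.run` later, once the index and input paths actually exist.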
