Question

System Configuration For Plant Genome Data Analysis Using Illumina Reads

2

Entering edit mode

11.3 years ago

vaibhavbarot ▴ 30

Hi,

I want to purchase workstation for analyse illumina reads of plant genome (Read mapping and SNP calling).What minimum hardware configuration required???

illumina hardware • 4.7k views

ADD COMMENT • link updated 8.4 years ago by Saeid Kadkhodaei ▴ 90 • written 11.3 years ago by vaibhavbarot ▴ 30

Ram · Answer 1 · 2015-11-15

4

Entering edit mode

8.4 years ago

Saeid Kadkhodaei ▴ 90

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Saeid Kadkhodaei ▴ 90

score 2 · Answer 2 · 2012-12-26

2

Entering edit mode

11.3 years ago

Michael 54k

Minimum hardware specs are hard to give without further information about what is acceptable running time. Most analyses would run on old or cheap harware, but would simply take terribly long. The most important spec is memory size. Your genome, as indexed by the read mapper should fit in memory, that's what can be said without further details of genome size and software. Hardware Suitable For Generic Nextgen Sequencing Processing? seems to still be valid, just double the RAM figures. Otherwise, get as many CPU cores and fastest IO as you can afford. You will need a lot of disk space as well.

Other things to consider:

Support, system admin
Backup
disk space
do you really need a single new server for this, e.g. Is cloud an option, can you use existing large servers for this?

ADD COMMENT • link 11.3 years ago by Michael 54k

0

Entering edit mode

Thanks for respond,

I want minimum hardware configuration for mapping and SNP calling. Can you suggest ideal Hardware configuration for plant genome data of illumina.I dont have any existing server but planning to buy.

ADD REPLY • link 11.3 years ago by vaibhavbarot ▴ 30

1

Entering edit mode

minimum or ideal? that makes a difference. Also, what software 'exactly' are you going to run? What is the size of the largest genome you are working with? How many users in parallel will use the machine? You should use a Linux based server, will you use it interactively, or via ssh/telnet?

To find the minimum/optimal RAM size: get hold of/borrow a high-memory server, e.g. on amazon cloud, run a typical large job of the alignment step and monitor the process and its memory usage (e.g. 8GB), double the maximum memory required by that process and buy a decent computer that has this amount of RAM installed (e.g. 16 GB)and can host at least double this amount (e.g. 32GB, better up to 128GB) for later upgrades.

ADD REPLY • link 11.3 years ago by Michael 54k

0

Entering edit mode

Thanks Michael

I want generalize configuration for Plant genome illumina reads mapping and SNP calling.Can i do this using 4Quadcore, 16 GB RAM and 1TB storage computer.

ADD REPLY • link 11.3 years ago by vaibhavbarot ▴ 30

0

Entering edit mode

Depends exactly on what you want to do (read Michael's post above on finding minimal RAM size). The specs you have listed above are not nearly enough for what you want to do.

ADD REPLY • link 11.3 years ago by Josh Herr 5.8k

0

Entering edit mode

Well, RAM and CPU might be sufficient, but the system would very soon run out of storage for the read data. The memory might be just sufficient. From the BWA manual: "With bwtsw algorithm, 2.5GB memory is required for indexing the complete human genome sequences. For short reads, the ‘aln’ command uses ~2.3GB memory and the ‘sampe’ command uses ~3.5GB." The largest sequenced plant genome is barley (5.1Gb, if that scales linearly, 6GB should be enough), if it is one of the smaller plant genomes, then it might even work with less. All assuming the intended pipeline used BWA for read mapping.

ADD REPLY • link 11.3 years ago by Michael 54k

0

Entering edit mode

Thanks Michael, Can i call SNPs using configuration stated in previous post.

ADD REPLY • link 11.3 years ago by vaibhavbarot ▴ 30

0

Entering edit mode

No warranty, but most likely the analysis would work, but you will have no space to store it. You need to buy additional storage very soon, because your 1TB disk can be full after a few runs, assuming 100GB per run. Even the smallest compute solution offered by illumina has 20TB of disk space. http://www.illumina.com/documents/products/datasheets/datasheet_illuminacompute.pdf

ADD REPLY • link 11.3 years ago by Michael 54k

0

Entering edit mode

Thanks Among from above configuration I've mentioned, RAM & Processors (16GB & 4 Quadcore) are enough for SNP calling. Please suggest. I have to increase storage capacity.

ADD REPLY • link 11.3 years ago by vaibhavbarot ▴ 30

0

Entering edit mode

Average plant genome size right now is in the range of 6Gb to 8Gb. I might have a skewed view of RAM & CPU since the plant genomes I work with are in the range from 4Gb to 20Gb. If we knew what plant we were talking about and had an estimate of the genome size, we could give a little more information to you, vaibhavbarot.

ADD REPLY • link 11.3 years ago by Josh Herr 5.8k

score 0 · Answer 3 · 2012-12-26

0

Entering edit mode

11.3 years ago

Josh Herr 5.8k

To echo what Michael posted: You'll need to be quite clear on the size of your genomes (you'll need something with a lot of RAM and storage memory for plant genomes) and what you want to do at your workstation (will you be doing transcriptome assembly or just SNP calling?). Once you have an exact idea of what you'll be doing and the time frame you need for your analysis, then you can plan for the specs of your workstation. Storage For Miseq In-House may be some help in addition to the one that Michael posted.

ADD COMMENT • link 11.3 years ago by Josh Herr 5.8k

0

Entering edit mode

I want minimum hardware configuration for mapping and SNP calling. Can you suggest ideal Hardware configuration for plant genome data of illumina.I dont have any existing server but planning to buy.

ADD REPLY • link 11.3 years ago by vaibhavbarot ▴ 30