A recurring question in data-intensive bioinformatics labs is which computing infrastructure to use. Over the past four years as a bioinformatics Ph.D. student, I have both received and offered advice, solicited and unsolicited, on computing infrastructure, drawing on my prior experience in a high-performance computing lab and my current experience in data analytics. This blog post covers my experience using private computing infrastructure compared with adopting the cloud for bioinformatics and data analytics, along with my thoughts on the advantages and disadvantages of each.
Deciding between buying computing infrastructure and adopting the cloud is very much like deciding between buying a car and renting one. The main challenge I found is that usage patterns are unpredictable. The moment I walked into the car dealership, all I saw were shiny cars with seemingly affordable price tags, and all I had in mind was getting a car right away so that I wouldn’t have to carry groceries two miles back to grad housing every time I went to Ralphs. Only when all the bills came did I realize I would be eating ramen for the next couple of weeks, and I regretted ever buying the car.
It is the same game for computing infrastructure. The main problem in the story is that I am a poor graduate student with no money, and I need to make the most of limited financial resources to accomplish my goal: fast completion times for whatever demanding task I run on the hardware. Like a Ferrari, fast and powerful computing infrastructure doesn’t come cheap.