Question

Bioinformatics Cores In A World Of Cloud Computing And Genome Factories

26


12.3 years ago

Stephen 2.8k

I'm a director of a new bioinformatics core and I just wanted to start a general discussion here to hear others' thoughts.

What's the role of a bioinformatics core in a world of cloud computing and ultra-economy-of-scale genome factories (e.g. BGI, Complete Genomics)?

Cloud Computing

A recent article in GenomeWeb discusses plans by Ion Torrent (Life Technologies) to enter the commercial bioinformatics arena and to launch a cloud-based variant analysis software package. (The article is paywalled, but free if you register with a .edu address.)

Montana-based Golden Helix and Durham-based Expression Analysis have teamed up to offer a service-based cloud computing solution for next-generation sequencing bioinformatics analysis (BusinessWire, GenomeWeb, GoldenHelix Webcast).

There are others competing in the commercial bioinformatics arena that I'm surely leaving off.

Genome factories bundling bioinformatics

I had meetings with reps from both BGI and Complete Genomics last week.

CG offers human whole-genome sequencing bundled with a very sophisticated bioinformatics pipeline that returns annotated variants, mapped reads with a local de novo assembly around variants, summary statistics, and summaries of evidence around called variants, indels, etc. Their price and value added by the bundled bioinformatics will be difficult to beat.

The BGI rep spoke of the cost reduction and throughput that they have with their massive economy of scale operation. Their sequencing services (DNA, RNA, epigenetics) all include bundled bioinformatics services.

What does a bioinformatics core need to do to thrive?

What does a core need to do to thrive when competing with other vendors that bundle bioinformatics costs with cheap sequencing? It may be too early to tell whether cloud computing services like Ion Reporter or GoldenHelix/ExpressionAnalysis will add any significant value to a sequencing service provider's offering, but these may also become a challenge in the near future for service-based cores.

next-gen sequencing cloud core • 5.2k views

updated 5.1 years ago by Jeremy Leipzig 22k • written 12.3 years ago by Stephen 2.8k

1


Keep in contact with others in bioinformatics cores to discuss these issues and others like them... http://bioinfo-core.org/

8.0 years ago by Madelaine Gogol 5.3k

score 14 · Answer 1 · 2012-01-16

Service and connection to the projects. In my experience, the large-scale bioinformatics coming out of BGI is often sub-par, due in part to differences in training and background knowledge. There are a lot of "factory produced" bioinformaticians who lack the deep biological background needed to provide truly meaningful analyses.

I think these quick and dirty bundled analyses tend to return the easy and obvious answers, which suffices for many projects, but they often overlook or fail to find anything meaningful when the questions and projects become more difficult. There is no real incentive to do a thorough, in-depth analysis because it is time consuming.

Core facilities often work in collaboration with their clients, giving more than just a financial incentive. They also tend to have a much closer connection to the projects and employ people with higher levels of training and more complete backgrounds. And that is where core facilities, and bioinformaticians in general, need to differentiate themselves.

And I seriously doubt a lot of these will compete on price. Golden Helix and other off-the-shelf analysis packages can be prohibitively expensive for the vast majority of academic research labs.

Sean Davis · Answer 2 · 2012-01-16

I've noticed two diametrically opposed approaches in the applications of high throughput sequencing.

One is the "automated assembly line" approach, where a somewhat simplistic but reasonably well documented process needs to be repeated for every identically or similarly produced dataset. These tasks lend themselves to automation via various tools, but the automation and the need to stick to a given process make exploratory analyses more difficult to pursue.
The other is data from tinkerers/innovators. The datasets produced by these scientists are always different from anything we have done before, to the extent that we need to go back to basics and custom-develop a unique data processing pipeline. These projects are more fun from a professional point of view, but the effort we have to put in is substantially higher and not always properly recognized or appreciated.

So in the end it all depends on the type of problems that you will need to solve. Especially in the latter case, there is no one to compete with (other than the difficulty of actually solving the problem itself).

In both cases I think most analyses that are provided by large scale consortia are somewhat simplistic and typically come up short in unexpected ways. The goal is not to compete with these services by redoing them but to make them actually useful to scientists.

score 4 · Answer 3 · 2012-01-17

I cannot emphasize enough the points made above. Purchasing whole genome sequence does not equate to having access to a bioinformatics professional to QC the data, provide further project-specific annotation, provide higher-level analysis, integrate with public databases and datasets as well as orthogonal datasets and clinical information, promote data integrity (backups, archival storage), and build value-added query and visualization strategies. These companies will promise much and will deliver it, but there is usually more work involved when the data arrive than anyone predicted. This has been the case for many commercial software tools as well; they get the end-user 80% of the way there, but there is still more to do.

One cannot argue that there is not value in what sequencing services provide in terms of analysis. However, I have seen many an investigator get their 2TB disk from a provider only to realize how daunting the task of digging through the data actually is.

score 2 · Answer 4 · 2012-01-17

I fully support Dan and Istvan's answers. I get a lot of data from BGI and the like, and there is always an element of re-analysis or further bespoke analysis. The problem with a pay-for-service model is that goals and milestones are negotiated before the analysis, and turn-the-crank workflows will only get you so far. Depending on the funding stream and model of your own core, you should have a much better relationship with your customers, one that lets you deliver on their needs rather than just what they initially asked for.

You may also be interested in the international bioinformatics core group. Historically we have met yearly at ISMB, and we have regular teleconferences. This topic could also be raised at one of these forums.