Question

Integrating A Local Galaxy Instance With Cluster Or Cloud Compute Resources?

8

11.2 years ago

Adam Cornwell ▴ 510

I've been looking at putting a local Galaxy instance together so that we have a platform accessible to lab members for viewing data when our NGS samples start rolling in. I've got a single server for this right now, which would obviously be a major bottleneck for an end-to-end analysis of a larger dataset. We do have cluster computing facilities available as a core resource, however, so I started thinking about using a local Galaxy instance to initiate compute tasks on a remote system, either in our computing center or in the cloud. As far as I know this isn't available in the core Galaxy codebase at the moment, but it seems like something that could exist somewhere.

To clarify a bit: I can't actually host Galaxy on the cluster, and hosting it full-time in the cloud would probably cost too much. Our existing server might be powerful enough for most tasks. When there's something that would take a month to run on that box, it would be great to be able to use the same front-end instance to kick off processing on a remote system, for example to spin up EC2 nodes and handle the sending and receiving of data.

The main motivation would be the ability to have a local system maintaining our own sample database and workflow management, while being able to leverage larger computing systems for bigger jobs.

I don't really expect something like this to exist already for Galaxy and the like, but it seemed like something worth asking. You never know what resources are around that Google somehow missed.

galaxy cloud ngs • 6.0k views

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 11.2 years ago by Adam Cornwell ▴ 510

0

I know this topic is pretty old, but I am looking to set up the same kind of Galaxy installation. Is there anything new on this?

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by loic.bourg • 0

0

I moved your "answer" to a comment on the above post. This is not an answer to the above question.

I don't quite understand your question -- Galaxy is a constantly evolving project with many people contributing to it, so of course there is plenty new in the latest version as opposed to two years ago. Here's information on how to install Galaxy -- let us know if you have any specific questions, but keep in mind that computer installation issues are often user specific and not within the bounds of this bioinformatics question and answer forum.

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 9.3 years ago by Josh Herr 5.8k

Ram · Answer 1 · 2013-02-12

While the default configuration runs jobs as separate processes on the same computer, Galaxy has built-in support for running jobs on a cluster. That is how the main Galaxy instance is actually configured.

As you can see at the link below, there is support for a number of job schedulers, such as PBS or Condor:

https://bitbucket.org/galaxy/galaxy-dist/src/9fd7fe0c5712/lib/galaxy/jobs/runners?at=default

To set it up that way you probably need more detailed advice from their support channels.

Ram · Answer 2 · 2013-02-12

You want a local Galaxy instance which can spin up and utilize cloud compute nodes as needed? This is not available out of the box, and I think it would require significant effort to accomplish. You would want to either hack up CloudMan or hack up something around the LWR, depending on whether you wanted to export your local file systems to the cloud or stage files there on a per-job basis.

The CloudMan solution would involve running CloudMan in a master mode on your local system and then building an Amazon image tailored to your setup for running CloudMan worker nodes on Amazon. It would require significant tweaking just to run CloudMan on your local setup, and then you would need to hack it to modify your file exports (and maybe firewall) as new instances are created. I have never really researched exposing file mounts to ephemeral instances on EC2, so I can't say how well this would perform; it may be slow.

The LWR solution would involve setting up a cloud image with the LWR server installed (if you really are interested I can bake this into CloudBioLinux for you; it is on my long-term TODO list anyway), and then creating some sort of management script or console that would spin up cloud instances with the LWR installed on them and store their addresses somewhere. You can then use Galaxy's dynamic job runners* to send jobs into the cloud when LWR instances are available. Dealing with things like genome indices in this case would require some work, but I hope to make this process easier this year.
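The routing decision described above could be sketched roughly as follows. This is only an illustration in plain Python, not Galaxy's or the LWR's actual API: the names `LwrRegistry` and `route_job`, the size threshold, and the `("lwr", address)` return shape are all assumptions made up for this example. A real dynamic job rule would need to return whatever your Galaxy version expects.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LwrRegistry:
    """Hypothetical store of LWR server addresses.

    In the setup described above, a management script would add an
    address when it spins up a cloud instance and remove it on shutdown.
    """
    addresses: List[str] = field(default_factory=list)
    _next: int = 0  # round-robin cursor

    def register(self, address: str) -> None:
        if address not in self.addresses:
            self.addresses.append(address)

    def unregister(self, address: str) -> None:
        if address in self.addresses:
            self.addresses.remove(address)

    def pick(self) -> Optional[str]:
        """Return the next address in round-robin order, or None if empty."""
        if not self.addresses:
            return None
        address = self.addresses[self._next % len(self.addresses)]
        self._next += 1
        return address


def route_job(input_bytes: int, registry: LwrRegistry,
              threshold_bytes: int) -> Tuple[str, Optional[str]]:
    """Decide where a job should run (illustrative rule only).

    Small jobs stay on the local server. Large jobs go to a cloud LWR
    instance if one is registered, and fall back to local otherwise.
    """
    if input_bytes < threshold_bytes:
        return ("local", None)
    address = registry.pick()
    if address is None:
        return ("local", None)
    return ("lwr", address)
```

The fallback to local execution when no instance is registered is a design choice for this sketch; you might instead prefer to queue large jobs until a cloud instance comes up.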

Disclosure: I implemented the LWR and dynamic job runners and I am a regular contributor to CloudMan.

Ram · Answer 3 · 2013-02-12

1

11.2 years ago

Alex Paciorkowski 3.5k

Adam, this is definitely doable -- in fact there's an upcoming workshop at Bio-IT World on just this topic (and, ahem... yes... full disclosure, I'm one of the presenters). Our excellent collaborators at the University of Chicago's CI are implementing just this kind of system. If you like, we can talk in more detail about how to do this.

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 11.2 years ago by Alex Paciorkowski 3.5k

Ram · Answer 4 · 2013-02-12

0

11.2 years ago

Josh Herr 5.8k

I guess I'm not totally clear on what you are asking: do you want to know how to set Galaxy up on your own server, or about the possibility of moving it from your own server to a cluster in the future? Does this link, Get Galaxy: Galaxy Download and Installation, help to answer your question?

ADD COMMENT • link updated 2.1 years ago by Ram 43k • written 11.2 years ago by Josh Herr 5.8k

1

I'm pretty sure what I'm asking about isn't supported in the main Galaxy package, but it seems like something that could feasibly have already been developed by a third party. Basically, does anything exist that allows a local Galaxy instance to act as a front-end for kicking off analysis on cluster or cloud compute resources? Since the cluster is a shared resource, and no one wants to pay for hosting Galaxy in the cloud full-time, I'm looking for a compromise solution. (will edit for clarification)

ADD REPLY • link updated 2.1 years ago by Ram 43k • written 11.2 years ago by Adam Cornwell ▴ 510