Forum: Snakemake vs. Nextflow: strengths and weaknesses

6.8 years ago

ropolocan ▴ 810

I have seen increasing interest in workflow/pipeline management systems such as snakemake and nextflow. In my opinion, both look very promising. There is a very interesting review from 2016 that compares bash, make, snakemake and nextflow: https://www.jmazz.me/blog/NGS-Workflows

The author of that review did a very good job of analyzing the strengths and weaknesses of snakemake and nextflow. I am not sure how much has changed since then, but in your experience, what criteria could bioinformaticians use to choose one over the other? Have any of the weaknesses identified for snakemake and nextflow been addressed since then?

updated 11 months ago by Ram 43k • written 6.8 years ago by ropolocan ▴ 810

I started using snakemake 6 months ago, and I have now moved all my pipelines (ChIP-seq, RNA-seq, ATAC-seq and DNA-seq) to snakemake. I am pretty happy with it. Once you get the idea of how snakemake works (think in a bottom-up fashion), it is easy to build your own pipelines. BTW, the documentation is awesome.
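To make "think in a bottom-up fashion" concrete: you name the final files you want, and Snakemake works backward through the rules whose output patterns match, until it reaches files that already exist. Below is a minimal pure-Python illustration of that backward resolution under simplifying assumptions (a hypothetical `Rule` class, a single `{sample}` wildcard, no timestamps or parallelism); it is not Snakemake's actual implementation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical, simplified stand-in for a Snakemake rule."""
    name: str
    output: str                                    # pattern with one {sample} wildcard
    inputs: List[str] = field(default_factory=list)

    def match(self, target: str) -> Optional[str]:
        """Return the {sample} value if this rule can produce `target`, else None."""
        regex = re.escape(self.output).replace(r"\{sample\}", r"(?P<sample>[^/]+)")
        m = re.fullmatch(regex, target)
        return m.group("sample") if m else None


def resolve(target: str, rules: List[Rule], existing: set,
            plan: List[Tuple[str, str]]) -> None:
    """Work backward from `target`, appending (rule, output) jobs in run order."""
    if target in existing:
        return                                     # already on disk: nothing to do
    for rule in rules:
        sample = rule.match(target)
        if sample is None:
            continue
        for pattern in rule.inputs:                # resolve dependencies first
            resolve(pattern.format(sample=sample), rules, existing, plan)
        if (rule.name, target) not in plan:
            plan.append((rule.name, target))
        return
    raise ValueError(f"no rule produces {target!r} and it does not exist")


RULES = [
    Rule("align", "aligned/{sample}.bam", ["reads/{sample}.fastq.gz"]),
    Rule("sort", "sorted/{sample}.bam", ["aligned/{sample}.bam"]),
    Rule("count", "counts/{sample}.txt", ["sorted/{sample}.bam"]),
]

if __name__ == "__main__":
    plan: List[Tuple[str, str]] = []
    for t in ["counts/A.txt", "counts/B.txt"]:     # the "bottom" you ask for
        resolve(t, RULES, {"reads/A.fastq.gz", "reads/B.fastq.gz"}, plan)
    for job in plan:
        print(*job)
```

Note that the rules are listed in any order; the execution order falls out of the dependencies between requested files, which is the bottom-up way of thinking the reply refers to.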

You can write a custom script for submitting jobs to the cluster on each platform (LSF, Moab, ...) if you want more control over your jobs, e.g. https://bitbucket.org/snakemake/snakemake/issues/28/clustering-jobs-with-snakemake

The only downside for me is that when I have more than 1000 jobs to submit, snakemake takes a while to process the metadata associated with each job; even a dry run takes minutes. I do not know how fast nextflow is.

6.8 years ago by Ming Tommy Tang ★ 3.9k

And is BioMake out of the game? It uses Prolog (which is both its weakness and its strength...).

6.8 years ago by kamiljaron ▴ 220

Hello, @kamiljaron. I was not aware of BioMake; I will have to read up on it. I do not know Prolog, nor have I ever used a logic programming language, but I will read more about what BioMake has to offer.

6.8 years ago by ropolocan ▴ 810

No knowledge of Prolog required! You can use GNU Make syntax to specify your workflow.

6.1 years ago by cmungall ▴ 30

How do WDL/CWL compare with snakemake?

4.7 years ago by Shicheng Guo ★ 9.4k

I've never used either of them, but CWL is just a specification for describing a pipeline; it does not execute the pipeline itself. Snakemake executes the pipeline in addition to describing it with its own syntax.

4.7 years ago by steve ★ 3.5k

score 14 · Answer 1 · 2017-06-19

6.8 years ago

dariober 14k

Besides what a tool can or cannot do, I like to check the quality of the documentation, whether it is actively developed and maintained, how many developers contribute to it, and the size of the user base.

It seems to me that snakemake and nextflow are pretty much tied on all these metrics, and both are pretty good (although in terms of user base and developers they are far behind tools like luigi). So I think it's a difficult choice between the two...

I haven't tried nextflow, but I recently started working with snakemake and I'm very happy with it. Actually, I feel dumb that for years I've been hacking together bash scripts to run pipelines. For me, one advantage of snakemake is that a snakemake script is effectively Python with additional features on top. So if you know Python, putting complex logic and functions in a snakemake script is straightforward. I guess the same applies to nextflow with Groovy, which is not as popular, though.

From the review you linked, it seems nextflow doesn't have a "dry run" option. I find dry runs super useful for seeing what would be executed, and they are great for development and debugging.

Just my 2p...
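For reference, a dry run (`snakemake -n`, i.e. `--dry-run`) works out which jobs would run without executing anything. The classic make-style rule is that a job runs if its output is missing, is older than one of its inputs, or depends on an output that will itself be regenerated. The sketch below shows only that decision, assuming a hypothetical in-memory table of modification times instead of real `os.stat` calls; Snakemake itself can consider other rerun triggers besides timestamps, depending on the version.

```python
from typing import Dict, List, Tuple

# (job name, input files, output file), listed in dependency order.
Job = Tuple[str, List[str], str]

PIPELINE: List[Job] = [
    ("trim", ["raw.fastq"], "trimmed.fastq"),
    ("align", ["trimmed.fastq"], "aligned.bam"),
    ("call", ["aligned.bam"], "calls.vcf"),
]


def dry_run(jobs: List[Job], mtimes: Dict[str, float]) -> List[str]:
    """Return the names of jobs that would run; nothing is executed.

    `mtimes` maps existing files to modification times (stand-in for os.stat).
    """
    regenerated = set()          # outputs that earlier jobs in this run will rewrite
    would_run = []
    for name, inputs, output in jobs:
        out_time = mtimes.get(output)
        stale = (
            out_time is None                                  # output missing
            or any(i in regenerated for i in inputs)          # upstream reruns
            or any(mtimes.get(i, float("inf")) > out_time     # an input is newer
                   for i in inputs)
        )
        if stale:
            would_run.append(name)
            regenerated.add(output)
    return would_run


if __name__ == "__main__":
    up_to_date = {"raw.fastq": 1, "trimmed.fastq": 2, "aligned.bam": 3, "calls.vcf": 4}
    print(dry_run(PIPELINE, up_to_date))                          # → []
    print(dry_run(PIPELINE, {**up_to_date, "trimmed.fastq": 5}))  # → ['align', 'call']
```

Because the whole job list is known before anything runs, this check is cheap to do up front; that is exactly what becomes harder when the set of jobs depends on data produced during the run.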

6.8 years ago by dariober 14k

Thank you very much for your answer, @dariober.

It seems to me that snakemake and nextflow are pretty much on a draw for all these metrics and both are pretty good (although in terms of user base and developers they are far from tools like luigi). So I think it's a difficult choice between these two...

I am curious about luigi. I have read many good comments about it, and I will look into testing it as well. I have been testing Snakemake and I can see why it has garnered attention.

Actually I feel dumb that for years I've been hacking together bash scripts to run pipelines. For me one advantage of snakemake is that a snakemake script is effectively python with additional features on top. So if you know python, putting some complex logic and functions in a snakemake script is straightforward. I guess the same applies to nextflow but using groovy, which is not so popular though.

Using snakemake was a "eureka" moment for me as well. It has so much potential, and I look forward to adapting other pipelines I had written in bash or Python to snakemake.

6.8 years ago by ropolocan ▴ 810

About the dry run option: if I am not wrong, nextflow does not have one because it does not know the exact execution DAG a priori. The Nextflow language is more expressive, and the execution DAG may depend on the input data, for example if your workflow has conditional executions (which I think is not possible in Snakemake?).

6.8 years ago by Fred ▴ 780

Snakemake allows for conditional creation of the DAG and conditional execution of different code based on the input.
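The two kinds of "conditional" discussed in these replies can be told apart in a few lines. Branching on metadata known up front (e.g. paired vs single-end samples) can be resolved before anything runs, whereas a data-dependent fan-out (e.g. how many chunks a split step produces) can only be planned after that step has run. Snakemake handles the second case with `checkpoint` rules, which make it re-evaluate the DAG once the checkpoint finishes; that is also why a dry run of such a workflow cannot list every job. The `plan_jobs` function and the job names below are hypothetical.

```python
from typing import Dict, List, Optional


def plan_jobs(samples: Dict[str, str],
              chunk_counts: Optional[Dict[str, int]] = None) -> List[str]:
    """Plan jobs for each sample.

    samples: sample -> "paired" or "single" (metadata known before anything runs).
    chunk_counts: sample -> number of chunks a split step produced (known only
        after that step has run); None means the split has not run yet.
    """
    jobs: List[str] = []
    for name in sorted(samples):
        # Conditional branch decided from metadata: resolvable up front.
        aligner = "align_pe" if samples[name] == "paired" else "align_se"
        jobs += [f"{aligner}:{name}", f"split:{name}"]
        if chunk_counts is None or name not in chunk_counts:
            continue  # fan-out unknown until split has run; re-plan later
        jobs += [f"call:{name}:chunk{i}" for i in range(chunk_counts[name])]
        jobs.append(f"merge:{name}")
    return jobs


if __name__ == "__main__":
    samples = {"A": "paired", "B": "single"}
    print(plan_jobs(samples))                     # before split: partial DAG
    print(plan_jobs(samples, {"A": 2, "B": 1}))   # after split: full DAG
```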

6.2 years ago by endrebak ▴ 960

For me one advantage of snakemake is that a snakemake script is effectively python with additional features on top.

I would call this a disadvantage. The Python ecosystem is a mess to work with when it comes to third-party libraries. I tried to install snakemake for myself on our HPC and immediately hit a million environment-management issues, not all of which are solvable with virtualenvs or conda. Nextflow, on the other hand, installs seamlessly on any system that has Java 8, including our HPC. Re-learning the few extra programming bits I needed in Groovy was a very small price to pay for Nextflow's greater portability and ease of execution.

5.9 years ago by steve ★ 3.5k

Funny, I generally find Java more annoying to deal with. To each their own, I guess.

5.9 years ago by Devon Ryan 104k

I've never actually had to deal with Java to get Nextflow to work, beyond making sure Java 8 was installed. Installing Nextflow has been a one-liner on every system I've tried. On the other hand, every Python-based workflow management system I have tried (along with most other Python packages) has required a lot of hands-on environment configuration and management, which is not only a pain in the butt but also makes it much harder to spin up a pipeline instance on a new system on an ad hoc basis.

5.9 years ago by steve ★ 3.5k

What is luigi, and how come I've never heard of it? It's supposed to do the same thing as Snakemake/Nextflow while being even more popular?!

13 months ago by e.r.zakiev ▴ 190

score 9 · Answer 2 · 2017-06-18

6.8 years ago

shenwei356 8.4k

Table 1: Comparison of Nextflow with other workflow management systems

Workflow	Nextflow	Galaxy	Toil	Snakemake	Bpipe
Platform [a]	Groovy/JVM	Python	Python	Python	Groovy/JVM
Native task support [b]	Yes (any)	No	No	Yes (BASH only)	Yes (BASH only)
Common Workflow Language [c]	No	Yes	Yes	No	No
Streaming processing [d]	Yes	No	No	No	No
Dynamic branch evaluation	Yes	?	Yes	Yes	Undocumented
Code sharing integration [e]	Yes	No	No	No	No
Workflow modules [f]	No	Yes	Yes	Yes	Yes
Workflow versioning [g]	Yes	Yes	No	No	No
Automatic error failover [h]	Yes	No	Yes	No	No
Graphical user interface [i]	No	Yes	No	No	No
DAG rendering [j]	Yes	Yes	Yes	Yes	Yes
Container management
Docker support [k]	Yes	Yes	Yes	No	No
Singularity support [l]	Yes	No	No	No	No
Multi-scale containers [m]	Yes	Yes	Yes	No	No
Built-in batch schedulers [n]
Univa Grid Engine	Yes	Yes	Yes	Partial	Yes
PBS/Torque	Yes	Yes	No	Partial	Yes
LSF	Yes	Yes	No	Partial	Yes
SLURM	Yes	Yes	Yes	Partial	No
HTCondor	Yes	Yes	No	Partial	No
Built-in distributed cluster [o]
Apache Ignite	Yes	No	No	No	No
Apache Spark	No	No	Yes	No	No
Kubernetes	Yes	No	No	No	No
Apache Mesos	No	No	Yes	No	No
Built-in cloud [p]
AWS (Amazon Web Services)	Yes	Yes	Yes	No	No

6.8 years ago by shenwei356 8.4k

To be fair, it would be nice to see the same table compiled or commented on by the authors of snakemake... With respect to SLURM, I don't know what is meant by "partial" support in snakemake. I started playing with snakemake, and running jobs with SLURM is incredibly simple.

6.8 years ago by dariober 14k

Yeah, snakemake has full support for anything that uses DRMAA, which I expect is also what Galaxy uses and probably what nextflow uses. Further, the table's footnote for that section basically amounts to: "Actually, it has full support for these and any future schedulers; you just have to tell it how to execute the commands." I prefer the snakemake way of doing this, since everyone here submits jobs through a wrapper I wrote, so lots of things (temp space, memory usage, queue, etc.) can be set conveniently without repeating them in every snakemake file.
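A site-wide submission wrapper of this kind can be very small. The sketch below assumes a SLURM cluster, a hypothetical `build_submit_command` helper and made-up site defaults; it only builds the `sbatch` command line and does not submit anything. In older Snakemake releases, the command given to `--cluster` received the generated jobscript path as its last argument, and `snakemake.utils.read_job_properties` could read the job's resources from it; check your version's documentation, since Snakemake 8 moved cluster submission to executor plugins.

```python
import shlex
from typing import Dict, List, Optional

# Hypothetical site defaults; every job gets these unless it overrides them.
SITE_DEFAULTS = {
    "partition": "normal",
    "threads": 1,
    "mem_mb": 4000,
    "tmpdir": "/scratch",
}


def build_submit_command(jobscript: str,
                         resources: Optional[Dict[str, object]] = None) -> List[str]:
    """Merge per-job resources over site defaults and build an sbatch argv list."""
    opts = {**SITE_DEFAULTS, **(resources or {})}
    return [
        "sbatch",
        f"--partition={opts['partition']}",
        f"--cpus-per-task={opts['threads']}",
        f"--mem={opts['mem_mb']}M",
        f"--export=ALL,TMPDIR={opts['tmpdir']}",   # per-job temp space
        jobscript,
    ]


if __name__ == "__main__":
    cmd = build_submit_command("jobscript.sh", {"threads": 8, "mem_mb": 16000})
    print(shlex.join(cmd))   # a real wrapper would pass `cmd` to subprocess.run
```

Keeping the defaults in one place is the point of the design: workflow files only state what differs for a given job.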

6.8 years ago by Devon Ryan 104k

Nextflow does not use DRMAA. It uses the scheduler's native directives. Here is an example from the source code. Also note that cluster options such as memory and CPUs can all be set for pipeline processes independently of the actual pipeline script, and you can use profiles to have multiple sets of configurations for different systems (e.g. one pipeline script, and different execution configs for HPC, local, AWS, etc.). Docs here
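The idea of keeping resource settings out of the pipeline script, with one profile per execution environment, can be illustrated with a small layering function. This is a conceptual Python sketch, not Nextflow's configuration engine: the setting names, profile names and precedence order (base, then profile, then per-process overrides) are assumptions chosen for illustration. In Nextflow itself these settings live in `nextflow.config`, and a profile is selected at run time with `-profile`.

```python
from typing import Dict, Optional

# Hypothetical settings; the pipeline logic never mentions any of them.
BASE = {"executor": "local", "cpus": 1, "memory": "2 GB"}
PROFILES = {
    "hpc": {"executor": "slurm", "queue": "long"},
    "cloud": {"executor": "awsbatch", "memory": "8 GB"},
}
PER_PROCESS = {"align": {"cpus": 8, "memory": "16 GB"}}


def settings_for(process: str, profile: Optional[str] = None) -> Dict[str, object]:
    """Layer settings: base config, then the chosen profile, then per-process overrides."""
    merged = dict(BASE)
    if profile is not None:
        merged.update(PROFILES[profile])      # KeyError for an unknown profile
    merged.update(PER_PROCESS.get(process, {}))
    return merged


if __name__ == "__main__":
    for profile in (None, "hpc", "cloud"):
        print(profile, settings_for("align", profile), settings_for("index", profile))
```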

5.9 years ago by steve ★ 3.5k

All of that largely applies to snakemake too :)

5.9 years ago by Devon Ryan 104k

The table is outdated by now; Snakemake does support Kubernetes AFAICT: https://snakemake.readthedocs.io/en/stable/executable.html#executing-a-snakemake-workflow-via-kubernetes

5.9 years ago by Roman Valls Guimerà ▴ 620

The table was never particularly accurate to begin with.

5.9 years ago by Devon Ryan 104k

Thanks for sharing this table, @shenwei356! It is very interesting to see that nextflow has stream processing, workflow versioning, and full support for SLURM, in addition to having native task support for any language. I believe snakemake has native task support for R now as well. Thanks again for your answer.

6.8 years ago by ropolocan ▴ 810

I think this table is a little outdated with respect to bpipe, which I use daily. SLURM support exists (at least we are using it in-house), and I am pretty sure that stages in R can be run natively without wrapping them in an Rscript.

6.4 years ago by A. Domingues ★ 2.7k

score 7 · Answer 3 · 2017-06-19

6.8 years ago

Sinji ★ 3.2k

I'm a big fan of Nextflow. I've used Snakemake in the past, and it was originally my go-to workflow language, but the built-in support for Docker, Singularity, and HPC environments that Nextflow provides just can't be beat.

The only downside is you have to use Groovy.

6.8 years ago by Sinji ★ 3.2k

Snakemake has singularity support with the singularity directive. I haven't used nextflow, but I would be amazed if it is as flexible as Snakemake. (Note to past self: there is a flexibility vs rigor tradeoff.)

5.9 years ago by endrebak ▴ 960

Get prepared to be amazed.

6.2 years ago by pditommaso ▴ 230

Thank you very much for your answer, @Sinji. I also look forward to testing Nextflow. Both workflow systems/languages have so much potential, and I think they could have a very important impact on bioinformatics.

6.8 years ago by ropolocan ▴ 810

I agree that nextflow is a good programming language and very promising. Unfortunately, while composing more complicated pipelines, I have run into one of its limitations: the static nature of the channel object. I have submitted feature requests, and I believe the nextflow team will eliminate this limitation in future work.

My problem is this: processA produces many files, in particular paired fastq reads, and processB uses processA's output files as input, turning them into one parallel execution per pair. Say I have only one processA running (actually I can have many). While processA is generating pairs of fastq files (file1_R1.fastq.gz, file1_R2.fastq.gz), (file2_R1.fastq.gz, file2_R2.fastq.gz), ..., I want processB to start processing each pair as soon as processA has finished generating it. Right now I have to use two pipelines to do this job: Channel.fromFilePairs("*_R{1,2}.fastq.gz") is empty when the pipeline starts, so processB never does anything, because the channel was constructed at launch time while it was still empty.

I guess implementing such a dynamic feature would need very fundamental changes to the language. There might be other ways to meet my requirement that I have not explored yet (e.g. watchPath, subscribe, etc.); I am still learning nextflow. If anyone has experienced such limitations and found a solution, please let me know.

4.3 years ago by kmzhou4 • 0

This sounds like something that goes against the basic premise of Nextflow. It's possible to have multiple outputs, even a dynamic number of outputs, but they are not released from a task until the task is complete. As far as I know, watchPath only works for the initial loading of items into a channel, not for watching the outputs of a task. I cannot think of a good solution for this in Nextflow, since it would violate the integrity of the task (e.g. what do you do if you release an output and the task then fails after some output has already proceeded through the rest of the pipeline?). Personally, I would just consider this an edge case where you have to accept a small throughput penalty, waiting for processA to finish producing all files before any of them can continue through the pipeline.
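The tradeoff described here can be simulated with a producer that emits read pairs one at a time and may fail partway through. Releasing outputs only when the task completes (how this reply describes Nextflow processes) means a failure leaves nothing half-processed downstream; streaming each pair as soon as it appears improves throughput but lets some pairs move on before the producer is known to have succeeded. This is a toy model with hypothetical function names, not how Nextflow is implemented.

```python
from typing import Iterator, List, Optional, Tuple

Pair = Tuple[str, str]


def process_a(n_pairs: int, fail_after: Optional[int] = None) -> Iterator[Pair]:
    """Hypothetical producer: yields (R1, R2) pairs, optionally failing partway."""
    for i in range(1, n_pairs + 1):
        if fail_after is not None and i > fail_after:
            raise RuntimeError(f"process A failed after {fail_after} pairs")
        yield (f"file{i}_R1.fastq.gz", f"file{i}_R2.fastq.gz")


def process_b(pair: Pair) -> str:
    """Hypothetical consumer: 'processes' a pair and returns its sample name."""
    return pair[0].split("_R1")[0]


def run_on_completion(n_pairs: int, fail_after: Optional[int] = None) -> Tuple[List[str], bool]:
    """Outputs are released only after A finishes; returns (processed, a_succeeded)."""
    try:
        outputs = list(process_a(n_pairs, fail_after))   # wait for the whole task
    except RuntimeError:
        return [], False                                 # nothing reached B
    return [process_b(p) for p in outputs], True


def run_streaming(n_pairs: int, fail_after: Optional[int] = None) -> Tuple[List[str], bool]:
    """B consumes each pair as soon as A emits it; returns (processed, a_succeeded)."""
    processed: List[str] = []
    try:
        for pair in process_a(n_pairs, fail_after):
            processed.append(process_b(pair))
    except RuntimeError:
        return processed, False                          # partial work already done
    return processed, True


if __name__ == "__main__":
    print(run_on_completion(3, fail_after=2))   # ([], False)
    print(run_streaming(3, fail_after=2))       # (['file1', 'file2'], False)
```

When nothing fails, both modes process the same pairs; they differ only in when B can start and in what is left behind after a failure.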

4.3 years ago by steve ★ 3.5k

Can you not just output from one process a tuple of the two paired files, and use this channel as input for the next process? See https://www.nextflow.io/docs/latest/process.html?highlight=tuple#output-tuple-of-values
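The pairing step behind this suggestion comes down to grouping file names by sample, similar in spirit to what `Channel.fromFilePairs` emits (a sample id together with its two files). A minimal Python sketch, assuming the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming used in the question; `pair_reads` is a hypothetical helper.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def pair_reads(filenames: Iterable[str]) -> List[Tuple[str, Tuple[str, str]]]:
    """Group names into (sample, (R1, R2)) tuples, sorted by sample."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        m = PAIR_RE.match(name)
        if m is None:
            continue                      # not a paired FASTQ name; ignore it
        groups[m.group("sample")][m.group("read")] = name
    pairs = []
    for sample in sorted(groups):
        reads = groups[sample]
        if set(reads) != {"1", "2"}:      # refuse to emit half a pair
            raise ValueError(f"incomplete pair for {sample!r}: {sorted(reads.values())}")
        pairs.append((sample, (reads["1"], reads["2"])))
    return pairs


if __name__ == "__main__":
    files = ["file2_R2.fastq.gz", "file1_R1.fastq.gz", "file2_R1.fastq.gz",
             "file1_R2.fastq.gz", "notes.txt"]
    for sample, (r1, r2) in pair_reads(files):
        print(sample, r1, r2)
```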

4.1 years ago by alan.ocallaghan • 0

score 4 · Answer 4 · 2017-06-18

6.8 years ago

ropolocan ▴ 810

I just read this excellent review by @Jeremy Leipzig. It can help in deciding which workflow management system best suits one's needs: https://academic.oup.com/bib/article-lookup/doi/10.1093/bib/bbw020

6.8 years ago by ropolocan ▴ 810

score 4 · Answer 5 · 2017-11-29

6.4 years ago

ropolocan ▴ 810

I am revisiting this post to mention that snakemake supports automated deployment of software dependencies with conda as well as the specification of conda environments per rule. This is very exciting!

6.4 years ago by ropolocan ▴ 810

This feature has also been added to Nextflow. Link

5.9 years ago by steve ★ 3.5k

Excellent! Thanks for bringing attention to this, @steve. I will definitely try it out.

5.9 years ago by ropolocan ▴ 810

score 2 · Answer 6 · 2018-01-18

6.3 years ago

ropolocan ▴ 810

I thought I would share this Reddit thread on workflow management systems. There are very interesting posts on snakemake vs. nextflow: https://www.reddit.com/r/bioinformatics/comments/73am0k/ncbi_hackathons_discussions_on_bioinformatics/

6.3 years ago by ropolocan ▴ 810