Workflows! Where is the code?
5.9 years ago
Kermit ▴ 90

Hey there. Trying to wrap my head around workflows.

Taking a look at the examples below, I don't see any Python/R/Java/C/Perl code, which is confusing as a developer that has entered bioinformatics.

CWL -- https://github.com/Duke-GCB/GGR-cwl/blob/master/v1.0/ChIP-seq_pipeline/01-qc-pe.cwl

WDL -- https://software.broadinstitute.org/wdl/documentation/article?id=7615

Do workflows just string together existing tools that other people have created? Is the code at a lower level or do you provide them with your own scripts as input files or are the "commands" the real code?

cwl wdl workflows • 2.8k views

"Do workflows just string together existing tools that other people have created?"

Yes.

"Is the code at a lower level or do you provide them with your own scripts as input files or are the "commands" the real code?"

Both. Most workflow systems I know use bash as the interpreter by default, e.g. Snakemake: http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html

"Note that shell commands in Snakemake use the bash shell in strict mode by default."

https://www.nextflow.io/docs/latest/process.html

"... and the BASH interpreter will replace it with the actual value."

But you can also use whatever language you want instead of bash.

https://www.nextflow.io/docs/latest/process.html

"The process script is interpreted by Nextflow as a BASH script by default, but you are not limited to it. You can use your favourite scripting language (e.g. Perl, Python, Ruby, R, etc), or even mix them in the same pipeline."

Or in Make, see "Choosing the Shell": https://www.gnu.org/software/make/manual/html_node/Choosing-the-Shell.html
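
To make the "commands are the real code" point concrete, here is a minimal Python sketch of what a workflow engine roughly does with a step's script: it hands the text to an interpreter, bash by default, or another one if you ask for it. The names `run_script`, `BASH_STEP` and `PYTHON_STEP` are made up for illustration and are not part of Snakemake, Nextflow or Make.

```python
import subprocess
import sys


def run_script(script, interpreter=("bash", "-c")):
    """Run `script` with the given interpreter command and return its stdout.

    Hypothetical helper: real engines also handle staging, logging, retries, etc.
    """
    result = subprocess.run(
        [*interpreter, script],
        capture_output=True,
        text=True,
        check=True,  # raise if the step exits with a non-zero status
    )
    return result.stdout


# The same toy step (complement a DNA string) written for two interpreters.
# The bash version turns on strict mode explicitly, similar in spirit to
# what the Snakemake docs describe for shell commands.
BASH_STEP = "set -euo pipefail; printf 'ACGT\\n' | tr 'ACGT' 'TGCA'"
PYTHON_STEP = "print('ACGT'.translate(str.maketrans('ACGT', 'TGCA')))"
```

Calling `run_script(BASH_STEP)` runs the step through bash, while `run_script(PYTHON_STEP, interpreter=(sys.executable, "-c"))` runs the equivalent step through Python. Either way, the script text inside the step is where the actual code lives.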

"confusing as a developer that has entered bioinformatics"

Most of bioinformatics is stringing together various tools built for specific tasks. If you are lucky, the output of one step is compatible with the input of the next step.
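
As a rough illustration of "stringing tools together", the sketch below feeds the output of each command into the next one. The two commands are small Python one-liners standing in for real third-party tools, and `run_pipeline`, `UPPERCASE` and `REVERSE` are hypothetical names chosen for this example.

```python
import subprocess
import sys


def run_pipeline(commands, data):
    """Feed `data` to the first command, then pass each stdout to the next stdin."""
    for command in commands:
        completed = subprocess.run(
            command,
            input=data,
            capture_output=True,
            text=True,
            check=True,  # stop the chain as soon as one step fails
        )
        data = completed.stdout
    return data


# Stand-ins for existing tools: each reads stdin and writes stdout.
UPPERCASE = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
REVERSE = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read().strip()[::-1])"]
```

For example, `run_pipeline([UPPERCASE, REVERSE], "acgt\n")` upper-cases the text and then reverses it. The chain only works because the second tool accepts exactly what the first one emits, which is the "if you are lucky" part.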

If you want to do actual software development then you need a lab/position that focuses on bioinformatics methods development. A large part of the bioinformatics community uses existing methods to uncover novel biological insights.

"Do workflows just string together existing tools that other people have created?"

In a nutshell, yes, but often enough custom scripts are required to keep data flowing smoothly between steps. Workflow management systems, on the other hand, should provide significantly more functionality than just stringing existing tools together.

5.9 years ago

CWL and WDL are specifications for configuration-based pipeline frameworks. These don't allow a lot of inline code, unlike domain-specific languages (DSLs).

That seems to be the key distinction that I am failing to grasp. So the workflow step's config would define a predetermined tool to be run on the input specified in the process?

That sounds about right.

I'm using the term "configuration" to describe configuration-based (using a specification markup language) rather than convention-based DSLs: https://academic.oup.com/bib/article/18/3/530/2562749
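
A small Python sketch of the configuration-based idea: the step is plain data that names a tool and its inputs, and a generic runner turns that data plus a job into a command line. The layout is loosely inspired by CWL but is not real CWL, and `qc_tool`, `TOOL` and `build_command` are hypothetical names used only for illustration.

```python
# Hypothetical step description: plain data, no inline logic.
TOOL = {
    "baseCommand": ["qc_tool"],  # made-up tool name
    "inputs": [
        {"id": "threads", "prefix": "--threads", "position": 1},
        {"id": "reads", "position": 2},
    ],
}


def build_command(tool, job):
    """Combine a tool description with concrete job values into an argv list."""
    argv = list(tool["baseCommand"])
    # Arguments are placed in the order given by their declared position.
    for spec in sorted(tool["inputs"], key=lambda s: s["position"]):
        value = job[spec["id"]]
        if "prefix" in spec:
            argv.append(spec["prefix"])
        argv.append(str(value))
    return argv
```

With a job such as `{"reads": "sample_R1.fastq.gz", "threads": 4}`, `build_command(TOOL, job)` produces the argument list for the tool. The description itself contains no program logic, which is why you don't see much "code" in such files.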

5.9 years ago

You can have inline code in CWL tool descriptions.

However: if it is more than a screenful, or if you need to debug it using normal methods, then you will be better off with an external file.

Here's an example: https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl/blob/886df9de6713e06228d2560c40f451155a196383/tools/discard_short_seqs.cwl#L31
