Question

--preserve_environment with docker containers?

6.6 years ago

Biowoogles ▴ 20

Hi there,

Is --preserve_environment ENVVAR meant to work with Docker containers?

If I print os.environ from within a container launched as a result of a DockerRequirement, I find that my environment variable is not preserved.

Is this a bug, or is it working as intended?
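A minimal sketch of the kind of check described above, assuming the tool running inside the container is a Python script. The helper name `describe_env` is hypothetical, and `ENVVAR` is the placeholder name from the question:

```python
import os


def describe_env(name, environ=None):
    """Return a one-line report on whether `name` is set in `environ`."""
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if environ is None else environ
    if name in env:
        return f"{name}={env[name]}"
    return f"{name} is not set"


if __name__ == "__main__":
    # ENVVAR stands in for whatever variable was passed on the command line.
    print(describe_env("ENVVAR"))
```

Running this as the tool's command inside the container would show whether the variable reached the container's environment.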

Note: I left the same comment over at https://github.com/common-workflow-language/cwltool/issues/196

Thanks!

cwltool Common-Workflow-Language cwl • 1.6k views

ADD COMMENT • link updated 7 months ago by Ram 43k • written 6.6 years ago by Biowoogles ▴ 20

score 0 · Answer 1 · 2017-10-04

Hello Biowoogles,

For Docker containers, you should use one of these methods instead:

1) Set the environment variable as part of the container creation (via the Dockerfile or some other method), or

2) Set the environment variable at the CommandLineTool, workflow step, or Workflow level using EnvVarRequirement in the requirements or hints section, as appropriate: http://www.commonwl.org/user_guide/12-env/
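As a rough illustration of option 2, the sketch below builds a small CommandLineTool description carrying an EnvVarRequirement and prints it as JSON (a CWL document may be written as JSON, since JSON is a subset of YAML). The variable name `WORKFLOW_RUN_ID`, the value `run-42`, the image `debian:stable-slim`, and the helper `make_tool` are all assumptions made for the example, not anything taken from this thread:

```python
import json


def make_tool(env_name, env_value, image="debian:stable-slim"):
    """Build a CWL CommandLineTool (as a dict) that sets one environment variable."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        # `env` prints the environment, which makes the effect easy to see.
        "baseCommand": "env",
        # Hypothetical image; any image providing `env` should do.
        "hints": {"DockerRequirement": {"dockerPull": image}},
        # The variable is declared explicitly, so it travels with the tool.
        "requirements": {
            "EnvVarRequirement": {"envDef": {env_name: env_value}}
        },
        "inputs": {},
        "outputs": {},
    }


if __name__ == "__main__":
    print(json.dumps(make_tool("WORKFLOW_RUN_ID", "run-42"), indent=2))
```

Saving the printed document to a file and running it with a CWL runner should show the variable in the tool's output, with or without a container.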

score 0 · Answer 2 · 2017-10-04

6.5 years ago

Biowoogles ▴ 20

Hi Michael,

Thank you very much for your answer.

Is there a specific reason why this is the case? Would you potentially accept a PR to enable this?

Kind regards!

ADD COMMENT • link 6.5 years ago by Biowoogles ▴ 20

One of the goals of using a workflow language like the CWL standards is to be explicit and portable -- and software containers like Docker help a great deal with that.

The reference runner's --preserve-environment flag was added to aid developers. It is useful when running in places where software container technology like Docker isn't available and site-specific settings (like PATH and CLASSPATH) need to be set.
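A small sketch of how such a non-Docker invocation might be assembled, assuming cwltool's `--no-container` option and the hyphenated `--preserve-environment` spelling, given once per variable. The file names `workflow.cwl` and `job.yml` and the helper `cwltool_command` are placeholders:

```python
import shlex


def cwltool_command(workflow, job, preserved=("PATH", "CLASSPATH")):
    """Return the argv list for a container-free cwltool run."""
    argv = ["cwltool", "--no-container"]
    # One flag per host variable that should reach the tool's environment.
    for name in preserved:
        argv += ["--preserve-environment", name]
    argv += [workflow, job]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line rather than executing it.
    print(shlex.join(cwltool_command("workflow.cwl", "job.yml")))
```

The list form can be passed directly to `subprocess.run` if the command should be executed rather than printed.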

For example, see the scripts I wrote for running the EBI Metagenomics pipeline without Docker on their cluster: https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl/blob/master/workflows/standalone.sh#L34 & https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl/blob/master/workflows/ebi-setup.sh#L27

So if you are using Docker, you don't need this local configuration hack: your containers or tools should be explicit about setting the needed value(s). This matters especially because --preserve-environment isn't part of the standard interface for cwl-runners (it is specific to the CWL reference runner, cwltool).

Does that make sense?

ADD REPLY • link 6.5 years ago by Michael R. Crusoe ★ 1.9k

It does make sense and I wholeheartedly agree on the points of being explicit and portable.

However, my use case is actually to track the Docker containers spawned by a specific workflow, for monitoring or similar purposes. I was hoping to inject an environment variable so that I could distinguish those containers from others that might be running on the same machine, e.g. on a shared server. This would all happen on a layer above the workflow.

Hence, I think my use case actually falls outside the philosophical constraints of CWL. For me it would be a really nice little feature and would not be part of a broader pattern of misuse of the framework! However, I understand that this could be a gateway to hackish behaviour, so you might not want to include it.
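To make the monitoring idea concrete, here is a hedged sketch of how such containers could be picked out from the JSON that `docker inspect` prints, where each record lists its environment under `Config.Env` as `NAME=value` strings. The variable name `WORKFLOW_RUN_ID` and the helper `containers_with_env` are hypothetical:

```python
def containers_with_env(inspect_records, name, value=None):
    """Return the IDs of containers whose environment defines `name`.

    `inspect_records` is the parsed JSON array printed by `docker inspect`.
    If `value` is given, the variable must also equal it.
    """
    matches = []
    for record in inspect_records:
        env_list = (record.get("Config") or {}).get("Env") or []
        # Each entry looks like "NAME=value"; split on the first "=" only.
        env = dict(item.split("=", 1) for item in env_list if "=" in item)
        if name in env and (value is None or env[name] == value):
            matches.append(record.get("Id"))
    return matches


# Hand-written records in the shape of `docker inspect` output.
sample = [
    {"Id": "aaa", "Config": {"Env": ["PATH=/usr/bin", "WORKFLOW_RUN_ID=run-42"]}},
    {"Id": "bbb", "Config": {"Env": ["PATH=/usr/bin"]}},
]
print(containers_with_env(sample, "WORKFLOW_RUN_ID"))
```

In practice the records would come from something like `json.loads` applied to the output of `docker inspect` for the running container IDs.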

Thanks again for your feedback!

ADD REPLY • link 6.5 years ago by Biowoogles ▴ 20

What you seek (site-specific configuration that doesn't affect the analysis itself) is reasonable, and it is a feature that one might add to a real workflow management system (which the reference CWL runner is not :-)

Perhaps you could add this feature to Toil or another system with CWL support? http://www.commonwl.org/#Implementations

ADD REPLY • link 6.5 years ago by Michael R. Crusoe ★ 1.9k