Question

How do I specify the elements of an array of output files?

5.9 years ago

biokcb ▴ 170

Hi -

I would like to have a workflow with the following setup:

Step 1: creates an array of N files with specific naming conventions

Step 2: scatters over the output of step 1

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  - class: ScatterFeatureRequirement

inputs:
  input_file: File

steps:
  step1:
    run: step1.cwl
    in:
      input_file: input_file
    out: [junctions]
  step2:
    run: step2.cwl
    scatter: input_file
    in:
      input_file: step1/junctions
    out: [output_files]

outputs: 
  final_out: 
    type: File[]
    outputSource: step2/output_files

Step 1 is something like this; the command is just a shell script that splits the input file into 4 independent files, each with a specific naming convention:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

baseCommand: split_file.sh

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1

outputs:
  junctions:
    type: File[]
    outputBinding:
      glob:
       - $(inputs.input_file.basename).a.tmp
       - $(inputs.input_file.basename).b.txt
       - $(inputs.input_file.basename).c.fastq
       - $(inputs.input_file.basename).d.fasta

I know that I could just use glob: "*" to gather all these outputs, but I want to specifically check for the existence of each output before moving on to Step 2. When I tried the above, Step 1 returned an empty array as its output, even though the script being called did produce each output in the temp directory. If I use secondaryFiles, the workflow doesn't scatter across them. Is it currently possible to achieve something like this with CWL, and what would be the best way? As a note, I cannot currently use ExpressionTool, as it isn't yet supported by the runner we are using.

Thanks!

cwl • 1.4k views

updated 5.9 years ago by Michael R. Crusoe ★ 1.9k • written 5.9 years ago by biokcb ▴ 170

score 1 · Accepted Answer · 2018-06-08

Hello @biokcb,

I think you have the right approach. Maybe your split_file.sh script isn't writing its outputs to the correct location? glob patterns are matched relative to the tool's output directory, which is also the working directory the command runs in. If you need to pass a path to your script to tell it where to put its outputs, you can use $(runtime.outdir) (no InlineJavascriptRequirement needed).
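As a rough sketch of that suggestion, this is how step1.cwl might hand the output directory to the script. It assumes split_file.sh accepts a destination directory as a second positional argument, which is hypothetical; adapt it to however the script actually takes that path.

```yaml
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

baseCommand: split_file.sh

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1

arguments:
  # Assumption: split_file.sh writes its four files into the directory
  # given as its second positional argument.
  - valueFrom: $(runtime.outdir)
    position: 2

outputs:
  junctions:
    type: File[]
    outputBinding:
      # Patterns are matched relative to the output directory.
      glob:
        - $(inputs.input_file.basename).a.tmp
        - $(inputs.input_file.basename).b.txt
        - $(inputs.input_file.basename).c.fastq
        - $(inputs.input_file.basename).d.fasta
```

$(runtime.outdir) is a plain parameter reference, so this does not need InlineJavascriptRequirement.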

The following test script works for me:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

inputs:
  input_file:
    type: File

baseCommand: touch

arguments:
  - $(inputs.input_file.basename).a.tmp
  - $(inputs.input_file.basename).b.txt
  - $(inputs.input_file.basename).c.fastq
  - $(inputs.input_file.basename).d.fasta

outputs:
  junctions:
    type: File[]
    outputBinding:
      glob:
       - $(inputs.input_file.basename).a.tmp
       - $(inputs.input_file.basename).b.txt
       - $(inputs.input_file.basename).c.fastq
       - $(inputs.input_file.basename).d.fasta
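
To try the script, an input object along these lines should be enough. The file name job.yml is only an illustration, and README.rst stands for any existing file in the directory you run from.

```yaml
# job.yml (hypothetical file name): input object for the test tool
input_file:
  class: File
  path: README.rst   # any existing file will do; touch only uses its basename
```

The tool and the input object are then passed to the runner together, tool description first.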

Example output

/home/michael/cwltool/env3/bin/cwltool 1.0.20180605140423
Resolved '../biostars_319448.cwl' to 'file:///home/michael/cwltool/biostars_319448.cwl'
[job biostars_319448.cwl] /tmp/tmpkegyt21b$ touch \
    README.rst.a.tmp \
    README.rst.b.txt \
    README.rst.c.fastq \
    README.rst.d.fasta
[job biostars_319448.cwl] completed success
{
    "junctions": [
        {
            "class": "File",
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/t/README.rst.a.tmp",
            "basename": "README.rst.a.tmp",
            "location": "file:///home/michael/cwltool/t/README.rst.a.tmp",
            "size": 0
        },
        {
            "class": "File",
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/t/README.rst.b.txt",
            "basename": "README.rst.b.txt",
            "location": "file:///home/michael/cwltool/t/README.rst.b.txt",
            "size": 0
        },
        {
            "class": "File",
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/t/README.rst.c.fastq",
            "basename": "README.rst.c.fastq",
            "location": "file:///home/michael/cwltool/t/README.rst.c.fastq",
            "size": 0
        },
        {
            "class": "File",
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/t/README.rst.d.fasta",
            "basename": "README.rst.d.fasta",
            "location": "file:///home/michael/cwltool/t/README.rst.d.fasta",
            "size": 0
        }
    ]
}
Final process status is success
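
One caveat on the existence check: a File[] output collects whatever the glob patterns match, so a missing file would most likely give a shorter array rather than an error. If the step should fail when any one file is absent, a possible variation is to declare each file as its own non-optional File output in step1.cwl and merge them back into an array in the workflow. A sketch, assuming hypothetical output names a_tmp, b_txt, c_fastq and d_fasta, and assuming your runner supports MultipleInputFeatureRequirement:

```yaml
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow

requirements:
  - class: ScatterFeatureRequirement
  - class: MultipleInputFeatureRequirement

inputs:
  input_file: File

steps:
  step1:
    # Assumption: step1.cwl declares four outputs of type File, each with a
    # single glob pattern, e.g. a_tmp -> $(inputs.input_file.basename).a.tmp.
    # A non-optional File output whose glob matches nothing should be
    # reported as an error by the runner.
    run: step1.cwl
    in:
      input_file: input_file
    out: [a_tmp, b_txt, c_fastq, d_fasta]
  step2:
    run: step2.cwl
    scatter: input_file
    in:
      input_file:
        # Four File sources are merged into one File[] to scatter over.
        source: [step1/a_tmp, step1/b_txt, step1/c_fastq, step1/d_fasta]
        linkMerge: merge_nested
    out: [output_files]

outputs:
  final_out:
    type: File[]
    outputSource: step2/output_files
```

This avoids ExpressionTool, at the cost of listing the four outputs twice (once in the tool, once in the workflow).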