CWL: nameext not returning expected array
Posted by ForrestBear, 5.2 years ago

I'm writing an ExpressionTool in CWL that returns an array of files selected by their extension.

Given a directory that contains .bed and .vcf.gz files, I'd like to return an array of the .vcf.gz files. However, file.nameext doesn't seem to work the way I expect.

class: ExpressionTool
cwlVersion: v1.0
inputs:
  vcfsdir: Directory
outputs:
  samples: string[]
  vcfgzs:
    type: File[]
    secondaryFiles: [.tbi]
  beds: File[]
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var vcfgzs = [];

    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      if (file.nameext == '.gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
  }

The output I get is simply:

{
    "vcfgzs": []
}
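For reference, the CWL spec defines nameroot and nameext so that nameroot + nameext == basename, where nameext is empty or begins with a period and contains at most one period; leading periods are ignored, so .cshrc has no extension. By that rule a file named sample.vcf.gz should have nameext .gz, so the comparison in the question looks correct. If the runner does not fill in nameext for listing entries, one workaround is to derive it from basename instead. Below is a minimal sketch of that rule in plain ES5 (CWL v1.0 expressions are specified as ECMAScript 5.1); splitBasename is a hypothetical helper name, not part of any CWL library.

```javascript
// Hypothetical helper: split a CWL File basename into nameroot and nameext
// using the rule from the CWL spec:
//   nameroot + nameext == basename,
//   nameext is empty or starts with a period and contains at most one period,
//   leading periods are ignored (".cshrc" has nameroot ".cshrc").
// Written in ES5 so it can run inside a CWL v1.0 expression.
function splitBasename(basename) {
  // Skip leading periods so dotfiles are not treated as pure extensions.
  var start = 0;
  while (start < basename.length && basename.charAt(start) === ".") {
    start++;
  }
  // The extension begins at the last period, if it comes after the leading ones.
  var dot = basename.lastIndexOf(".");
  if (dot < start) {
    return { nameroot: basename, nameext: "" };
  }
  return {
    nameroot: basename.substring(0, dot),
    nameext: basename.substring(dot)
  };
}
```

The function could be added to the expressionLib list of InlineJavascriptRequirement and used as splitBasename(file.basename).nameext == '.gz' in place of file.nameext.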

I've confirmed that this bug also exists in the CWL reference runner: https://github.com/common-workflow-language/cwltool/issues/1074

Answer by Tom, 5.1 years ago

Accessing file.nameext doesn't seem to work in the context of the expression, and I have no idea why. I made a simple workaround that should do the trick as long as your filenames contain only one dot:

expression: |
  ${
    var vcfgzs = [];
    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      var filenameext = inputs.vcfsdir.listing[i].basename.split('.')[1];
      if (filenameext == 'gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
    }
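A caveat with split('.')[1]: it returns the second dot-separated field, so for a name such as sample.vcf.gz it gives 'vcf' rather than 'gz', and the .vcf.gz files from the question would not match. A minimal sketch, assuming the goal is to select files by their full suffix, tests the end of basename directly. hasSuffix and filterBySuffix are hypothetical names; the code avoids String.prototype.endsWith because CWL v1.0 expressions target ECMAScript 5.1.

```javascript
// Hypothetical helper: true if name ends with suffix (ES5, no endsWith).
function hasSuffix(name, suffix) {
  return name.length >= suffix.length &&
    name.substring(name.length - suffix.length) === suffix;
}

// Hypothetical helper: keep only File entries of a listing whose basename
// ends with the given suffix, e.g. ".vcf.gz". Directory entries are skipped,
// so, like the original loop, this only looks at the top level.
function filterBySuffix(listing, suffix) {
  var matches = [];
  for (var i = 0; i < listing.length; i++) {
    var entry = listing[i];
    if (entry["class"] === "File" && hasSuffix(entry.basename, suffix)) {
      matches.push(entry);
    }
  }
  return matches;
}
```

Inside the expression this would reduce to return {"vcfgzs": filterBySuffix(inputs.vcfsdir.listing, ".vcf.gz")}; and an index file such as sample.vcf.gz.tbi would not be picked up by mistake.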

I did try this and still got empty arrays. I'm wondering whether it might be a bug.


It is, thanks for finding and reporting this! https://github.com/common-workflow-language/cwltool/issues/1074


Okay, it looks like you were right about it being a bug! I ran the code before posting it here, but my input directory had no subdirectories. Sorry! It seems only the .gz files in the top-level directory get returned. I guess you'll have to use a CommandLineTool to work around the problem for now.
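Independent of the runner bug, the loop itself only visits the top-level listing: a subdirectory appears there as a Directory entry with its own listing, so files inside it are never examined. A minimal recursive sketch follows, assuming the runner has populated the nested listings (in CWL v1.1 and later this is what LoadListingRequirement with loadListing: deep_listing requests; for a v1.0 document it depends on the runner). collectFiles is a hypothetical name.

```javascript
// Hypothetical helper: walk a CWL Directory object depth-first and collect
// every File whose basename satisfies keep(basename).
// Assumes nested Directory entries carry a populated "listing" array;
// a Directory without a listing simply contributes no files.
function collectFiles(dir, keep) {
  var found = [];
  var listing = dir.listing || [];
  for (var i = 0; i < listing.length; i++) {
    var entry = listing[i];
    if (entry["class"] === "Directory") {
      // Recurse into the subdirectory and append its matches.
      found = found.concat(collectFiles(entry, keep));
    } else if (entry["class"] === "File" && keep(entry.basename)) {
      found.push(entry);
    }
  }
  return found;
}
```

Inside the expression it could be called as collectFiles(inputs.vcfsdir, function (name) { return /\.vcf\.gz$/.test(name); }). If the nested listings are empty at runtime, the result will still only contain top-level files, which would point at listing loading rather than at the filter.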


The workaround does work for me with the most recent cwltool release:

Setup:

$ mkdir -p test; touch test/one.gz test/two.gz

biostars_365953-workaround.cwl

class: ExpressionTool
cwlVersion: v1.0
inputs:
  vcfsdir: Directory
outputs:
  samples: string[]
  vcfgzs:
    type: File[]
    secondaryFiles: [.tbi]
  beds: File[]
requirements:
  InlineJavascriptRequirement: {}
expression: |
  ${
    var vcfgzs = [];
    for (var i = 0; i < inputs.vcfsdir.listing.length; i++) {
      var file = inputs.vcfsdir.listing[i];
      var filenameext = inputs.vcfsdir.listing[i].basename.split('.')[1];
      if (filenameext == 'gz') {
        var main = file;
        vcfgzs.push(main);
      }
    }

    return {"vcfgzs": vcfgzs};
    }

Result:

$ cwltool biostars_365953-workaround.cwl --vcfsdir test
/home/michael/cwltool/env3/bin/cwltool 1.0.20181217162649
Resolved 'biostars_365953-workaround.cwl' to 'file:///home/michael/cwltool/biostars_365953-workaround.cwl'
{
    "vcfgzs": [
        {
            "class": "File",
            "location": "file:///home/michael/cwltool/two.gz",
            "basename": "two.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/two.gz"
        },
        {
            "class": "File",
            "location": "file:///home/michael/cwltool/one.gz",
            "basename": "one.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/home/michael/cwltool/one.gz"
        }
    ]
}
Final process status is success

Using cwltool 1.0.20190228155703 and biostars_365953-workaround.cwl, I still don't get files from subdirectories returned. So the behaviour seems identical to the previous version, at least for the workaround.

Test:

$ mkdir indir
$ mkdir indir/subdir
$ touch indir/cat.gz
$ touch indir/subdir/dog.gz
$ cwltool workaround.cwl --vcfsdir indir

Output:

{
    "vcfgzs": [
        {
            "class": "File",
            "location": "file:///mnt/masse/tests/biostars/ForrestBear/cat.gz",
            "basename": "cat.gz",
            "size": 0,
            "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
            "path": "/mnt/masse/tests/biostars/ForrestBear/cat.gz"
        }
    ]
}
