CWL to check the output directory and run for non-existing files
5.4 years ago
a.james ▴ 240

Hello All,

I have a CWL script which should merge the graph files produced by the previous step. I need CWL to check the output directory and merge those graphs. My CWL script is shown below. The input is an array of BAM files.

  1. I need the CWL command line tool to check the existing output directory and run the step that merges all the already generated files in that directory. Instead, it starts from the beginning, that is, from the step that generates a graph for each BAM file, which is time consuming.

    cwlVersion: v1.0
    class: CommandLineTool
    doc: Spladder

    baseCommand: [python2.7, /usr/python/spladder.py]
    
    hints:
      cwltool:InplaceUpdateRequirement:
        inplaceUpdate: true
    requirements:
     - class: InlineJavascriptRequirement
     - class: InitialWorkDirRequirement
       listing: 
        - entry: "$({class: 'Directory', listing: []})"
          entryname: $(inputs.spladder_outDir)
          writable: true
    
    inputs:
     spladder_gtf: 
      type: File
      inputBinding:
       position: 3
       prefix: -a
     spladder_bams: 
      type: File[]
      inputBinding:
       position: 1
       prefix: -b
      secondaryFiles: .bai
     spladder_outDir:
      type: string
      inputBinding:
       position: 2
       prefix: -o
     spladder_phase2:
      type: string
      inputBinding:
       position: 6
       prefix: -T
     spladder_merge_graphs:
      type: string
      inputBinding:
        position: 5
        prefix: -M
     spladder_primary_alignment:
      type: string
      inputBinding:
        position: 10
        prefix: -P
     spladder_confidence:
      type: int
      inputBinding:
        position: 4
        prefix: -c
     spladder_alt:
      type: string
      inputBinding:
        position: 7
        prefix: -t
     spladder_validate:
      type: string
      inputBinding:
        position: 8
        prefix: -V
     spladder_RL:
      type: int
      inputBinding:
        position: 9
        prefix: -n
    
    outputs:
     spladder_out:
      type: Directory
      outputBinding:
       glob: $(inputs.spladder_outDir)/spladder
    
    $namespaces:
      cwltool: http://commonwl.org/cwltool#
    

And the YML file used for the above script looks like the following:

    spladder_gtf: 
     class: File
     path: /usage_examples/gencode.v19.annotation.hs37d5_chr.spladder.gtf
    spladder_outDir: /Alignment/spladder_out/
    spladder_out_dir1: /spladder_out1
    spladder_out_dir2: /spladder_out2
    spladder_bams: [
     {class: File, path: /Alignment/C3N-02289_10_L1Aligned.sortedByCoord.out.bam},
     {class: File, path: /Alignment/C3N-02289_4_5_L1Aligned.sortedByCoord.out.bam},
     {class: File, path: /cluster/work/grlab/projects/alva_temp/Alignment/C3N-02671_08_L1Aligned.sortedByCoord.out.bam}
    ]
    spladder_confidence: 2
    spladder_merge_graphs: merge_graphs
    spladder_alt: alt_3prime
    spladder_RL: 100
    spladder_phase2: y
    spladder_primary_alignment: y
    

And I ran cwltool as,

 cwltool --enable-ext /spladder_part1.cwl /part2.yml

Now my aim is for the CWL tool to look into spladder_outDir and just merge the existing outputs from the previous run/step. Currently spladder_outDir has 17 graph files and I need CWL to merge them together, as requested by the parameter spladder_merge_graphs. On the contrary, if no absolute path is given, CWL starts from the beginning and creates all the graphs again; if an absolute path is given, it says,

FileExistsError: [Errno 17] File exists: '/spladder_out/spladder'

if not then,

WARNING: Output directory ./spladder_out does not exist - will be created

Any help or suggestions would be great. I read the CWL manual end to end a couple of times and I saw

cwltool:InplaceUpdateRequirement:
    inplaceUpdate: true

and --enable-ext, and both of them looked like the right solution.

If I run it otherwise, the processing time is three times longer. That is why I wanted to do the merging part as a second, separate run.

CWL RNA-seq next-gen • 2.5k views
5.3 years ago
Tom ▴ 540

Hi! If your problem still exists, I would very much like to help. However, I am not sure I understood what your tool is supposed to do, probably because I don't know anything about spladder. Is it correct that the "previous step" you mentioned is part of a workflow and the tool you posted here only has the purpose of merging the files?

I am by no means an expert in CWL. That being said, I am not sure InitialWorkDirRequirement can be used in the way you are attempting for this tool.

You might instead try giving subdirectories of runtime.outdir (the temporary output directory CWL uses during runtime) to spladder as input parameters for its output directory. That way you still know exactly where your files are during runtime, so you can catch the ones you need with glob. This might look like:

    [...]
    requirements:
      - class: InlineJavascriptRequirement

    arguments:
      - valueFrom: $(runtime.outdir+"/spladder_output")
        prefix: -o
        position: 2

    inputs:
    [...]
    # remove spladder_outDir from inputs
    [...]
    outputs:
      spladder_out:
        type: Directory
        outputBinding:
          glob: $(runtime.outdir+"/spladder_output")
    [...]

I don't know what the output of spladder looks like. Let's say it's a bunch of ".example" files, which spladder puts into a subdirectory called "blurb". Then you might alternatively catch the output as an array of files using:

    outputs:
      spladder_out:
        type: File[]
        outputBinding:
          glob: $(runtime.outdir+"/spladder_output/blurb/*.example")
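
If the aim is to reuse graph files from an earlier run, another pattern that might be worth trying is to hand the existing directory to the tool as a `Directory` input and stage a writable copy of it with InitialWorkDirRequirement, rather than passing its path as a string. The following is only a sketch under assumptions: the input name `spladder_prev_out` and the staged name `spladder_out` are made up for illustration, and whether spladder then skips regenerating graphs it already finds there is not something this thread confirms.

```yaml
cwlVersion: v1.0
class: CommandLineTool
doc: Sketch of a merge-only spladder step that reuses an existing output directory

baseCommand: [python2.7, /usr/python/spladder.py]

requirements:
  - class: InitialWorkDirRequirement
    listing:
      # Stage a writable copy of the earlier results inside the working directory.
      - entry: $(inputs.spladder_prev_out)   # hypothetical input name
        entryname: spladder_out              # hypothetical staged directory name
        writable: true

arguments:
  # Point spladder at the staged copy (path relative to the working directory).
  - prefix: -o
    valueFrom: spladder_out
    position: 2

inputs:
  spladder_prev_out:
    type: Directory
  # The remaining inputs (-b, -a, -M, ...) would be carried over from the original tool.

outputs:
  spladder_out:
    type: Directory
    outputBinding:
      glob: spladder_out
```

In the job file, the directory would then be given as an object, for example `spladder_prev_out: {class: Directory, path: /Alignment/spladder_out}`, in place of the string-valued `spladder_outDir`.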

Please write if this still produces problems or if I misunderstood the issue altogether. Regards, Tom

@Tom Thanks for your time and reply. I will take a look into your solution. I tried this solution, but it is not giving me what I need.
