interesting behavior with grep
4.9 years ago
ionox0 ▴ 390

This isn't so much a question as an interesting finding: when grep is run with the -v flag and every line of the input is filtered out, grep exits with a nonzero code. To keep the job from failing, you need to add || true.
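A minimal shell sketch of the behavior, outside of CWL. The temporary file and its single line are made up for illustration, and -P is dropped because the pattern does not need Perl-style regex support:

```shell
# A file whose only line matches the pattern, so -v selects nothing.
header_only=$(mktemp)
printf 'chr1\tpos\tref\talt\n' > "$header_only"

grep -v '^chr1' "$header_only"
status_plain=$?        # 1: no lines were selected

grep -v '^chr1' "$header_only" || true
status_guarded=$?      # 0: "|| true" replaces the failing status

echo "plain=$status_plain guarded=$status_guarded"
rm -f "$header_only"
```

A runner that treats any nonzero exit code as a failure would reject the first form even though nothing actually went wrong.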

For example, this tool concatenates several VCF files, but it will fail without the || true if grep removes the only line from a VCF that has only the header (no variants):

cwlVersion: v1.0

class: CommandLineTool

requirements:
  - class: InlineJavascriptRequirement
  - class: ShellCommandRequirement

arguments:
- head
- -n
- '1'
- $(inputs.vcfs[0].path)

- shellQuote: false
  valueFrom: '>'

- all_calls.vcf

- shellQuote: false
  valueFrom: '&&'

- cat
- $(inputs.vcfs)

- shellQuote: false
  valueFrom: '|'

- grep
- -vP
- "^chr1"

# Need this to prevent nonzero exit code if grep runs on header only
- shellQuote: false
  valueFrom: '||'
- 'true'

- shellQuote: false
  valueFrom: '>>'

- all_calls.vcf

inputs:

  vcfs: File[]

outputs:

  concatenated_vcf:
    type: File
    outputBinding:
      glob: all_calls.vcf

From the grep manual page:

Exit Status: 0 if a line is selected, 1 if no lines were selected, and 2 if an error occurred

So we can use successCodes: [0, 1] to document that, rather than || true, which could hide an error (exit status 2).
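To make the difference concrete, here is a small shell sketch, not CWL itself. The helper accept_0_or_1 is a hypothetical stand-in for what successCodes: [0, 1] expresses, and the ".missing" path is deliberately nonexistent so that grep reports an error:

```shell
tmp=$(mktemp)
printf 'header\nvariant\n' > "$tmp"

grep -v '^header' "$tmp" > /dev/null;            selected=$?  # 0: a line was selected
grep -v '.' "$tmp" > /dev/null;                  none=$?      # 1: no lines were selected
grep -v '^header' "$tmp.missing" 2> /dev/null;   error=$?     # 2: the file does not exist

# Hypothetical helper: treat 0 and 1 as success, pass anything else through.
accept_0_or_1() {
    "$@"
    rc=$?
    case $rc in
        0|1) return 0 ;;
        *)   return "$rc" ;;
    esac
}

grep -v '^header' "$tmp.missing" 2> /dev/null || true
hidden=$?     # 0: "|| true" swallows the real error

accept_0_or_1 grep -v '^header' "$tmp.missing" 2> /dev/null
surfaced=$?   # 2: the error is still visible

rm -f "$tmp"
```

With || true, a missing or unreadable input looks like a success; accepting only 0 and 1 keeps the "nothing selected" case harmless while still failing on real errors.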

Also, does

- cat
- $(inputs.vcfs)

really work when vcfs is type: File[]?


Thanks for the tip. I hadn't considered this feature; successCodes does indeed solve the problem more cleanly.

The CWL resolves to the following command line:

$ /bin/sh \
-c \
'head' '-n' '1' '/scratch/tmpeZmeI2/stg3e5bd4d3-f4b3-40c3-be05-689bb0bcd8cf/Sample_1_Annotated_Evidence-annotated.txt' > 'all_calls.txt' && 'cat' '/scratch/tmpeZmeI2/stg3e5bd4d3-f4b3-40c3-be05-689bb0bcd8cf/Sample_1_Annotated_Evidence-annotated.txt' '/scratch/tmpeZmeI2/stg2676fb6f-d8ed-4ced-86e1-8011f675ed83/Sample_2_Annotated_Evidence-annotated.txt' | 'grep' '-vP' '^TumorId' || 'true' >> 'all_calls.txt'

That looks correct to me as far as supplying multiple files to cat goes. Is this not recommended?

However, I've noticed another issue: the output of the second command (after &&) is not redirected to all_calls.txt but still goes to stdout. Perhaps I'm misunderstanding how /bin/sh -c is used, but wrapping the second command in a subshell seems to work, although I'm not sure it's recommended:

arguments:
- head
- -n
- '1'
- $(inputs.sv_calls[0].path)

- shellQuote: false
  valueFrom: '>'

- all_calls.txt

- shellQuote: false
  valueFrom: '&&'

# Need to use subshell in order to gather stdout from second command to append to file
- shellQuote: false
  valueFrom: '('

- cat
- $(inputs.sv_calls)

- shellQuote: false
  valueFrom: '|'

- grep
- -vP
- "^TumorId"

# Need this to prevent nonzero exit code if grep runs on header only
- shellQuote: false
  valueFrom: '||'
- 'true'

# Need to use subshell in order to gather stdout from second command to append to file
- shellQuote: false
  valueFrom: ')'

- shellQuote: false
  valueFrom: '>>'

- all_calls.txt
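The behavior described above comes from shell grammar rather than from /bin/sh -c: a redirection applies only to the simple command it is attached to, which here is true. The sketch below reproduces that with made-up input and hypothetical file names, and also shows a brace group as an alternative to the subshell (both are standard shell grouping constructs; the brace group just avoids an extra process):

```shell
workdir=$(mktemp -d)

# Ungrouped: ">>" belongs to "true" only, so grep's output stays on stdout.
# The command substitution captures that stdout so it can be inspected.
leaked=$(printf 'a\nb\n' | grep -v '^a' || true >> "$workdir/ungrouped.txt")

# Subshell, as in the CWL above: the redirection covers the whole list.
( printf 'a\nb\n' | grep -v '^a' || true ) >> "$workdir/subshell.txt"

# Brace group: same effect, without starting a subshell.
{ printf 'a\nb\n' | grep -v '^a' || true; } >> "$workdir/braces.txt"

echo "leaked to stdout: $leaked"
echo "subshell file:    $(cat "$workdir/subshell.txt")"
echo "brace group file: $(cat "$workdir/braces.txt")"
```

In the ungrouped form the file is at most created empty, because true prints nothing; the filtered lines never reach it.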

You're in a situation where I would recommend either using a bash script or splitting this into multiple CommandLineTools.
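As a rough sketch of the bash-script option, assuming bash is available in the tool's environment: the function name concat_calls, its argument order, and the sample files are all hypothetical, not something from the thread. It keeps the header from the first file, appends the non-header lines of every file, and treats grep's exit status 1 as success while still failing on real errors:

```shell
# Hypothetical helper: concat_calls OUTPUT HEADER_REGEX FILE...
concat_calls() {
    local out=$1 header_regex=$2
    shift 2

    # Take the header from the first input file only.
    head -n 1 "$1" > "$out" || return 1

    # Append every non-header line; the pipeline's status is grep's status.
    cat "$@" | grep -v -- "$header_regex" >> "$out"
    local rc=$?

    # 1 only means "no lines selected" (all inputs were header-only).
    if [ "$rc" -gt 1 ]; then
        return "$rc"
    fi
    return 0
}

# Usage with made-up data: the second file contains only a header.
tmpdir=$(mktemp -d)
printf 'TumorId\tGene\nS1\tTP53\n' > "$tmpdir/sample_1.txt"
printf 'TumorId\tGene\n' > "$tmpdir/sample_2.txt"

concat_calls "$tmpdir/all_calls.txt" '^TumorId' "$tmpdir/sample_1.txt" "$tmpdir/sample_2.txt"
concat_status=$?
cat "$tmpdir/all_calls.txt"
```

Because the redirection and the exit-code handling live in the script, the CWL side would only need to call the script with the input files, without ShellCommandRequirement tricks.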

I take back my comment about $(inputs.vcfs), I was thinking of something else :-)
