Suggestions for a simpler solution for collecting snakemake rule output in a text file for use in another rule?
1
0
Entering edit mode
3 months ago
a.v.vliet • 0

After way too much testing, I designed a pretty inelegant solution that I needed to be able to incorporate a PON into my Mutect2 snakemake pipeline.

The steps are detailed here:

  1. run Mutect2 on all normals
  2. make an .args file that contains the names of all the vcfs you generated in step 1
  3. use that file to create the final panel

Easy enough, right? My problem was with step 2. Because you can only run step 3 by either giving the .args file OR by giving each file separate but preceded by the -vcf option argument, I thought it would be easier to make the args file instead of figuring out how to make snakemake gather and then split out input arguments.

My solution:

rule run_mutect:
    input:
        "input/{sample}.txt"
    output:
        "mutect/{sample}.txt",
    shell:
        # simulate some output value
        "echo {input} > {output}"


rule make_panel:
    input:
       expand("mutect/{normal}.txt", normal = samples["fruits"])
    output:
        "pon.args"
    params:
        sed_regex = "s/ /\\n/g"
    run:
        with open("pon.args", 'a') as f:
             for item in {input}:
                f.write("%s\n" % item)

        shell("sed -i {params.sed_regex:q} pon.args")

If you want to test out a minimal working example, put the above code into a file called pon.smk. Then make a few other files (so it's technically not minimal but I did this to mimic the structure of my pipeline so that I wouldn't miss something obvious related to the design of the entire pipeline):

common.smk:

import pandas as pd

samples = pd.read_table("samples.txt", names = ["fruits"], header = None)

def output_rules():
    output1 = expand("mutect/{sample}.txt", sample = samples["fruits"])
    output2 = "pon.args"
    return output1, output2

Snakefile:

include: "common.smk"
include: "pon.smk"

# a target rule to define the desired final output
rule all:
    input:
        output_rules()

and finally a mini-samples file: echo -e "kiwi\npotato\nstrawberry" >> samples.txt

What I don't like about my solution:

  • hard to read
  • doesn't actually work the way I expected it to

I don't understand why the expanded input I gave is not written to the file line by line, so I have to go in after with a sed command to fix that. I tried to follow examples that I found online and had a solution with a checkpoint at one point but I couldn't get it quite right. This is the only thing I got working but I was wondering if others have better solutions.

snakemake • 543 views
ADD COMMENT
1
Entering edit mode

While I think checkpoint is an elegant solution to many conditional executions, I don't think checkpoint is needed here. Besides, checkpoints can quickly become difficult to read and debug. I think your approach using an expand is great.

Does the following make it easier to read and work the way you expect? It's basically the same approach except I moved samples to configfile.

# usage: snakemake -j1 --configfile samples.yaml
# or define configfile: samples.yaml

rule:
    input: f'{config["exp_name"]}.vcf.gz'

rule run_panel:
    input:
        args = '{exp_name}.args'
    output:
        vcf = touch('{exp_name}.vcf.gz')

rule make_panel:
    input:
        # can also use lambda if custom logic is required
        samples = expand('mutect/{samples}.txt', samples = config['samples'])
    output:
        args = '{exp_name}.args'
    run:
        with open(output.args, 'w') as fout:
            fout.write('\n'.join(input.samples))

rule run_mutect:
    output:
        touch('mutect/{sample}.txt')

samples.yaml:

exp_name: fruits
samples:
  - kiwi
  - potato
  - strawberry
ADD REPLY
0
Entering edit mode

Thank you so much for your input. My main misunderstanding came from how Snakemake makes an expanded list of samples-- I thought it was processed as a list. Your code works well for me.

ADD REPLY
1
Entering edit mode
3 months ago
DBScan ▴ 300

As you said, there are two different ways how you can solve it.

If you go the way with -vcf option, you can do something like this in your rule:

params:
vcfs = lambda.w: " ".join("-vcf {}".format, w.sample)

This would append the -vcf to every sample you provide as input.

If you want to create a file with you samples, you could do this:

rule make_panel:
    input:
       expand("mutect/{normal}.txt", normal = samples["fruits"])
    output:
        "pon.args"
    shell:
"""
                echo {input} > {output}
                sed -i "s= =\\n=g" {output}
"""
ADD COMMENT
0
Entering edit mode

Thank you. This helps me understand how to use lambdas better-- your solution is really nice and I didn't know this was possible.

ADD REPLY

Login before adding your answer.

Traffic: 1762 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6