Question

Suggestions for a simpler solution for collecting snakemake rule output in a text file for use in another rule?

0

Entering edit mode

3 months ago

a.v.vliet • 0

After way too much testing, I designed a pretty inelegant solution that I needed to be able to incorporate a PON into my Mutect2 snakemake pipeline.

The steps are detailed here:

run Mutect2 on all normals
make an .args file that contains the names of all the vcfs you generated in step 1
use that file to create the final panel

Easy enough, right? My problem was with step 2. Because you can only run step 3 by either giving the .args file OR by giving each file separate but preceded by the -vcf option argument, I thought it would be easier to make the args file instead of figuring out how to make snakemake gather and then split out input arguments.

My solution:

rule run_mutect:
    input:
        "input/{sample}.txt"
    output:
        "mutect/{sample}.txt",
    shell:
        # simulate some output value
        "echo {input} > {output}"


rule make_panel:
    input:
       expand("mutect/{normal}.txt", normal = samples["fruits"])
    output:
        "pon.args"
    params:
        sed_regex = "s/ /\\n/g"
    run:
        with open("pon.args", 'a') as f:
             for item in {input}:
                f.write("%s\n" % item)

        shell("sed -i {params.sed_regex:q} pon.args")

If you want to test out a minimal working example, put the above code into a file called pon.smk. Then make a few other files (so it's technically not minimal but I did this to mimic the structure of my pipeline so that I wouldn't miss something obvious related to the design of the entire pipeline):

common.smk:

import pandas as pd

samples = pd.read_table("samples.txt", names = ["fruits"], header = None)

def output_rules():
    output1 = expand("mutect/{sample}.txt", sample = samples["fruits"])
    output2 = "pon.args"
    return output1, output2

Snakefile:

include: "common.smk"
include: "pon.smk"

# a target rule to define the desired final output
rule all:
    input:
        output_rules()

and finally a mini-samples file: echo -e "kiwi\npotato\nstrawberry" >> samples.txt

What I don't like about my solution:

hard to read
doesn't actually work the way I expected it to

I don't understand why the expanded input I gave is not written to the file line by line, so I have to go in after with a sed command to fix that. I tried to follow examples that I found online and had a solution with a checkpoint at one point but I couldn't get it quite right. This is the only thing I got working but I was wondering if others have better solutions.

snakemake • 543 views

ADD COMMENT • link 3 months ago by a.v.vliet • 0

1

Entering edit mode

While I think checkpoint is an elegant solution to many conditional executions, I don't think checkpoint is needed here. Besides, checkpoints can quickly become difficult to read and debug. I think your approach using an expand is great.

Does the following make it easier to read and work the way you expect? It's basically the same approach except I moved samples to configfile.

# usage: snakemake -j1 --configfile samples.yaml
# or define configfile: samples.yaml

rule:
    input: f'{config["exp_name"]}.vcf.gz'

rule run_panel:
    input:
        args = '{exp_name}.args'
    output:
        vcf = touch('{exp_name}.vcf.gz')

rule make_panel:
    input:
        # can also use lambda if custom logic is required
        samples = expand('mutect/{samples}.txt', samples = config['samples'])
    output:
        args = '{exp_name}.args'
    run:
        with open(output.args, 'w') as fout:
            fout.write('\n'.join(input.samples))

rule run_mutect:
    output:
        touch('mutect/{sample}.txt')

samples.yaml:

exp_name: fruits
samples:
  - kiwi
  - potato
  - strawberry

ADD REPLY • link 3 months ago by Eric Lim ★ 2.1k

0

Entering edit mode

Thank you so much for your input. My main misunderstanding came from how Snakemake makes an expanded list of samples-- I thought it was processed as a list. Your code works well for me.

ADD REPLY • link 3 months ago by a.v.vliet • 0

score 1 · Answer 1 · 2024-01-24

1

Entering edit mode

3 months ago

DBScan ▴ 300

As you said, there are two different ways how you can solve it.

If you go the way with -vcf option, you can do something like this in your rule:

params:
vcfs = lambda.w: " ".join("-vcf {}".format, w.sample)

This would append the -vcf to every sample you provide as input.

If you want to create a file with you samples, you could do this:

rule make_panel:
    input:
       expand("mutect/{normal}.txt", normal = samples["fruits"])
    output:
        "pon.args"
    shell:
"""
                echo {input} > {output}
                sed -i "s= =\\n=g" {output}
"""

ADD COMMENT • link 3 months ago by DBScan ▴ 300

0

Entering edit mode

Thank you. This helps me understand how to use lambdas better-- your solution is really nice and I didn't know this was possible.

ADD REPLY • link 3 months ago by a.v.vliet • 0