Question

Appropriate use of wildcards for single instances in Snakemake

4.9 years ago

camerond ▴ 190

I'm using snakemake to automate Garfield GWAS SNP enrichment analyses over multiple annotations.

My issue is with the variable {CHR} in the snakefile below:

import os
# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
        expand("garfield_output/{ANN}/garfield.prep.{GWAS}.out", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))

rule fastqc:
    input:
        gwas = "garfield-GWAS/{GWAS}/chr{CHR}",
        annot = "garfield-annotations/{ANN}/chr{CHR}",
        pruneTags = "garfield-data/tags/r01/chr{CHR}",
        clumpTags = "garfield-data/tags/r08/chr{CHR}",
        mafTss = "garfield-data/maftssd/chr{CHR}",
    output:
        "garfield_output/{ANN}/garfield.prep.{GWAS}.out"
    shell:
        """
        PTHRESH=1e-5,1e-8
        BINNING=m5,n5,t5
        CONDITION=0
        SUBSET="1-1005"

        /garfiled-v2/garfield-prep-chr -ptags {input.pruneTags} -ctags {input.clumpTags} \
        -maftss {input.mafTss} -pval {input.gwas} -ann {input.annot} \
        -excl 895,975,976,977,978,979,980 -chr {wildcards.CHR} -o {output} || {{ echo 'Failure!'; }}
        """

As {CHR} is not present in the name of the output file, snakemake throws the following error:

Building DAG of jobs...
WildcardError in line 9 of /c8000xd3/big-c1477909/garfield/Snakefile:
Wildcards in input files cannot be determined from output files:
'CHR'
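The error can be illustrated with a small plain-Python sketch. It is not Snakemake's actual implementation; the helper `match_wildcards` is hypothetical and only mimics the idea that wildcard values are recovered by matching a requested target against the rule's output pattern. Because the output pattern has no `{CHR}` field, no value for `CHR` can be recovered.

```python
import re

def match_wildcards(pattern, target):
    """Hypothetical helper: match a target path against an output pattern
    and return the wildcard values, or None if the target does not match."""
    # re.split with a capturing group alternates literal text and wildcard names
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2:
            # odd positions are wildcard names; match any non-empty text
            regex += f"(?P<{part}>.+)"
        else:
            # even positions are literal path fragments
            regex += re.escape(part)
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None

# Output pattern from the rule above; it contains ANN and GWAS but not CHR
output_pattern = "garfield_output/{ANN}/garfield.prep.{GWAS}.out"
found = match_wildcards(output_pattern, "garfield_output/ATAC/garfield.prep.adhd.out")
print(found)
print("CHR" in found)
```

Only `ANN` and `GWAS` come back from the match, so there is nothing to substitute for `{CHR}` in the input paths.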

I need this rule to run a single instance for each chromosome, per GWAS, per annotation, but I can't get the correct syntax to produce input files for a single chromosome for each instance. For example, the input for GWAS=adhd, ANN=ATAC, CHR=1 should read:

gwas = "garfield-GWAS/adhd/chr1",
annot = "garfield-annotations/ATAC/chr1",
pruneTags = "garfield-data/tags/r01/chr1",
clumpTags = "garfield-data/tags/r08/chr1",
mafTss = "garfield-data/maftssd/chr1",

I have tried various iterations using the expand and lambda wildcards functions on the input files, e.g.:

expand("garfield-GWAS/{GWAS}/chr{CHR}", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))

OR

lambda wildcards: expand("garfield-GWAS/{GWAS}/chr{CHR}", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))

But these either throw an error or send ALL chromosome files to one instance of the rule rather than individual chromosomes. I can't quite get the correct syntax for this.

Any suggestions on the best way to solve this would be greatly appreciated.

Snakemake wildcards Garfield • 2.2k views

ADD COMMENT • link updated 4.9 years ago by Jeremy Leipzig 22k • written 4.9 years ago by camerond ▴ 190

score 2 · Answer 1 · 2019-05-30

4.9 years ago

Jeremy Leipzig 22k

The only way Snakemake can tell which CHR you want in input is if you request it as output, so your wildcard rule needs to produce a CHR-specific output. Any wildcard in the input needs to be in the output, although the reverse isn't necessarily true.

output:
    "garfield_output/{ANN}/chr{CHR}/garfield.prep.{GWAS}.out"

Then your target should work:

expand("garfield_output/{ANN}/chr{CHR}/garfield.prep.{GWAS}.out", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))
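As a rough illustration of what that target list contains, the sketch below rebuilds it in plain Python with `itertools.product` standing in for Snakemake's `expand`. The `config` values are assumptions: `adhd` and `ATAC` are the example values from the question, and the real `config.yaml` may list more.

```python
from itertools import product

# Assumed config contents, using the example values from the question
config = {"GWAS": ["adhd"], "ANN": ["ATAC"]}

pattern = "garfield_output/{ANN}/chr{CHR}/garfield.prep.{GWAS}.out"

# One target per (annotation, chromosome, GWAS) combination
targets = [
    pattern.format(ANN=ann, CHR=chrom, GWAS=gwas)
    for ann, chrom, gwas in product(config["ANN"], range(1, 23), config["GWAS"])
]

print(len(targets))
print(targets[0])
```

Each target names exactly one chromosome, so each one triggers a separate job whose input paths can be filled in from the `CHR` value in the output path.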


@Jeremy Leipzig Thanks for the suggestion. Unfortunately, this will not work for me, as the program produces a single output file for all chromosomes. Your suggestion creates an individual output file for each chromosome in a separate folder. I'm wondering if I can work around this somehow by sending the input files to params ... ?

ADD REPLY • link 4.9 years ago by camerond ▴ 190
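If a single output file per GWAS and annotation really is required, one possible direction (an assumption, not something confirmed in this thread) is to keep `{CHR}` out of the wildcards entirely and let an input function list all 22 chromosome files for the job. The function name `garfield_inputs` and the `CHROMOSOMES` constant below are hypothetical; the `SimpleNamespace` object only stands in for the wildcards object Snakemake would pass.

```python
from types import SimpleNamespace

# Autosomes 1..22, matching range(1,23) in the question
CHROMOSOMES = range(1, 23)

def garfield_inputs(wildcards):
    """Hypothetical input function: every chromosome file needed by one
    GWAS/annotation job, keyed by the input names used in the rule."""
    return {
        "gwas": [f"garfield-GWAS/{wildcards.GWAS}/chr{c}" for c in CHROMOSOMES],
        "annot": [f"garfield-annotations/{wildcards.ANN}/chr{c}" for c in CHROMOSOMES],
        "pruneTags": [f"garfield-data/tags/r01/chr{c}" for c in CHROMOSOMES],
        "clumpTags": [f"garfield-data/tags/r08/chr{c}" for c in CHROMOSOMES],
        "mafTss": [f"garfield-data/maftssd/chr{c}" for c in CHROMOSOMES],
    }

# Stand-in for the wildcards object, using the example values from the question
wc = SimpleNamespace(GWAS="adhd", ANN="ATAC")
inputs = garfield_inputs(wc)
print(inputs["gwas"][0])
print(len(inputs["annot"]))
```

In a Snakefile, a dict-returning function like this would be wired in as `input: unpack(garfield_inputs)`, and the shell command would then have to loop over the chromosomes itself and write into the one output file; that loop is not shown here.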

That shouldn't be happening. Try listing a couple of target files explicitly.

"garfield_output/ATAC/chr1/garfield.prep.adhd.out","garfield_output/ATAC/chr2/garfield.prep.adhd.out"

ADD REPLY • link 4.9 years ago by Jeremy Leipzig 22k