Appropriate use of wildcards for single instances in Snakemake
4.9 years ago
camerond ▴ 190

I'm using snakemake to automate Garfield GWAS SNP enrichment analyses over multiple annotations.

My issue is with the variable {CHR} in the snakefile below:

import os
# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
        expand("garfield_output/{ANN}/garfield.prep.{GWAS}.out", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))

rule fastqc:
    input:
        gwas = "garfield-GWAS/{GWAS}/chr{CHR}",
        annot = "garfield-annotations/{ANN}/chr{CHR}",
        pruneTags = "garfield-data/tags/r01/chr{CHR}",
        clumpTags = "garfield-data/tags/r08/chr{CHR}",
        mafTss = "garfield-data/maftssd/chr{CHR}",
    output:
        "garfield_output/{ANN}/garfield.prep.{GWAS}.out"
    shell:
        """
        PTHRESH=1e-5,1e-8
        BINNING=m5,n5,t5
        CONDITION=0
        SUBSET="1-1005"

        /garfiled-v2/garfield-prep-chr -ptags {input.pruneTags} -ctags {input.clumpTags} \
        -maftss {input.mafTss} -pval {input.gwas} -ann {input.annot} \
        -excl 895,975,976,977,978,979,980 -chr {wildcards.CHR} -o {output} || {{ echo 'Failure!'; }}
        """

As {CHR} is not present in the name of the output file, snakemake throws the following error:

Building DAG of jobs...
WildcardError in line 9 of /c8000xd3/big-c1477909/garfield/Snakefile:
Wildcards in input files cannot be determined from output files:
'CHR'

I need this rule to run a single instance for each chromosome, per GWAS, per annotation but I can't get the correct syntax to produce input files for a single chromosome for each instance. For example, the input for GWAS=adhd, ANN=ATAC, CHR=1 should read:

gwas = "garfield-GWAS/adhd/chr1",
annot = "garfield-annotations/ATAC/chr1",
pruneTags = "garfield-data/tags/r01/chr1",
clumpTags = "garfield-data/tags/r08/chr1",
mafTss = "garfield-data/maftssd/chr1",

I have tried various iterations using expand and lambda wildcards input functions on the input files, e.g.:

expand("garfield-GWAS/{GWAS}/chr{CHR}", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))

OR

lambda wildcards: expand("garfield-GWAS/{GWAS}/chr{CHR}", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))

But these either throw an error or send ALL chromosome files to one instance of the rule rather than individual chromosomes. I can't quite get the correct syntax for this.
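
The "all chromosomes in one instance" behaviour follows from how expand works: it builds the full Cartesian product of the values it is given, without regard to the wildcards of the job being scheduled. The sketch below is plain Python that imitates this; expand_like is a hypothetical stand-in rather than Snakemake's own function, and the config contents are assumed from the adhd/ATAC example above.

```python
from itertools import product

def expand_like(pattern, **wildcards):
    """Hypothetical stand-in for expand: format the pattern with every
    combination of the supplied wildcard values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]

# Assumed config.yaml contents, taken from the adhd/ATAC example.
config = {"GWAS": ["adhd"], "ANN": ["ATAC"]}

paths = expand_like(
    "garfield-GWAS/{GWAS}/chr{CHR}",
    GWAS=config["GWAS"],
    ANN=config["ANN"],
    CHR=range(1, 23),
)
```

Here paths holds all 22 chromosome files for adhd at once, so a rule whose input is this list receives every chromosome in a single job.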

Any suggestions on the best way to solve this would be greatly appreciated.

4.9 years ago

The only way Snakemake can tell which CHR you want in the input is if you request it in the output, so your rule needs to produce a CHR-specific output. Every wildcard in the input must also appear in the output, although the reverse isn't necessarily true.

output:
    "garfield_output/{ANN}/chr{CHR}/garfield.prep.{GWAS}.out"

Then your target should work:

expand("garfield_output/{ANN}/chr{CHR}/garfield.prep.{GWAS}.out", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))
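
As a rough illustration of why the wildcard has to appear in the output: Snakemake works backwards from a requested target, matching it against a rule's output pattern to obtain the wildcard values, and only then fills in the inputs. The plain-Python sketch below imitates that step with a regular expression. pattern_to_regex and infer_wildcards are hypothetical helpers, and real Snakemake matching has further features (such as wildcard constraints) that are not modelled here.

```python
import re

def pattern_to_regex(pattern):
    """Compile 'dir/{X}/file' into a regex with one named group per wildcard."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    pieces = [
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+)"
        for i, part in enumerate(parts)
    ]
    return re.compile("".join(pieces))

def infer_wildcards(output_pattern, target):
    """Return the wildcard values implied by a target path, or None if no match."""
    match = pattern_to_regex(output_pattern).fullmatch(target)
    return match.groupdict() if match else None

input_pattern = "garfield-GWAS/{GWAS}/chr{CHR}"

# Original output pattern: CHR cannot be recovered from the target.
old = infer_wildcards(
    "garfield_output/{ANN}/garfield.prep.{GWAS}.out",
    "garfield_output/ATAC/garfield.prep.adhd.out",
)

# CHR-specific output pattern: all three wildcards are recovered.
new = infer_wildcards(
    "garfield_output/{ANN}/chr{CHR}/garfield.prep.{GWAS}.out",
    "garfield_output/ATAC/chr1/garfield.prep.adhd.out",
)
gwas_input = input_pattern.format(**new)
```

Formatting input_pattern with old instead would raise a KeyError for CHR, which is roughly the situation the WildcardError reports.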

@Jeremy Leipzig Thanks for the suggestion. Unfortunately this will not work for me, as the program produces a single output file for all chromosomes. Your suggestion creates an individual output file for each chromosome in a separate folder. I'm wondering if I can work around this somehow by sending the input files to params?
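
If one output file really has to cover all chromosomes, one option (a sketch, not tested against Garfield itself) is to keep GWAS and ANN as wildcards and expand only over CHR in the rule's input, e.g. expand("garfield-GWAS/{{GWAS}}/chr{CHR}", CHR=range(1, 23)). Doubled braces are left behind by expand as single braces, so GWAS is still filled in per job. The plain-Python sketch below imitates this with str.format; expand_chr_only is a hypothetical helper and "adhd" is the example value from the question.

```python
def expand_chr_only(pattern, chromosomes):
    """Hypothetical helper: fill in CHR only, leaving {{GWAS}} as {GWAS}."""
    return [pattern.format(CHR=chrom) for chrom in chromosomes]

# Step 1: expansion over chromosomes, with GWAS masked by doubled braces.
per_job_inputs = expand_chr_only("garfield-GWAS/{{GWAS}}/chr{CHR}", range(1, 23))

# Step 2: what the job for GWAS=adhd would end up with once the
# remaining wildcard is filled in from the requested output.
resolved = [path.format(GWAS="adhd") for path in per_job_inputs]
```

With this layout a single job per GWAS and annotation sees all 22 chromosome files, so the shell command would have to loop over chromosomes itself.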

That shouldn't be happening. Try listing a couple of target files explicitly.

"garfield_output/ATAC/chr1/garfield.prep.adhd.out","garfield_output/ATAC/chr2/garfield.prep.adhd.out"