I'm using snakemake to automate Garfield GWAS SNP enrichment analyses over multiple annotations. My issue is with the variable {CHR} in the snakefile below:
import os

# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
        expand("garfield_output/{ANN}/garfield.prep.{GWAS}.out", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))

rule fastqc:
    input:
        gwas = "garfield-GWAS/{GWAS}/chr{CHR}",
        annot = "garfield-annotations/{ANN}/chr{CHR}",
        pruneTags = "garfield-data/tags/r01/chr{CHR}",
        clumpTags = "garfield-data/tags/r08/chr{CHR}",
        mafTss = "garfield-data/maftssd/chr{CHR}",
    output:
        "garfield_output/{ANN}/garfield.prep.{GWAS}.out"
    shell:
        """
        PTHRESH=1e-5,1e-8
        BINNING=m5,n5,t5
        CONDITION=0
        SUBSET="1-1005"
        /garfiled-v2/garfield-prep-chr -ptags {input.pruneTags} -ctags {input.clumpTags} \
        -maftss {input.mafTss} -pval {input.gwas} -ann {input.annot}
        -excl 895,975,976,977,978,979,980 -chr {CHR} -o {output} || { echo 'Failure!'; }
        """
As {CHR} is not present in the name of the output file, snakemake throws the following error:
Building DAG of jobs...
WildcardError in line 9 of /c8000xd3/big-c1477909/garfield/Snakefile:
Wildcards in input files cannot be determined from output files:
'CHR'
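To illustrate where this error comes from, here is a rough plain-Python sketch of the idea behind it: wildcard values are worked out by matching a requested target file against the rule's output pattern, so any wildcard that appears only in the input has nothing to be filled from. The helper infer_wildcards is hypothetical and only imitates the matching step; it is not Snakemake's actual implementation.

```python
import re

def infer_wildcards(pattern, target):
    """Match a target path against an output pattern and return the wildcard values.

    Hypothetical helper that imitates wildcard inference; assumes each
    wildcard name occurs only once in the pattern.
    """
    # Split into literal text and wildcard names: odd positions are names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    match = re.fullmatch(regex, target)
    return match.groupdict() if match else None

output_pattern = "garfield_output/{ANN}/garfield.prep.{GWAS}.out"
input_pattern = "garfield-GWAS/{GWAS}/chr{CHR}"

# Values that can be recovered from one requested output file.
known = infer_wildcards(output_pattern, "garfield_output/ATAC/garfield.prep.adhd.out")

# Wildcards the input pattern needs but the output cannot supply.
needed = set(re.findall(r"\{(\w+)\}", input_pattern))
missing = needed - set(known)

print(known)
print(missing)
```

Here ANN and GWAS are recovered from the output path, while CHR is left over, which mirrors the wildcard named in the error message.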
I need this rule to run a single instance for each chromosome, per GWAS, per annotation, but I can't get the correct syntax to produce input files for a single chromosome for each instance. For example, the input for GWAS=adhd, ANN=ATAC, CHR=1 should read:
gwas = "garfield-GWAS/adhd/chr1",
annot = "garfield-annotations/ATAC/chr1",
pruneTags = "garfield-data/tags/r01/chr1",
clumpTags = "garfield-data/tags/r08/chr1",
mafTss = "garfield-data/maftssd/chr1",
I have tried various iterations using the expand and lambda wildcards functions on the input files, i.e.:
expand("garfield-GWAS/{GWAS}/chr{CHR}", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))
OR
lambda wildcards: expand("garfield-GWAS/{GWAS}/chr{CHR}", GWAS=config["GWAS"], ANN=config["ANN"], CHR=range(1,23))
But these either throw an error or send ALL chromosome files to one instance of the rule rather than individual chromosomes. I can't quite get the correct syntax for this.
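The "all chromosomes in one instance" behaviour follows from what expand does: it returns the full list of every combination of the values it is given, regardless of which job is being built. The sketch below imitates that with itertools.product. The function expand_like is a hypothetical stand-in, not Snakemake's own expand, and the GWAS list is a made-up example value rather than the real config.

```python
from itertools import product

def expand_like(pattern, **wildcards):
    """Rough stand-in for expand(): fill the pattern with every combination.

    Hypothetical helper; every value must be an iterable of choices.
    """
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[n] for n in names))
    ]

# Hypothetical config value, for illustration only.
gwas_list = ["adhd"]

files = expand_like("garfield-GWAS/{GWAS}/chr{CHR}", GWAS=gwas_list, CHR=range(1, 23))

print(len(files))
print(files[0], files[-1])
```

Because the result is one flat list covering chr1 through chr22, a rule whose input is built this way receives every chromosome file at once instead of one file per job.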
Any suggestions on the best way to solve this would be greatly appreciated.
@Jeremy Leipzig Thanks for the suggestion. Unfortunately, this will not work for me, as the program produces a single output file for all chromosomes. Your suggestion creates an individual output file for each chromosome in a separate folder. I'm wondering if I can work around this somehow by sending the input files to params...?
That shouldn't be happening. Try listing a couple of target files explicitly.
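One possible direction for the single-output-file case raised in the comment above is an input function: a plain Python function that receives the job's wildcards and returns the per-chromosome paths for that one GWAS/annotation pair. The sketch below is untested against the real pipeline. The function name garfield_inputs and the SimpleNamespace stand-in for the wildcards object are assumptions made for illustration, and the directory layout is copied from the question.

```python
from types import SimpleNamespace

CHROMS = range(1, 23)  # chromosomes 1..22, as in the question

def garfield_inputs(wildcards):
    """Return all per-chromosome input paths for one GWAS/annotation pair.

    Hypothetical input function; only GWAS and ANN come from the wildcards,
    so both can be determined from the output file name.
    """
    return {
        "gwas": [f"garfield-GWAS/{wildcards.GWAS}/chr{c}" for c in CHROMS],
        "annot": [f"garfield-annotations/{wildcards.ANN}/chr{c}" for c in CHROMS],
        "pruneTags": [f"garfield-data/tags/r01/chr{c}" for c in CHROMS],
        "clumpTags": [f"garfield-data/tags/r08/chr{c}" for c in CHROMS],
        "mafTss": [f"garfield-data/maftssd/chr{c}" for c in CHROMS],
    }

# Stand-in for the wildcards object that would be passed in by the workflow.
example = SimpleNamespace(GWAS="adhd", ANN="ATAC")
paths = garfield_inputs(example)

print(paths["gwas"][0])
print(len(paths["annot"]))
```

In a Snakefile, a function like this could be attached with input: unpack(garfield_inputs), and the shell section would then need to loop over the chromosomes itself. Whether garfield-prep-chr can be called repeatedly against the same output file is an assumption that would need checking.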