I have a series of rules, as follows:

checkpoint barcode:
    input: get_basecall_input
    output:
        data = directory(config["results"] + "barcode"),
        complete = touch(config["results"] + ".temp/complete/barcode.complete")
    params:
        guppy_container=config["guppy_container"],
        barcode_kit=config["barcode"]["kit"]
    shell:
        r"""
        guppy_barcoder \
        --input_path {input} \
        --save_path {output.data} \
        --barcode_kits {params.barcode_kit} \
        --recursive
        """

def get_barcode_input(wildcards):
    return glob.glob(config["results"] + f"barcode/{wildcards.barcode}/*.fastq")

rule merge_barcodes:
    input: get_barcode_input
    output: config["results"] + "barcode/{barcode}.merged.fastq"
    params: barcode_folder = config["results"] + "barcode/{barcode}"
    shell:
        r"""
        cat {input} > {output}
        """

def get_merged_barcodes(wildcards):
    barcode_output = checkpoints.barcode.get(**wildcards).output[0]
    return expand(config["results"] + "barcode/{barcode}.merged.fastq",
                  barcode=glob_wildcards(os.path.join(barcode_output, "/{barcode}/*.fastq")).barcode)

rule create_classified_unclassified_barcode:
    input: get_merged_barcodes
    output:
        classified = config["results"] + ".temp/barcode.classified.merged.fastq",
        unclassified = config["results"] + ".temp/barcode.unclassified.merged.fastq"
    shell:
        r"""
        for file in {input}; do
            if [[ "$file" =~ barcode[0-9]{{2}} ]]; then
                cat "$file" >> {output.classified}
            elif [[ "$file" =~ unclassified ]]; then
                cat "$file" >> {output.unclassified}
            fi
        done
        """

However, I seem to be unable to get the final rule, create_classified_unclassified_barcode, to work properly. I have tried with merge_barcodes as a plain rule, but then create_classified_unclassified_barcode runs immediately after rule barcode, the output from rule merge_barcodes is not taken as input, and nothing is done.

I have also tried using a checkpoint on rule merge_barcodes, but then I get errors that say "Missing wildcard values for barcode", which makes sense because I am not using wildcards in create_classified_unclassified_barcode.

I have found this biostars link and this website that show something similar, but they're just different enough that I can't seem to get my own workflow to work. I feel the second link is basically the same exact thing as what I am trying to do. When I implement this (as I have done above), Snakemake reports "Updating job 3 (create_classified_unclassified_barcode)", and then no input is listed for the job once the workflow starts.
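
One thing that may be worth checking here, offered as a guess from the code above and not as a confirmed diagnosis: in get_merged_barcodes, the second argument to os.path.join starts with a slash. In Python, an absolute component makes os.path.join discard everything before it, so the pattern would no longer point inside the checkpoint's output directory. A minimal sketch of that behaviour, using a made-up results path in place of the real checkpoint output:

```python
import os

# Hypothetical stand-in for checkpoints.barcode.get(**wildcards).output[0]
barcode_output = "results/barcode"

# Leading slash: os.path.join drops barcode_output entirely
broken = os.path.join(barcode_output, "/{barcode}/*.fastq")

# No leading slash: the pattern stays under the checkpoint directory
fixed = os.path.join(barcode_output, "{barcode}/*.fastq")

print(broken)
print(fixed)
```

If the pattern really does resolve to a path outside the results directory, an empty match list (and therefore an empty input) would be consistent with the symptom described, though other causes are possible.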

I appreciate any help I can get on this problem.

I don't see any wildcards used in create_classified_unclassified_barcode, so how would get_merged_barcodes get a hold of one?

This is part of my issue. I'm trying to merge the output of merge_barcodes into two separate files. One to a "classified" output, and another to an "unclassified" output. I have updated create_classified_unclassified_barcode to show more clearly what I am trying to do.

So you can't have a wildcard in the input without one to match it in the output. The input of create_classified_unclassified_barcode must be a fixed target - a list of all the merged barcode files.

OK, I think I've got something figured out for that, at least for now.
Is there a way for me to use merge_barcodes as a checkpoint, and then use checkpoints.merge_barcodes.get(**wildcards).output[0] in def get_merged_barcodes? Or will this not work because create_classified_unclassified_barcode will then have wildcards in the input, as it does now?
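
The "fixed target" idea from the comments can be sketched in plain Python. The helper below is hypothetical (merged_targets is not a Snakemake function) and assumes that guppy_barcoder writes one subdirectory of .fastq files per barcode. It lists those subdirectories and returns fully resolved merged-file paths, which is roughly what an input function would return after calling checkpoints.barcode.get(**wildcards).output[0]:

```python
import tempfile
from pathlib import Path


def merged_targets(barcode_dir, results_prefix):
    """Return one merged-FASTQ target per barcode subdirectory.

    Hypothetical helper: in a Snakefile, barcode_dir would be the
    checkpoint's directory output, so the listing only happens after
    the checkpoint has finished.
    """
    barcodes = sorted(
        sub.name
        for sub in Path(barcode_dir).iterdir()
        if sub.is_dir() and any(sub.glob("*.fastq"))
    )
    # Every wildcard is filled in here, so the consuming rule sees plain paths.
    return [f"{results_prefix}barcode/{name}.merged.fastq" for name in barcodes]


# Demo on a throwaway directory that mimics the assumed output layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("barcode01", "barcode02", "unclassified"):
        sub = Path(tmp) / name
        sub.mkdir()
        (sub / "reads_0.fastq").write_text("@read\nACGT\n+\n!!!!\n")
    targets = merged_targets(tmp, "results/")

print(targets)
```

Because the returned list contains no unresolved wildcards, the aggregating rule can stay wildcard-free, and merge_barcodes would then be requested once per returned path without itself needing to be a checkpoint. This is a sketch under the stated assumptions, not a tested replacement for the workflow above.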