rsync/rclone as a snakemake rule
2.9 years ago
Ram 43k

Hi,

I'm fairly new to snakemake - I've designed my RNAseq pipeline as a Snakefile. It runs STAR and RSEM and a few shell commands.

One of the steps I wish to implement is to upload a generated BAM file to cloud storage once the pipeline is done computing downstream results. I wish to use rclone for this, which is very much like rsync for cloud storage locations. However, rsync/rclone do not produce output files. They simply copy/move a file from source to destination. How can I add a snakemake rule that runs rsync/rclone when there is no "output" that rsync/rclone generate? I don't want to use the random content that a redirection would produce as the "output" parameter - it is too unreliable. This should be a simple solution but maybe I am too close to the problem.

Is there a way I can do this:

rule rsync_copy:
    input:
        "{sample}.bam"
    output:
        ??
    shell:
        """
        rsync -avPe ssh "{input}" "user@remote:/bam_files/{wildcards.sample}/"
        """
rclone rsync snakemake • 1.3k views
2.9 years ago
boxate1618 ▴ 60

A probably bad hack would be to touch a file such as "rsync_copy.ok" after the rsync, but I think you could end up in a situation where something goes wrong and the file still gets touched.

Could you have the server confirm completion and send something back?


This is actually a pretty good solution. I had a discussion offline with a few colleagues and the idea is to run rsync ... > rsync_copy.ok && mv rsync_copy.ok $LOG_DIR/ and use $LOG_DIR/rsync_copy.ok as the output file.

On successful completion, the exit code is 0. This part is always verified by snakemake (by always running bash in strict mode), so rule failure on rsync/rclone failure is not a problem.
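
A minimal sketch of how that idea could look as a rule. The logs/ directory stands in for $LOG_DIR, and the per-sample flag name is hypothetical; neither comes from the original post. The redirect captures rsync's progress output in a temporary file, and the mv only runs if rsync exits with 0, so the declared output never appears after a failed transfer:

```snakemake
# Sketch only: "logs/" replaces $LOG_DIR and the flag name is made up.
rule rsync_copy:
    input:
        "{sample}.bam"
    output:
        # Snakemake creates the logs/ directory before the job starts.
        "logs/{sample}.rsync_copy.ok"
    shell:
        """
        rsync -avPe ssh "{input}" "user@remote:/bam_files/{wildcards.sample}/" > "{wildcards.sample}.rsync_copy.ok" &&
        mv "{wildcards.sample}.rsync_copy.ok" "{output}"
        """
```

Requesting the flag as a target, for example snakemake --cores 1 logs/sampleA.rsync_copy.ok (sampleA being a placeholder sample name), would then trigger the upload for that sample.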


That's actually interesting to know: the exit code would break the rule execution. I think touching .ok files is a fairly common practice to confirm execution of long-running processes on clusters. I would abuse this concept, though, to bend Snakemake into executing rules, back before dynamic output was handled as well as it is now.


Your recommendation is actually 100% the official way to go. Snakemake calls these empty touch-files "flag files": https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#flag-files They even provide a touch("flag_file") output flag for this purpose.
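
Applied to the rule from the question, a sketch with touch() might look like the following. The flags/ directory and the flag file name are assumptions for illustration; only touch() itself comes from the linked documentation. Snakemake touches the file only after the shell command has finished successfully, so a failed rsync leaves no flag behind:

```snakemake
# Sketch only: the "flags/" path and file name are hypothetical.
rule rsync_copy:
    input:
        "{sample}.bam"
    output:
        # touch() tells Snakemake to create this empty file once the job succeeds.
        touch("flags/{sample}.rsync_copy.ok")
    shell:
        """
        rsync -avPe ssh "{input}" "user@remote:/bam_files/{wildcards.sample}/"
        """
```

Any rule that should wait for the upload, or the final target rule, can then list the flag file as an input.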
