Biostar Beta. Not for public use.
How to write the output in snakefile (snakemake) for bowtie2-build
2
Entering edit mode
18 months ago
c.clarido • 50
Netherlands/Rotterdam/Leiden University…

Hello community,

I am using snakemake to make a pipeline. I want to add the bowtie2-build from bowtie2 to my current snakefile as follow:

rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    output: "output/reference"
    shell: "bowtie2-build {input} {output}"

So I should be expecting the following files: reference.1.bt2 reference.2.bt2 reference.3.bt2 reference.4.bt2 reference.rev.1.bt2 reference.rev.2.bt2

But it seems the problem lies in the output. How can I write the output?

ADD COMMENTlink
5
Entering edit mode
15 months ago
IP • 550
Denmark/University of Copenagen

You wrote this:

rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    output: "output/reference"
    shell: "bowtie2-build {input} {output}"

What snakemake is going to do is to check if the file output ( in this case "output/reference" ) exist after executing the rule. Which doesn't since it is the basename for bowtie

What you can do is to pass the basename for the index as a parameter to snakemake function. Something like the following:

 rule bowtie2Build:
    input: "refgenome/infected_consensus.fasta"
    params:
        basename="output/reference"
   output:
        output1="output/reference.1.bt2",
        output2="output/reference.2.bt2",
        output3="output/reference.3.bt2",
        output4="output/reference.4.bt2",
        outputrev1="output/reference.rev1.bt2",
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"
ADD COMMENTlink
0
Entering edit mode

Hello,

Using the following code:

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        output1="output/reference.1.bt2"
        output2="output/reference.2.bt2"
        output3="output/reference.3.bt2"
        output4="output/reference.4.bt2"
        outputrev1="output/reference.rev1.bt2"
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

I get :

SyntaxError in line 31 of /home/s1104230/scripts/Snakefile:
Unexpected keyword output1 in rule definition (Snakefile, line 31)
ADD REPLYlink
1
Entering edit mode

Hello,

after each defined output there must be a ,.

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        output1="output/reference.1.bt2",
        output2="output/reference.2.bt2",
        output3="output/reference.3.bt2",
        output4="output/reference.4.bt2",
        outputrev1="output/reference.rev1.bt2",
        outputrev2="output/reference.rev2.bt2"
    shell: "bowtie2-build {input} {params.basename}"

fin swimmer

ADD REPLYlink
0
Entering edit mode

I've updated this in the original (fixed the colon after params too). Good catch.

ADD REPLYlink
0
Entering edit mode

Hello So I tried again the suggested code but it seems that the error occurs at the rev. files:

MissingOutputException in line 26 of /home/s1104230/scripts/Snakefile:
Missing files after 5 seconds:
/home/s1104230/mapping2/reference.rev2.bt2
/home/s1104230/mapping2/reference.rev1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
ADD REPLYlink
0
Entering edit mode

See if those files show up eventually. We usually use --latency-wait 300 do get around this.

ADD REPLYlink
0
Entering edit mode

thanks for the corrections :)

ADD REPLYlink
1
Entering edit mode
15 months ago
Eric Lim ♦ 1.4k
Boston

IP's solution is best as expected outputs are listed explicitly which would take advantage of timestamp.

Since v5.2.0, you can also accomplish this by using directory, which shorten the code by quite a bit while maintaining timestamp, but snakemake won't check if all expected outputs exist.

rule bowtie_build:
  input: 'chrM.fa'
  output: directory('reference/bowtie/')
  shell: 'bowtie2-build {input} {output}'

In this particular example, you'd need to ls -a to see the outputs.

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#directories-as-outputs

directory has been super helpful in eliminating a bunch of if-else on rules when we don't know what outputs to expect at code time.

ADD COMMENTlink
0
Entering edit mode
16 months ago
WCIP | Glasgow | UK

IP's answer is what I would do. An alternative is to use the touch function to create a marker file that signals that a task has successfully completed:

rule bowtie2Build:
    input: "/home/bnextgen/refgenome/infected_consensus.fasta"
    params:
        basename: "/home/s1104230/output/reference"
    output:
        touch('index.done'),
    shell: "bowtie2-build {input} {params.basename}"

Then, rules that depend on the bowtie index will take in input the file index.done.

ADD COMMENTlink
0
Entering edit mode

Hello, I tried your suggestion But I get the following error:

Error in rule bowtie2Build: jobid: 0 output: index.done

RuleException:
CalledProcessError in line 49 of /home/s1104230/scripts/Snakefile:
Command ' set -euo pipefail;  bowtie2-build  /home/s1104230/mapping2/reference ' returned non-zero exit status 1
  File "/home/s1104230/scripts/Snakefile", line 49, in __rule_bowtie2Build
  File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run
ADD REPLYlink
0
Entering edit mode

The shell command is executed but it is throwing an error. In fact, bowtie2-build should have two positional arguments but there's only one:

bowtie2-build  /home/s1104230/mapping2/reference

It should be something like:

bowtie2-build /home/bnextgen/refgenome/infected_consensus.fasta /home/s1104230/mapping2/reference
ADD REPLYlink
0
Entering edit mode
21 months ago
zorrilla • 0

sort of off topic, but shouldt you also expect to get a .fai file as the output of the bowtie2 build? Can someone clarify this for me?

ADD COMMENTlink
0
Entering edit mode

-A fai file is a index on the fasta file with the format specified here -Bowtie2 output is the Burrows-Wheeler transformation of the reference genome. You can read about the BW transform here.

They are two different indexing techniques, so bowtie2-build does create a fai file because it does not need it for the alignment. It is just searching the BWT

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.3.1