Biostar Test Site

This is site is used for testing only. Visit: https://www.biostars.org to ask a question.

Using wildcards in function in rules
1
0
Entering edit mode
12 days ago
aka ▴ 10

Hi Hello,

I want to do a specific trimming for each sample I have in my pipeline.

I have a list of samples in my config below:

#######################
#      Config data    # 
#######################

samples:
  3373-1_CCGCGGTT-CTAGCGCT-AHV5HLDSXY_L004:
  3373-2_TTATAACC-TCGATATC-AHV5HLDSXY_L004:
  3373-3_GGACTTGG-CGTCTGCG-AHV5HLDSXY_L004:
  3373-4_AAGTCCAA-TACTCATA-AHV5HLDSXY_L004: 
  ....

And another list in my config with the samples and primers as a dictionary for trim5 and trim3.

trim5:
  3373-1_CCGCGGTT-CTAGCGCT-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGCGGTTATCTCGTATGCCGTCTTCTGCTTG 
  3373-2_TTATAACC-TCGATATC-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTATAACCATCTCGTATGCCGTCTTCTGCTTG
  3373-3_GGACTTGG-CGTCTGCG-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACGGACTTGGATCTCGTATGCCGTCTTCTGCTTG
  3373-4_AAGTCCAA-TACTCATA-AHV5HLDSXY_L004: GATCGGAAGAGCACACGTCTGAACTCCAGTCACAAGTCCAAATCTCGTATGCCGTCTTCTGCTTG
 .....

So I made a function that allows to select the primer linked to this sample.

# get_index_trim5/3 allow to verify that we use the correct adaptater depending of the correct sequence
def get_index_trim5():
         for trim5 in config['trim5']:
                if {wildcards.sample} == trim5:
                     print (config['trim5'])
                     print (config['trim5'][trim5])
                         return ( config['trim5'][trim5] )
                 else: 
                    continue

# get_index_trim5/3 allow to verify that we use the correct adaptater depending of the correct sequence
def get_index_trim3():
            for trim3 in config['trim3']:
                if {wildcards.sample} == trim3:
                     print (config['trim3'])
                     print (config['trim3'][trim3])
                         return ( config['trim3'][trim3] )
                 else: 
                    continue



rule cutadapt_remove_adaptater_trimm:
    priority: 0
    input:
        reads=["../resources/sequences/{sample}_R1.fastq.gz", "../resources/sequences/{sample}_R2.fastq.gz"]
    output:
        fastq1= "../results/trimmed/{sample}_R1_trimmed.fastq.gz",
        fastq2= "../results/trimmed/{sample}_R2_trimmed.fastq.gz",
        qc= "../results/trimmed/{sample}_qc.txt"
    params:
        adapters="-g %s -a %s -a %s -a %s  -G %s -A %s -A %s -A %s -A %s  "%(get_index_trim5(), config['adaptaters']['adap_R1'], config['adaptaters']['PolyAG'],get_index_trim3(), get_index_trim5(), config['adaptaters']['adap_R2'], get_index_trim3(), config['adaptaters']['PolyG']), 
        extra="--minimum-length 100 -q 20"
    log:
        "../results/logs/trimmed/{sample}_trimmed.log"
    benchmark : 
        "../results/benchmarks/trimmed/{sample}_trimmed_benchmark.txt"
    message:
        """
        --- Trimming on {wildcards.sample} {params.adapters} in process ---
        """
    threads: 4
    resources:
        mem_mb=25000
    shell:
      "cutadapt {params.adapters} {params.extra} -o {output.fastq1} -p {output.fastq2} -j {threads} {input.reads} > {output.qc} 2> {log}"

The problem is I need to check in my function that the wildcards correspond to the primer but I can't pass the wildcards.sample as an argument to my function. I have some difficulties to use wildcards

Can you help me?

Thanks in advance and have a nice day, aka

snakemake rules wildcards • 112 views
ADD COMMENT
0
Entering edit mode
12 days ago
kanika.151 ▴ 120

Hi Aka,

This is a partial snakemake file. Can you also add how you are looking for your samples?

I would add a 'rule all' so that my snakefile knows what it needs to create. For example:

rule all:
      input:
            expand("../results/trimmed/{sample}_R1_trimmed.fastq.gz",config["samples"])
ADD COMMENT
0
Entering edit mode

Hi kanika,

Thank you for your help! Yes it's a partial snakemake file because I use multiple files for my snakemake pipeline.

What do you mean by "how you are looking for your samples?"

I have already a rull all :

#!/bin/python

#include specifics rules files for the pipeline
include: "rules/common.smk"
include: "rules/fastqc_rule.smk"
include: "rules/trimmo_rule.smk"
include: "rules/fastqc_second_rule.smk"


rule all:
    input:  
           ##### First Fastqc
        expand('../results/qc/before_trim/{sample}_{read}.html',sample = config['samples'], read= config['reads']),
        expand('../results/qc/before_trim/{sample}_{read}_fastqc.zip',sample = config['samples'],read= config['reads']),

            #### Trimommatic
        expand('../results/trimmed/{sample}_R1_trimmed.fastq.gz',sample=config['samples']),
        expand('../results/trimmed/{sample}_R2_trimmed.fastq.gz',sample=config['samples']),
        expand('../results/trimmed/{sample}_qc.txt',sample=config['samples']),


             #### Second fastqc after trim (optional)
        expand('../results/qc/after_trim/{sample}_{read}_trimmed.html',sample = config['samples'], read= config['reads'],trim5=config['trim5'],trim3=config['trim3']),
        expand('../results/qc/after_trim/{sample}_{read}_fastqc_trimmed.zip',sample = config['samples'],read= config['reads'],trim5=config['trim5'],trim3=config['trim3']),

I think the problem is that I can't use the wildcards, or the sample name which is trim in my function.

ADD REPLY
0
Entering edit mode

What if you pass your samples in a function like this?
Got the idea from here

for i in SAMPLES:   
 def get_index_trim5(i):
         for trim5 in config['trim5']:
                if {i} == trim5:
                     print (config['trim5'])
                     print (config['trim5'][trim5])
                         return ( config['trim5'][trim5] )
                 else: 
                    continue

I also think {wildcards.sample} only works if you add lambda wildcards: dict[wildcards.sample] .

ADD REPLY
0
Entering edit mode

It's not working like this: Maybe its the SAMPLES I don't define correctly?

SAMPLES, = glob_wildcards(config['samples'])
for i in SAMPLES:   
    def get_index_trim5(i):
            for trim5 in config['trim5']:
                   if {i} == trim5:
                        print (config['trim5'])
                        print (config['trim5'][trim5])
                            return ( config['trim5'][trim5] )
                    else: 
                       continue 

where I add the lambda wildcards: dict[wildcards.sample] ?

ADD REPLY
0
Entering edit mode

I think we have complicated an already complicated situation. LOL.

First, I am sure you cannot use {wildcards.sample} in a function in snakemake. Let me look for a solution and get back to you. If you want to use {wildcards.sample} then you can only use it with lambda function in python. That's what I wanted to say before.

ADD REPLY
0
Entering edit mode

I think too ! XD

I will try to find a solution too but I will wait for you reply also because I am a little lost when it's question of wildcards to be honest.

I will research about lamba too.

Thank you for your help.

ADD REPLY

Login before adding your answer.

Traffic: 270 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6