snakemake wildcard for fastq files
4.9 years ago
arshil • 0

Hi everyone, can anyone help me set up the wildcard for a list of paired-end fastq files (SRR7058331_1.fastq.gz, SRR7058331_2.fastq.gz)? I am trying to access the files through a config.yaml file, which looks like this:

sourcedir: /t6/h7/data/expression
refdir: /AA/Reference_genomes
datadirs:
  fastq: $sourcedir/demo_data
  bam: $sourcedir/bam
  quant: $sourcedir/quant

The code I am using is:
import yaml
configfile: "config.yaml
SAMPLES,=glob_wildcards(config['sourcedir'] + config['datadirs']['fastq']  + "/" +  "{sample}_R1.fastq.gz"))
READS=["1","2"]

It's not working. I am pretty new to this.

RNA-Seq snakemake config.yaml • 3.7k views

You need to follow up on your older questions first. You keep posting variations of the same problem without resolving the earlier issues.

import yaml 
configfile: "config.yaml 
SAMPLES,=glob_wildcards(config['sourcedir'] + config['datadirs']['fastq'] + "/" + "{sample}_R1.fastq.gz")
READS=["1","2"]

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Always edit the original post if you are adding useful information.

Thank you!

4.9 years ago
bari.ballew ▴ 460

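It looks like you may have duplicated part of your path. You prepend config['sourcedir'] to a value that already starts with $sourcedir, so if that placeholder is expanded, the path to your data reads: /t6/h7/data/expression/t6/h7/data/expression/demo_data/{sample}_R1.fastq.gz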
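
One thing worth checking first: as far as I know, neither YAML nor Snakemake's configfile loading expands `$sourcedir`, so the value probably stays as literal text. Here is a minimal sketch in plain Python of what the concatenation evaluates to and one way to substitute the placeholder yourself. The `config` dict below just mirrors the config.yaml from the question, and using `string.Template` is my suggestion, not something Snakemake does for you.

```python
from string import Template

# Values copied from the config.yaml in the question, as a plain dict.
config = {
    "sourcedir": "/t6/h7/data/expression",
    "datadirs": {"fastq": "$sourcedir/demo_data"},
}

# What the original concatenation produces: "$sourcedir" is still literal text.
naive = config["sourcedir"] + config["datadirs"]["fastq"]

# Substitute the placeholder explicitly instead of concatenating.
fastq_dir = Template(config["datadirs"]["fastq"]).substitute(
    sourcedir=config["sourcedir"]
)

print(naive)
print(fastq_dir)
```

Either way, the concatenated string is not a directory that exists, so the glob finds nothing. Writing the full path directly in config.yaml avoids the problem altogether.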

I'm assuming you need to access the paired fastq files in tandem for alignment or something similar. Try something like this:

import glob
import os

configfile: "config.yaml"
fastqDir = config['datadirs']['fastq'] + '/'

SAMPLES = glob.glob(fastqDir + '*_R1.fastq.gz')  # read in file list
SAMPLES = [os.path.basename(x) for x in SAMPLES]  # remove path from filenames
SAMPLES = [x.replace('_R1.fastq.gz','') for x in SAMPLES]  # isolate sample ID from filename

def get_r1(wildcards):
    return glob.glob(fastqDir + wildcards.sample + '_R1.fastq.gz')

def get_r2(wildcards):
    return glob.glob(fastqDir + wildcards.sample + '_R2.fastq.gz')

rule do_something:
    input: 
        r1 = get_r1,
        r2 = get_r2
...
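
Also note that the files in the question are named `SRR7058331_1.fastq.gz` and `SRR7058331_2.fastq.gz`, so a pattern with `_R1`/`_R2` will not match them. Below is a rough sketch of the same idea adapted to the `_1`/`_2` suffixes. The helper names `paired_samples` and `mate_paths` are made up for illustration, and the sketch assumes every sample has exactly one file per mate in a single directory.

```python
import glob
import os


def paired_samples(fastq_dir, suffixes=("_1.fastq.gz", "_2.fastq.gz")):
    """Return sorted sample IDs that have both mate files in fastq_dir."""
    first, second = suffixes
    samples = []
    for path in glob.glob(os.path.join(fastq_dir, "*" + first)):
        # Strip the directory and the mate-1 suffix to get the sample ID.
        sample = os.path.basename(path)[: -len(first)]
        # Keep the sample only if its mate-2 file is present too.
        if os.path.exists(os.path.join(fastq_dir, sample + second)):
            samples.append(sample)
    return sorted(samples)


def mate_paths(fastq_dir, sample, suffixes=("_1.fastq.gz", "_2.fastq.gz")):
    """Build the expected paths for both mates of one sample."""
    return [os.path.join(fastq_dir, sample + s) for s in suffixes]
```

In a Snakefile you could then set `SAMPLES = paired_samples(fastqDir)` and have the input functions return `mate_paths(fastqDir, wildcards.sample)`. If you prefer Snakemake's own helper, `SAMPLES, = glob_wildcards(fastqDir + "{sample}_1.fastq.gz")` should collect the same IDs, though it does not check that the second mate exists.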