Snakemake input error with files from inconsistent naming scheme
3.6 years ago
skbrimer ▴ 740

Hello hive brain,

I am trying to make a workflow in Snakemake to process some MinION reads. They are cDNA amplicons of different genotypes of the same virus that were multiplexed together. To make sure that I am only using the barcodes I want, I am pre-processing the reads with porechop to strictly keep reads with both barcode adaptors before moving forward. However, since porechop relabels the reads as BC01, BC02, etc., I have added a "barcodes" section to the config.yaml file, but I am having trouble getting past this error.

MissingInputException in line 17 of /home/sean/Desktop/reo/antisera project/20200813/MinIONAmplicon.smk:
Missing input files for rule minimap2:
8413_19_strict/BC01.fastq.gz

I know what it is telling me; however, the rule right before it in my workflow makes that directory, so I am not sure why it is not trying to run all the jobs.

Any help is greatly appreciated!

Here is my Snakefile:

configfile: "config.yaml"

rule all:
    input:
        expand("{sample}.bam", sample = config["samples"])

rule porechop_strict:
    input:
        lambda wildcards: config["samples"][wildcards.sample]
    output:
        "{sample}_strict/"
    shell:
        "porechop -i {input} -b {output} --barcode_threshold 85 --threads 8 --require_two_barcodes"

rule minimap2:
    input:
        lambda wildcards: "{sample}_strict/" + config["barcodes"][wildcards.sample]
    output:
        "{sample}.bam"
    shell:
        "minimap2 -ax map-ont -t8 ../concensus.fasta {input} | samtools sort -o {output}"

And my config file:

samples: {
  '8413_19': relabeled_reads/8413_19.raw.fastq.gz,
  '8417_19': relabeled_reads/8417_19.raw.fastq.gz,
  '8445_19': relabeled_reads/8445_19.raw.fastq.gz,
  '8466_19_104': relabeled_reads/8466_19_104.raw.fastq.gz,
  '8466_19_105': relabeled_reads/8466_19_105.raw.fastq.gz,
  '8467_20': relabeled_reads/8467_20.raw.fastq.gz,
  }
barcodes: {
      '8413_19': BC01.fastq.gz,
      '8417_19': BC02.fastq.gz,
      '8445_19': BC03.fastq.gz,
      '8466_19_104': BC04.fastq.gz,
      '8466_19_105': BC05.fastq.gz,
      '8467_20': BC06.fastq.gz,
    }
3.6 years ago
skbrimer ▴ 740

So here is the solution I figured out.

rule minimap2:
    input:
        "{sample}_strict"
    params:
        suffix=lambda wildcards: config["barcodes"][wildcards.sample]
    output:
        "{sample}.bam"
    shell:
        "minimap2 -ax map-ont -t8 ../consensus.fasta\
         {input}/{params.suffix} | samtools sort -o {output}"

I am not sure why it has to run this way; I suspect it has to do with how Snakemake works out which files it still needs to create. What I found is that I can use the params directive to supply the barcode file name that porechop produces. The input of minimap2 is then the same as the output of the previous rule, and the workflow now runs as I want.
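
A likely explanation, sketched in plain Python: Snakemake decides which rule to run by matching each requested input path against the output patterns of the rules. The snippet below is only a simplified illustration of that idea, not Snakemake's actual code. The helper names `pattern_to_regex` and `producing_rule` are hypothetical, and stripping the trailing slash is an assumption made so that `{sample}_strict/` and `{sample}_strict` compare equal, which is consistent with the fix above working.

```python
import re

def pattern_to_regex(pattern):
    """Turn an output pattern such as '{sample}_strict' into an anchored regex."""
    # Splitting on a capturing group puts the wildcard names at odd indices.
    parts = re.split(r"\{(\w+)\}", pattern.rstrip("/"))
    regex = ""
    for i, part in enumerate(parts):
        if i % 2:
            regex += f"(?P<{part}>.+)"   # wildcard: one or more characters
        else:
            regex += re.escape(part)     # literal text between wildcards
    return re.compile(regex + "$")       # the whole path has to match

def producing_rule(target, rules):
    """Return (rule name, wildcards) for the first rule whose output matches."""
    for name, pattern in rules.items():
        match = pattern_to_regex(pattern).match(target.rstrip("/"))
        if match:
            return name, match.groupdict()
    return None

# Output patterns copied from the two rules in the question.
RULES = {"porechop_strict": "{sample}_strict/", "minimap2": "{sample}.bam"}

# Original input: a file inside the directory. No rule declares it as output.
print(producing_rule("8413_19_strict/BC01.fastq.gz", RULES))  # None
# Fixed input: the directory itself, which porechop_strict does declare.
print(producing_rule("8413_19_strict", RULES))  # ('porechop_strict', {'sample': '8413_19'})
```

Under this reading, `8413_19_strict/BC01.fastq.gz` is reported as missing because no rule lists that exact file as an output, even though porechop would create it as a side effect. Asking for the directory as the input and passing the file name through params keeps the dependency visible to Snakemake. Snakemake also provides a `directory()` marker for rules whose output is a directory, which may be worth using on the porechop_strict output.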
