Using snakemake with multiple output for calling SNP using clairvoyante
1
0
Entering edit mode
5.4 years ago
Medhat 9.7k

I am trying to run Clairvoyante using snakemake for multiple chromosome. the script looks like this:

chr_list = ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY', 'chrM']

rule clairvoyante:
    """
    Identify SNPs using Clairvoyante.
    """
    input:
        datain="{dataset}/{sample}.{ref}.{align}.{case}.bam"
    output:
        dataout= ["{{dataset}}/{{sample}}.{{ref}}.{{align}}.{{case}}.clarivoyante.{}.vcf".format(i) for i in chr_list]
    params:
        reference=my_ref,
        training=TRAINING_DATA,
        clairvoyante=CLARIVOYANTE,
        minCoverage=1,
    message:
        "Running callvarbam from clairvoyante, sample is: {wildcards.sample}"
    shell:
        """
            clairvoyante.py callVarBam \
            --chkpnt_fn "{params.training}" \
            --bam_fn "{input.datain}" \
            --ref_fn "{params.reference}" \
            --call_fn "{output.dataout}" \
            --minCoverage "{params.minCoverage}" \
            --ctgName 1  \
        """

My issue is the tool calls variant on one chromosome only per time using --ctgName parameter: the easy way is to loop through the chr_list in shell and uses parallel or xargs to call snp on all of them ()I think this is not a convenient way) or maybe write a python function and use map to run them in parallel.

but I think there should be a better way to do that using snakemake to pass the chromosome to --ctgName for each output.

any idea?

SNP snakemake • 4.2k views
ADD COMMENT
1
Entering edit mode
5.4 years ago

FYI: AFAIK the preferred support forum for snakemake is on StackOverflow.

That said, you'll need another function which is going to cat the vcf's together, and which has as input all separate vcf's like:

expand("myvariants_{{sample}}-{chromosome}.vcf", chromosome=chr_list)

You would then adapt your clairvoyante rule to write output in the form of "myvariants_{{sample}}-{chromosome}.vcf".

ADD COMMENT
0
Entering edit mode

Thanks, I will try your answer. Regarding the question I was doubting where to put it, but I thought it may fit here more.

ADD REPLY
0
Entering edit mode

Yeah I agree that it would make sense to use biostars, but see also the support section in the snakemake documentation.

ADD REPLY
0
Entering edit mode

Aha, now I understand, thanks that makes more sense for me now.

ADD REPLY

Login before adding your answer.

Traffic: 2339 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6