I have many paired-end FASTQ files, and I ran into a problem after running trim_galore: it names its output files with the suffixes _1_val_1 and _2_val_2, for example:
AD50_CTGATCGTA_1_val_1.fq.gz and
AD50_CTGATCGTA_2_val_2.fq.gz.

I would like to continue the workflow in Snakemake, using this Snakefile:

import os
import snakemake.io
import glob

DIR="AD50"
(SAMPLES,READS,) = glob_wildcards(DIR+"{sample}_{read}.fq.gz")
READS=["1","2"]

rule all:
    input:
        expand(DIR+"{sample}_dedup_{read}.fq.gz",sample=SAMPLES,read=READS)

rule clumpify:
    input:
        r1=DIR+"{sample}_1_val_1.fq.gz",
        r2=DIR+"{sample}_2_val_2.fq.gz"
    output:
        r1out=DIR+"{sample}_dedup_1.fq.gz",
        r2out=DIR+"{sample}_dedup_2.fq.gz"
    shell:
        "clumpify.sh in={input.r1} in2={input.r2} out={output.r1out} out2={output.r2out} dedupe subs=0"

and the error is:

Building DAG of jobs...
MissingInputException in line 13 of /home/peterchung/Desktop/Rerun-Result/clumpify.smk:
Missing input files for rule clumpify:
AD50/AD50_CTGATCGTA_2_val_2_val_2.fq.gz
AD50/AD50_CTGATCGTA_2_val_1_val_1.fq.gz

I tried another approach; the closest I got was that Snakemake reported missing inputs such as
AD50_CTGATCGTA_1_val_2.fq.gz and AD50_CTGATCGTA_2_val_1.fq.gz, which do not exist.

I am not sure I am using the glob_wildcards function properly, since the file names contain many underscores. I tried:

 glob_wildcards(DIR+"{sample}_{read}_val_{read}.fq.gz")

but that did not work either.
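
Here is a minimal sketch of what I suspect is happening, written with plain Python `re` rather than Snakemake itself. It assumes that each wildcard in a glob_wildcards pattern behaves like a greedy `.+` regex group and that a repeated wildcard name must match the same text again; both regexes and the `[12]` restriction on the read are my own approximation, not something taken from the Snakemake source.

```python
import re

# File names copied from the two examples in the question.
files = [
    "AD50_CTGATCGTA_1_val_1.fq.gz",
    "AD50_CTGATCGTA_2_val_2.fq.gz",
]

# Assumed regex equivalent of "{sample}_{read}.fq.gz":
# every wildcard is a greedy ".+", so {sample} swallows "_1_val" as well.
loose = re.compile(r"(?P<sample>.+)_(?P<read>.+)\.fq\.gz$")

# Assumed equivalent of "{sample}_{read}_val_{read}.fq.gz" with the read
# restricted to 1 or 2; the second {read} becomes a backreference.
strict = re.compile(r"(?P<sample>.+)_(?P<read>[12])_val_(?P=read)\.fq\.gz$")

# Collect the distinct sample names each pattern would extract.
loose_samples = sorted({loose.match(f).group("sample") for f in files})
strict_samples = sorted({strict.match(f).group("sample") for f in files})

print(loose_samples)   # ['AD50_CTGATCGTA_1_val', 'AD50_CTGATCGTA_2_val']
print(strict_samples)  # ['AD50_CTGATCGTA']
```

If that assumption holds, the first pattern would explain the doubled _val_2_val_2 in the error above: the extracted sample name already ends in _2_val, and the clumpify rule then appends _2_val_2.fq.gz to it. Snakemake also accepts an inline constraint inside a wildcard, such as {read,[12]}, which may be the equivalent of the stricter pattern, though I have not confirmed that it resolves the problem.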


