I am running an ATAC-seq pipeline that include rules for LD score regression and Partitioned heritability and I need the execution order of these rules to be sequential.

The ldsr rule runs fine, but I'm having issue with the partitioned_heritability rule as I need it to run after all the files in the ldsr rule have been generated.

In normal circumstances, I use the rules.ldsr.output function (labelled #1 in the Snakefile) to inform snakemake about rule execution order, as described here, but this is not possible here as the {chr} wildcard associated with the lsdr output is not included in the output of the partitioned_heritability rule, so the {chr} wildcard cannot be backfilled in the partitioned_heritability rule input . This means I get the error:

Wildcards in input files cannot be determined from output files: 'chr'

So I tried to use a flag file in the output of the ldsr rule and then call this in the input of the partitioned_heritability rule , see here for info on flag files, but this threw this error:

positional argument follows keyword argument

I thought this was probs because I was using named and unnamed inputs in the input/output calls, so also tried applying variable names to the ldsr outputs, i.e. touch_file = touch("mytask.done") and calling rules.ldsr.output.touch_file in the partitioned_heritability input but got the same error.

Also tried using check_call with the flag file, see #3, but that didn't work either. Tbh, I have never used the flag_file or the check_call options before so may not have understood, or be using, them properly.

Here are the files:

Snakefile:

import os
# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
        expand("LD_annotation_files/Bulk.{chr}.l2.ldscore.gz", chr=range(1,23)),
        expand("partHerit_files2/PrtHrt_Test.{GWASsumstat}", GWASsumstat = config['SUMSTATS'])

rule ldsr:
    input:
        annot = "LD_annotation_files/Bulk.{chr}.annot.gz",
        bfile_folder = "ldsc/reference_files/1000G_EUR_Phase3_plinkfiles",
        snps_folder = "ldsc/reference_files/hapmap3_snps"
    output:
        "LD_annotation_files/Bulk.{chr}.l2.ldscore.gz",
#2      touch("mytask.done") # Required to enforce rule execution order i.e. PH after lsdr?
    conda:
        "envs/environment.yml"
    params:
        bfile = "ldsc/reference_files/1000G_EUR_Phase3_plinkfiles/1000G.EUR.QC.{chr}",
        ldscores = "LD_annotation_files/ManuBulk.{chr}",
        snps = "ldsc/reference_files/w_hm3.snplist_rsIds"
    log:
        "logs/LDSC/Bulk.{chr}_ldsc.txt"
    shell:
        "export MKL_NUM_THREADS=2;" # Export arguments are  workaround as ldsr uses all available cores
        "export NUMEXPR_NUM_THREADS=2;" # Numbers must match cores parameter in cluster config
        "Reference/ldsc/ldsc.py --l2 --bfile {params.bfile} --ld-wind-cm 1 "
        "--annot {input.annot} --out {params.ldscores} --print-snps {params.snps} 2> {log}"

#3 check_call(["snakemake", "--snakefile", "mytask.done"])

rule partitioned_heritability:
    input:
        weights_folder = "ldsc/reference_files/weights_hm3_no_hla/",
        baseline_folder = "ldsc/reference_files/baselineLD_v2.2_1000G_Phase3/",
        frqfile_folder = "ldsc/reference_files/1000G_Phase3_frq/",
        GWASsumstat = lambda wildcards: config['SUMSTATS'][wildcards.GWASsumstat],
#1        LDSR = rules.ldsr.output
#2        "mytask.done"
    output:
        "partHerit_files2/PrtHrt_Test.{GWASsumstat}"
    conda:
        "envs/environment.yml"
    params:
        weights = "ldsc/reference_files/weights_hm3_no_hla/weights.",
        baseline = "ldsc/reference_files/baselineLD_v2.2_1000G_Phase3/baselineLD.",
        frqfile = "ldsc/reference_files/1000G_Phase3_frq/1000G.EUR.QC.",
        LD_anns = "LD_annotation_files/Bulk."
    log:
        "logs/PrtHerit/Bulk.PrtHrt.{GWASsumstat}.txt"
    shell:
        "python ldsc/ldsc.py --h2 {input.GWASsumstat} --w-ld-chr {params.weights}"
        "--ref-ld-chr {params.baseline},{params.LD_anns} --overlap-annot"
        "--frqfile-chr {params.frqfile} --out {output} --print-coefficients"
        -coefficients"

Config File:

SUMSTATS:
    ADHD: GWAS_sumstats/munged_sumstats/adhd_LDSC.sumstats.gz
    SCZ: GWAS_sumstats/munged_sumstats/scz_LDSC.sumstats.gz
    MDD: GWAS_sumstats/munged_sumstats/mdd_LDSC.sumstats.gz
    ASD: GWAS_sumstats/munged_sumstats/asd_LDSC.sumstats.gz
    BIP: GWAS_sumstats/munged_sumstats/bip_LDSC.sumstats.gz

Any help/advice/solutions with this would be greatly appreciated.



Source link