Dear all,

I'm doing variant calling with GATK. My reference genome is a FASTA file with 200K scaffolds, and I prepared a list of scaffold names called interval.list:

scaffold1
scaffold18
scaffold23

I want to use each entry of this list as a wildcard for the -L option, so that the intervals are called separately and in parallel. How can I do this? My current Snakefile:
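Snakemake can only expand wildcards from a Python list, and the name `interval` passed to expand() in the Snakefile is never defined. A minimal sketch of reading interval.list into such a list, assuming one scaffold name per line (the helper name `read_intervals` is hypothetical):

```python
def read_intervals(path):
    """Return the non-empty lines of an interval list, stripped of whitespace."""
    with open(path) as handle:
        # Skip blank lines so that no empty wildcard value is generated.
        return [line.strip() for line in handle if line.strip()]
```

Placed before `rule all`, a line such as `interval = read_intervals("interval.list")` would let `expand("{int}.vcf", int=interval)` resolve to scaffold1.vcf, scaffold18.vcf and scaffold23.vcf for the list shown above.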

rule all:
    input:
        expand("{int}.vcf", int=interval)
rule gatk:
    input:
        ref_done="ref.fasta",
        bam="bam.list",
        interval="interval.list"
    output:
        outf="{int}.vcf"
    threads: 10
    shell:
        """
        /Tools/gatk/gatk --java-options "-Xmx16g -XX:ParallelGCThreads=10" HaplotypeCaller -L {wildcards.int} -R {input.ref_done} -I {input.bam} -O {output.outf}
        """
rule merge:
    input:
        int="{int}.vcf"
    output:
        outf="results/res.vcf"
    shell:
        "java -jar /Tools/picard/build/libs/picard.jar GatherVcfs {wildcards.int} O={output.outf}"
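A possible way to restructure the workflow, offered as a sketch rather than a tested pipeline. It makes three changes, all of which are assumptions on my part: the scaffold names are read into a Python list (`INTERVALS`), the per-interval VCFs are written to a hypothetical `vcfs/` directory so that the pattern `{interval}.vcf` cannot also match `results/res.vcf`, and the merge rule takes every per-interval VCF as input, because its output has no wildcard to resolve `{int}` from. The wildcard is renamed from `int` to `interval` only for readability.

```snakemake
# Read one scaffold name per line; blank lines are skipped.
with open("interval.list") as handle:
    INTERVALS = [line.strip() for line in handle if line.strip()]

rule all:
    input:
        "results/res.vcf"

rule gatk:
    input:
        ref="ref.fasta",
        bam="bam.list"
    output:
        vcf="vcfs/{interval}.vcf"
    threads: 10
    shell:
        """
        /Tools/gatk/gatk --java-options "-Xmx16g -XX:ParallelGCThreads={threads}" \
            HaplotypeCaller -L {wildcards.interval} -R {input.ref} \
            -I {input.bam} -O {output.vcf}
        """

rule merge:
    input:
        # One VCF per scaffold, in the same order as interval.list.
        vcfs=expand("vcfs/{interval}.vcf", interval=INTERVALS)
    output:
        vcf="results/res.vcf"
    params:
        # Build "I=a.vcf I=b.vcf ..." for Picard from the input files.
        inputs=lambda wildcards, input: " ".join(f"I={vcf}" for vcf in input.vcfs)
    shell:
        "java -jar /Tools/picard/build/libs/picard.jar GatherVcfs {params.inputs} O={output.vcf}"
```

Running it with something like `snakemake --cores 40` would let Snakemake schedule several `gatk` jobs at once, each taking 10 threads. As far as I know, GatherVcfs expects its inputs in the order of the reference sequence dictionary, so interval.list would need to follow that order.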


modified 12 minutes ago

written
2 hours ago
by

User000400


