Dear all,
I'm doing variant calling with GATK. My reference genome is a FASTA file with 200K scaffolds, and I have prepared a list of scaffold names, one per line, called interval.list:
scaffold1
scaffold18
scaffold23
I want to use each entry in this interval list as a wildcard that is passed to GATK's -L option, so that the scaffolds are processed as separate jobs and the run can be parallelised. How can I do this?
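To make the goal concrete, here is a minimal plain-Python sketch of the expansion I have in mind, outside of Snakemake. It is only an illustration under assumptions: the helper names (`read_intervals`, `per_scaffold_vcfs`, `gather_args`) are hypothetical, the list is assumed to hold one scaffold name per line, and the `I=` form is Picard's legacy argument syntax.

```python
from pathlib import Path
import tempfile


def read_intervals(path):
    """Return the non-empty lines of an interval list (one scaffold name per line)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def per_scaffold_vcfs(intervals):
    """One VCF name per scaffold, like expand("{int}.vcf", int=intervals) would give."""
    return [f"{name}.vcf" for name in intervals]


def gather_args(vcfs):
    """Build the repeated I=<file> arguments that a gather step would receive."""
    return " ".join(f"I={vcf}" for vcf in vcfs)


if __name__ == "__main__":
    # Demo on a temporary copy of the three-line list from this post.
    with tempfile.TemporaryDirectory() as tmp:
        list_path = Path(tmp) / "interval.list"
        list_path.write_text("scaffold1\nscaffold18\nscaffold23\n")
        intervals = read_intervals(list_path)

    vcfs = per_scaffold_vcfs(intervals)
    print(vcfs)               # the per-scaffold targets, one job each
    print(gather_args(vcfs))  # the arguments for the final merge
```

Each name in the first list would correspond to one HaplotypeCaller job with its own -L value, and the second string is what the merge step would need once all of those jobs have finished.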
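# Read one scaffold name per line from interval.list, skipping blank lines.
with open("interval.list") as fh:
    interval = [line.strip() for line in fh if line.strip()]

rule all:
    input:
        "results/res.vcf"

# One HaplotypeCaller job per scaffold; the wildcard value is passed to -L.
rule gatk:
    input:
        ref_done="ref.fasta",
        bam="bam.list",
        interval="interval.list"
    output:
        outf="{int}.vcf"
    threads: 10
    shell:
        """
        /Tools/gatk/gatk --java-options "-Xmx16g -XX:ParallelGCThreads={threads}" HaplotypeCaller -L {wildcards.int} -R {input.ref_done} -I {input.bam} -O {output.outf}
        """

# Gather all per-scaffold VCFs into one file. The output has no wildcard,
# so the inputs are listed with expand() instead of "{int}.vcf".
rule merge:
    input:
        vcfs=expand("{int}.vcf", int=interval)
    output:
        outf="results/res.vcf"
    params:
        # GatherVcfs takes one I= argument per file and expects them in reference
        # order; this assumes interval.list is already sorted that way.
        inputs=lambda wildcards, input: " ".join(f"I={vcf}" for vcf in input.vcfs)
    shell:
        "java -jar /Tools/picard/build/libs/picard.jar GatherVcfs {params.inputs} O={output.outf}"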