Snakemake wildcard usage


I have a set of files that I’d like to run a function on, where one or more of that function’s parameters can take more than one possible value.

For example, I might have two samples, each with their own fasta file: sample_A and sample_B.

I want to perform a blast search for each input fasta file, but I also want to loop through a range of word sizes for every blast process for each sample. Say, three values: 11, 13, 15.

This would mean that for each sample_*.fasta input, I’d generate three blast output files, each one reflecting one of those three word size values.

I am struggling to understand how to structure the snakemake rule for input and output names, because they don’t share the same wildcards - the output name has an extra part, taken from the blast parameter, that isn’t part of the input name.

Thanks for any advice on how to include a parameter value from a snakemake rule in the output name!



Maybe this?

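samples = ['sample_a', 'sample_b', 'sample_c']
word_sizes = [11, 13, 15]

rule all:
    input:
        expand('blast/{sample}.{word_size}.out', sample=samples, word_size=word_sizes),

rule blast:
    input:
        fa='{sample}.fasta',  # assumes each fasta file is named after its sample
    output:
        out='blast/{sample}.{word_size}.out',
    shell:
        'blastn -word_size {wildcards.word_size} -query {input.fa} -out {output.out} ...'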

Note that the expand() function creates all combinations of sample and word_size and returns them as a list of strings. If you want more control over which combinations are produced, you can use any Python code to build such a list.
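As a rough sketch of building that list by hand, plain Python with itertools.product gives the same full set of combinations and lets you filter it. The `skip` set and the `targets` name are made up for illustration; they are not part of Snakemake.

```python
from itertools import product

samples = ['sample_a', 'sample_b', 'sample_c']
word_sizes = [11, 13, 15]

# Hypothetical: (sample, word_size) pairs that should not be run.
skip = {('sample_c', 15)}

# Same file name pattern as in the rules, minus the skipped pairs.
targets = [
    f'blast/{sample}.{word_size}.out'
    for sample, word_size in product(samples, word_sizes)
    if (sample, word_size) not in skip
]
```

A list like `targets` could then be given as the input of rule all in place of the expand() call.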

Also, this assumes each fasta file is named with the sample prefix. If that is not the case, you can use a dictionary to map samples to fasta files. In that case you may need to use a function as the input to the blast rule.
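A minimal sketch of that dictionary approach is below. The file paths and the names `fasta_files` and `get_fasta` are hypothetical. Snakemake calls an input function with the rule's wildcards object; here a SimpleNamespace stands in for it so the lookup can be tried outside Snakemake.

```python
from types import SimpleNamespace

# Hypothetical mapping from sample name to fasta path (paths are made up).
fasta_files = {
    'sample_a': 'data/run1/A.fasta',
    'sample_b': 'data/run2/B_contigs.fa',
}

def get_fasta(wildcards):
    """Return the fasta path for the sample named in the wildcards."""
    return fasta_files[wildcards.sample]

# Stand-in for the wildcards object Snakemake would pass to the function.
print(get_fasta(SimpleNamespace(sample='sample_a', word_size='11')))
```

In the Snakefile, the input of the blast rule would then be written as fa=get_fasta rather than a file name pattern, while the output keeps both the sample and word_size wildcards.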
