russhh (UK, U. Glasgow), 2 hours ago

Hi. I'm trying to solve a simple map-reduce problem using Nextflow.

# imperative pseudocode
# Map
Foreach letter x in (a, b, c): write ${x} to (temporary file) ${x}.txt
# Reduce
Bind together the rows of a.txt, b.txt, c.txt in any order to produce results.txt
# Cleanup
Remove a.txt, b.txt, c.txt and put results.txt into ./results/

I can't find the syntax for the reduction step in the Nextflow documentation.

My (incorrect) attempt at the reduction is in the following script:

// map_reduce.nf
letters = Channel.from('a', 'b', 'c')

process map_to_file {
    // I'm fairly certain this does what I want: it writes 'a' to a.txt (in whatever work
    // directory Nextflow chooses to run that process in), and similarly for b -> b.txt
    // and c -> c.txt. This process should run three times, once for each letter.
    input:
        val(letter) from letters
    output:
        file "${letter}.txt" into letter_files
    script:
        """
        echo "${letter}" > "${letter}.txt"
        """
}

process reduce_files {
    // When we're finished, 'results.txt' in Nextflow's working dir should be linked
    // to from results/results.txt
    publishDir "results"

    // But how do I tell this process that it should run once and combine all files
    // from the channel `letter_files` into a single output file?
    input:
        my_files from letter_files.collect()  // guesswork

    output:
        file "results.txt"

    // wanted: "cat <random_path>/a.txt <other_path>/b.txt <another_path>/c.txt > results.txt"
    script:
        """
        cat ${my_files} > "results.txt"
        """
}

Hopefully this makes sense; I'm in the realm of unknown unknowns with Nextflow at the moment.
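
One possible fix, as a sketch only: it assumes the script targets Nextflow's older DSL1 syntax (the `from`/`into` style used above; recent releases default to DSL2, where processes are wired together differently). The input declaration in `reduce_files` appears to be missing a qualifier. With `file` in front of it, the list emitted by `collect()` should be staged into the task directory in one go, and `${my_files}` should expand to the space-separated file names. Process and channel names are the ones from the script above.

```groovy
// map_reduce.nf (DSL1 syntax assumed, as in the script above)
letters = Channel.from('a', 'b', 'c')

process map_to_file {
    // Runs once per letter and writes <letter>.txt in that task's work directory
    input:
        val(letter) from letters
    output:
        file "${letter}.txt" into letter_files
    script:
        """
        echo "${letter}" > "${letter}.txt"
        """
}

process reduce_files {
    // Publishes results.txt into ./results (by default as a symlink)
    publishDir "results"

    input:
        // collect() gathers all emitted files into a single list, so this
        // process runs once; the `file` qualifier stages them all here
        file my_files from letter_files.collect()

    output:
        file "results.txt"

    script:
        """
        cat ${my_files} > results.txt
        """
}
```

This would be run with `nextflow run map_reduce.nf`. Note that the intermediate a.txt, b.txt and c.txt are not deleted by this; they stay in the per-task directories under `work/`.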


