Posted 2 hours ago by quentin54520

Hello all,

I need to align 74 human genomes (30x coverage each). For each genome I have 8 files, because the samples were sequenced on 4 different lanes and the data are paired-end.

To run these 296 alignments (74 genomes x 4 lanes), I think a job array is the best way to parallelize, but I'm not sure how to set it up. In the sbatch options I should add the following (to run 10 jobs simultaneously):

#SBATCH --array=1-296%10

But then what? If I add a sample sheet with 3 columns (sample name, read 1 file, read 2 file), I could use these commands:

r1=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$samplesheet" | awk '{print $2}')
r2=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$samplesheet" | awk '{print $3}')

bwa mem -t 10 ref.fa "$r1" "$r2" | samtools view -bh -@ 4 | samtools sort -@ 4 > .bam
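
A minimal sketch of the sample sheet lookup described above, reading all three columns of the task's line in one pass instead of calling sed and awk once per column. The sample names and FASTQ file names are hypothetical, the sheet is written to a temporary file only so the snippet runs on its own, and the task ID defaults to 2 when the script is run outside Slurm.

```shell
#!/bin/bash
# Hypothetical three-column sample sheet: sample name, read 1 file, read 2 file.
samplesheet=$(mktemp)
printf '%s\t%s\t%s\n' \
  sampleA_L1 sampleA_L1_R1.fastq.gz sampleA_L1_R2.fastq.gz \
  sampleA_L2 sampleA_L2_R1.fastq.gz sampleA_L2_R2.fastq.gz > "$samplesheet"

# Slurm sets SLURM_ARRAY_TASK_ID inside an array task; default to 2 for a dry run.
task_id=${SLURM_ARRAY_TASK_ID:-2}

# Pick line number $task_id, then split it into its three fields.
line=$(awk -v n="$task_id" 'NR == n' "$samplesheet")
read -r sample r1 r2 <<EOF
$line
EOF

echo "task $task_id: sample=$sample r1=$r1 r2=$r2"
rm -f "$samplesheet"
```

With the sample name available, each array task can also name its own output file (for example "$sample.bam") so that tasks do not overwrite one another.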

For clarity, I have left out some of the options I use, such as the read group and the mapping quality filter.

In this example I use 18 CPUs for one alignment (10 for bwa mem, 4 for samtools view, and 4 for samtools sort). In the sbatch script, do I need to request 18, and will each job then get 18 (so at most 180 in total)? The same question applies to memory: in the sbatch script, do I request the amount of memory for one job, or for the 10 simultaneous jobs?

Thanks in advance, and sorry if I'm not clear enough. I am still learning: before starting my thesis I had never done any bioinformatics.
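
For reference, in Slurm's job-array model each array task is scheduled as its own job, so the #SBATCH requests describe one task rather than the whole array. The sketch below works through the arithmetic under that reading. The --mem value is a placeholder and not a recommendation, and whether 180 CPUs can actually run at once depends on the cluster's limits.

```shell
#!/bin/bash
#SBATCH --array=1-296%10      # 296 tasks, at most 10 running at once
#SBATCH --cpus-per-task=18    # per array task, not for the whole array
#SBATCH --mem=32G             # per array task; 32G is a placeholder value

# Thread counts taken from the pipeline above.
bwa_threads=10
view_threads=4
sort_threads=4
max_concurrent=10   # the %10 in --array

# CPUs to request for one task, and the peak usage across the array.
cpus_per_task=$((bwa_threads + view_threads + sort_threads))
peak_cpus=$((cpus_per_task * max_concurrent))

echo "cpus per task: $cpus_per_task"              # prints 18
echo "peak cpus across the array: $peak_cpus"     # prints 180
```

The same reasoning applies to memory: the amount requested is what one alignment needs, and the array's peak is up to 10 times that.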
