I understand that this might be difficult to answer without some debugging or trial and error, but I am wondering if some experience, intuition, or Java knowledge can help.
I am processing some medium-to-large unmapped paired-end Illumina BAM files (around 50 GB each) with the Broad best-practices pipeline, but on a local cluster (128 cores, 528 GB RAM). I am scheduling tasks with SLURM in such a way that there are no hard limits on CPU or memory.
What sometimes happens is that some of the Picard metrics tools (assigned 7 GB of RAM, with no core count specified) take over 40+ cores or 15+ GB of RAM (see attached screenshot). In addition, they seem to run for a long time.
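For context, the invocations look roughly like the sketch below (the jar path, tool, and file names are placeholders, not my exact command). The heap is capped at 7 GB via `-Xmx`, but as far as I know nothing in such a command limits the JVM's internal thread pools (GC, JIT compiler), which by default are sized from the number of cores the node exposes:

```shell
# Sketch of a Picard invocation as currently scheduled (placeholders, not
# the exact command): the heap is capped, but GC/compiler thread counts
# are left to JVM ergonomics, which sees all 128 cores on the node.
java -Xmx7g \
     -jar picard.jar CollectMultipleMetrics \
     I=sample.unmapped.bam \
     O=sample_metrics
```

If the thread-pool sizing really is the issue, I assume flags like `-XX:ParallelGCThreads` and `-XX:CICompilerCount` could cap those pools explicitly, but I have not verified this for these tools.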
Is this known behavior?
I am wondering: if I constrain their resources with cgroups, will these tasks fail?
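To be concrete, the kind of constraint I have in mind would be applied through SLURM itself, something like the sketch below (the script name is a placeholder, and this assumes the cluster has cgroup enforcement enabled, i.e. `TaskPlugin=task/cgroup` in `slurm.conf` with `ConstrainCores`/`ConstrainRAMSpace` set in `cgroup.conf`):

```shell
# Hypothetical submission: with cgroup enforcement enabled, these values
# become hard limits rather than scheduling hints, so the Picard JVM
# would only see/use 4 cores and 8 GB of RAM.
sbatch --cpus-per-task=4 --mem=8G picard_metrics.sh
```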
Is it possible that these tasks are somehow performing suboptimally precisely because of the large amount of resources visible to them?
If needed, I can provide more info. Any help is much appreciated!