From 0b6f52b4b0d581489a6c8b5df656dffce1809dba Mon Sep 17 00:00:00 2001
From: Stephan Grein
Date: Thu, 26 Jun 2025 12:04:02 +0200
Subject: [PATCH 1/2] Adding SLURM job arrays in documentation.

---
 doc/sampler.rst | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/doc/sampler.rst b/doc/sampler.rst
index 09361f4d..6673b43b 100644
--- a/doc/sampler.rst
+++ b/doc/sampler.rst
@@ -374,6 +374,7 @@ It makes sense to define a script for this as well:
         sbatch script_redis_worker.sh
     done
 
+
 Here, ``n_jobs`` would be the number of jobs submitted. When the job scheduler
 is based on qsub, e.g. SGE/UGE, instead use a script like
 
@@ -389,6 +390,30 @@ is based on qsub, e.g. SGE/UGE, instead use a script like
 and adapt the worker script. For both, there exist many more configuration
 options. For further details see the respective documentation.
 
+When submitting a large number of individual SLURM jobs (``n_jobs``), the
+scheduler could be overloaded, i.e. increased scheduling overhead may degrade
+the overall effiency of the scheduling on the HPC system.
+
+As an alternative, consider using SLURM job arrays. A SLURM job array is a feature
+to manage a collection of similar jobs efficiently using a single submission script.
+Each job in the array (a task) shares the same job script but can operate on
+different inputs and parameters identified by a unique index ``$SLURM_ARRAY_TASK_ID``.
+
+Furthermore, monitoring and job control are streamlined compared to numerous individual
+jobs scattered across the queue (scalability of job submission). SLURM is optimized to handle
+large job arrays efficienctly and should be thus considered as an alternative to to the
+submission of many individual, yet related or similar jobs.
+
+
+.. code:: bash
+
+    sbatch --array=0-99 script_redis_worker.sh
+
+Using ``--array`` one specifies the range of task indices (here ``0-99``, i.e. ``n_jobs = 100`` tasks in total).
+Inside each task, the script ``script_redis_worker.sh`` can use the variable ``${SLURM_ARRAY_TASK_ID}``
+to select, for instance, different input parameters or input files identified by this unique index.
+
+
 Note that when planning for the number of overall redis workers, batches, and
 cpus per batch, also the parallelization on the level of the simulations has
 to be taken into account. Also, memory requirements should be checked in

From 217f0d7ec423321fd0193a58c370659e59eae702 Mon Sep 17 00:00:00 2001
From: Stephan Grein
Date: Thu, 26 Jun 2025 12:11:14 +0200
Subject: [PATCH 2/2] Fix typos.

---
 doc/sampler.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/sampler.rst b/doc/sampler.rst
index 6673b43b..3b725940 100644
--- a/doc/sampler.rst
+++ b/doc/sampler.rst
@@ -392,17 +392,17 @@ options. For further details see the respective documentation.
 
 When submitting a large number of individual SLURM jobs (``n_jobs``), the
 scheduler could be overloaded, i.e. increased scheduling overhead may degrade
-the overall effiency of the scheduling on the HPC system.
+the overall scheduling efficiency and performance on the HPC system.
 
 As an alternative, consider using SLURM job arrays. A SLURM job array is a feature
 to manage a collection of similar jobs efficiently using a single submission script.
 Each job in the array (a task) shares the same job script but can operate on
-different inputs and parameters identified by a unique index ``$SLURM_ARRAY_TASK_ID``.
+different inputs and parameters identified by a unique index ``${SLURM_ARRAY_TASK_ID}``.
 
 Furthermore, monitoring and job control are streamlined compared to numerous individual
 jobs scattered across the queue (scalability of job submission). SLURM is optimized to handle
-large job arrays efficienctly and should be thus considered as an alternative to to the
-submission of many individual, yet related or similar jobs.
+large job arrays efficiently and should thus be considered as an alternative to the
+submission of many individual, yet related or similar, jobs.
 
 
 .. code:: bash
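For illustration, a ``script_redis_worker.sh`` used with such a job array could look roughly as follows. This is a minimal sketch only: the ``#SBATCH`` resource settings, the per-task log file name, and the ``abc-redis-worker`` options (host, port, runtime, processes) are assumptions that have to be adapted to the concrete redis server setup described earlier in ``sampler.rst``.

.. code:: bash

    #!/bin/bash
    #SBATCH --job-name=redis_worker
    #SBATCH --time=02:00:00
    #SBATCH --cpus-per-task=2

    # Unique index of this task within the job array, set by SLURM.
    TASK_ID=${SLURM_ARRAY_TASK_ID}

    # The index can distinguish per-task inputs or outputs, e.g. a
    # task-specific log file (hypothetical naming scheme).
    LOG_FILE="redis_worker_${TASK_ID}.log"

    # Start a pyABC redis worker; host, port and runtime are placeholders
    # and must match the running redis server.
    abc-redis-worker --host=111.111.111.111 --port=6379 --runtime=2h \
        --processes=2 > "${LOG_FILE}" 2>&1

Submitted via ``sbatch --array=0-99 script_redis_worker.sh``, this starts 100 such workers, each with its own ``${SLURM_ARRAY_TASK_ID}``.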