25 changes: 25 additions & 0 deletions doc/sampler.rst
@@ -374,6 +374,7 @@ It makes sense to define a script for this as well:
sbatch script_redis_worker.sh
done


Here, ``n_jobs`` would be the number of jobs submitted. When the job scheduler
is based on qsub, e.g. SGE/UGE, instead use a script like

@@ -389,6 +390,30 @@ is based on qsub, e.g. SGE/UGE, instead use a script like
and adapt the worker script. For both schedulers, there exist many more
configuration options; for further details, see the respective documentation.

When submitting a large number of individual SLURM jobs (``n_jobs``), the
scheduler can become overloaded, i.e. the increased scheduling overhead may
degrade the overall scheduling efficiency and thus the performance of the HPC system.

As an alternative, consider using SLURM job arrays. A SLURM job array is a feature
to manage a collection of similar jobs efficiently via a single submission script.
Each job in the array (a task) shares the same job script, but can operate on
different inputs and parameters, identified by a unique index ``${SLURM_ARRAY_TASK_ID}``.

Furthermore, monitoring and job control are streamlined compared to numerous individual
jobs scattered across the queue, which improves the scalability of job submission.
SLURM is optimized to handle large job arrays efficiently, so they should be considered
as an alternative to submitting many individual, yet related or similar, jobs.
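
For example, all tasks of an array can be monitored and controlled via a single
job ID (the job ID ``1234567`` below is a placeholder):

.. code:: bash

    # show all tasks of the array in the queue
    squeue -j 1234567

    # cancel a range of tasks, or the whole array at once
    scancel 1234567_[10-19]
    scancel 1234567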


.. code:: bash

    sbatch --array=0-99 script_redis_worker.sh

With ``--array``, one specifies the range of task indices (here ``0-99``,
resulting in 100 tasks, i.e. ``n_jobs = 100``). Within each task, SLURM sets the
environment variable ``${SLURM_ARRAY_TASK_ID}`` to the task's unique index, which
``script_redis_worker.sh`` can use, for instance, to select different input
parameters or input files.
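
As a minimal sketch, such an index-dependent worker script could look as follows;
the input file naming is hypothetical, and the ``abc-redis-worker`` invocation is a
placeholder to be replaced by the actual worker start-up command from
``script_redis_worker.sh``:

.. code:: bash

    #!/bin/bash
    #SBATCH --time=01:00:00

    # SLURM sets a unique index for each task of the array
    TASK_ID=${SLURM_ARRAY_TASK_ID}

    # hypothetical example: select a task-specific input file by index
    INPUT_FILE="input_${TASK_ID}.txt"
    echo "Task ${TASK_ID} using input file ${INPUT_FILE}"

    # start the redis worker (placeholder; adapt host and port, or
    # substitute the command from your script_redis_worker.sh)
    abc-redis-worker --host=<redis-host> --port=6379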


Note that when planning the overall number of redis workers, batches, and
CPUs per batch, the parallelization on the level of the simulations also has
to be taken into account. Also, memory requirements should be checked in