
Hook in job profiling for slurm backends #150

@bryce-turner

We can manually configure a pipeline to use job profiling via Slurm. However, for newer users a --profile option or backend setting that forces job profiling would be helpful, especially if we also automatically generate plots and, potentially, job efficiency metrics.
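As a rough sketch of how the option could surface on the command line (the flag name and the backend_settings hand-off below are hypothetical, not existing code in this project):

```python
# Hypothetical sketch: expose a --profile flag and hand it to the slurm backend.
# The option and variable names are illustrative, not existing code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--profile',
    action='store_true',
    help='Enable Slurm job profiling (sbatch --profile/--acctg-freq) for every task'
)
args = parser.parse_args()

# Passed down to the backend so it can extend sbatch_args and the sbatch script.
backend_settings = {'profile': args.profile}
```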

For the slurm_singularity backend we generate an sbatch script:

```python
sbatch_script += f"[[ -v SINGULARITY_CACHEDIR ]] || SINGULARITY_CACHEDIR=$HOME/.singularity/cache\n"
sbatch_script += f"if ls $SINGULARITY_CACHEDIR/oci-tmp | grep {singularity_image_digest} > /dev/null ; then\n"
sbatch_script += f"  {singularity_run_env_vars}{singularity_executable} exec {singularity_exec_args}{singularity_hostname_arg}{singularity_mounts_string} $SINGULARITY_CACHEDIR/oci-tmp/{singularity_image_digest} bash {cmd_script_filename}\n"
sbatch_script += f"else\n"
sbatch_script += f"  {singularity_run_env_vars}{singularity_executable} exec {singularity_exec_args}{singularity_hostname_arg}{singularity_mounts_string} {singularity_image} bash {cmd_script_filename}\n"
sbatch_script += f"fi\n"
```

Pseudo implementation:

```python
if profile:
    # Turn on Slurm task-level profiling, sampling accounting data every second
    sbatch_args.extend(['--profile=task', '--acctg-freq=task=1'])

if profile:
    # Submit a dependent job that merges the profiling data with sh5util once
    # this job finishes; afterany also covers jobs killed for memory/walltime
    sbatch_script += f"sbatch -n1 -d afterany:$SLURM_JOB_ID --wrap=\"sh5util -j $SLURM_JOB_ID\"\n"
sbatch_script += f"[[ -v SINGULARITY_CACHEDIR ]] || SINGULARITY_CACHEDIR=$HOME/.singularity/cache\n"
sbatch_script += f"if ls $SINGULARITY_CACHEDIR/oci-tmp | grep {singularity_image_digest} > /dev/null ; then\n"
sbatch_script += f"  {singularity_run_env_vars}{singularity_executable} exec {singularity_exec_args}{singularity_hostname_arg}{singularity_mounts_string} $SINGULARITY_CACHEDIR/oci-tmp/{singularity_image_digest} bash {cmd_script_filename}\n"
sbatch_script += f"else\n"
sbatch_script += f"  {singularity_run_env_vars}{singularity_executable} exec {singularity_exec_args}{singularity_hostname_arg}{singularity_mounts_string} {singularity_image} bash {cmd_script_filename}\n"
sbatch_script += f"fi\n"
```

Deciding when we should generate a plot might be more complex than we expect. The initial thought is to generate the plot when the task completes, but if a job fails due to memory or walltime limits we would likely want the plot in that case as well; submitting the sh5util step with an afterany dependency (as in the sketch above) covers both outcomes.
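To illustrate what the plotting step could look like, here is a rough sketch that reads the HDF5 file merged by sh5util and plots task RSS over time. The file name and the exact dataset layout vary with the Slurm version and acct_gather configuration, so the traversal below is an assumption to verify, not a fixed API:

```python
# Rough sketch: plot memory usage from an sh5util-merged profile file.
# Assumes the merged HDF5 file contains structured task datasets with
# 'ElapsedTime' and 'RSS' fields; the actual layout depends on the Slurm
# version, so verify against a real profile file first.
import sys
import h5py
import matplotlib.pyplot as plt


def plot_rss(h5_path, out_png):
    fig, ax = plt.subplots()
    with h5py.File(h5_path, 'r') as f:
        def visit(name, obj):
            # Pick up any dataset that carries RSS samples (layout assumption)
            if isinstance(obj, h5py.Dataset) and obj.dtype.names and 'RSS' in obj.dtype.names:
                data = obj[()]
                x = data['ElapsedTime'] if 'ElapsedTime' in obj.dtype.names else range(len(data))
                ax.plot(x, data['RSS'], label=name)
        f.visititems(visit)
    ax.set_xlabel('Elapsed time (s)')
    ax.set_ylabel('RSS')
    ax.legend(fontsize='small')
    fig.savefig(out_png)


if __name__ == '__main__':
    # e.g. python plot_profile.py job_12345.h5 job_12345_rss.png
    plot_rss(sys.argv[1], sys.argv[2])
```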

We may also want to consider psrecord as a more general option; we should only need to insert the following, assuming psrecord is already available in the job environment (other alternatives may exist):

```python
sbatch_script += f"  psrecord {singularity_run_env_vars}{singularity_executable} exec ...
```
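Fleshing that out a little, the sketch below is one way the generated line could look. The psrecord options used (--include-children, --interval, --log, --plot) are real psrecord flags, but the output file names and the way the line is spliced in are assumptions layered on the existing variables, and real code would need to handle quoting of the wrapped command carefully:

```python
# Sketch only: wrap the container invocation in psrecord so a usage log and a
# plot land next to the task's other outputs. Assumes psrecord (plus matplotlib
# for --plot) is available on the compute node; file names are illustrative.
inner_cmd = (
    f"{singularity_run_env_vars}{singularity_executable} exec "
    f"{singularity_exec_args}{singularity_hostname_arg}{singularity_mounts_string} "
    f"{singularity_image} bash {cmd_script_filename}"
)
if profile:
    # psrecord takes the command to monitor as a single (quoted) argument
    sbatch_script += (
        f"  psrecord \"{inner_cmd}\" --include-children --interval 1 "
        f"--log {cmd_script_filename}.psrecord.log "
        f"--plot {cmd_script_filename}.psrecord.png\n"
    )
else:
    sbatch_script += f"  {inner_cmd}\n"
```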
