SelectTypeParameters=CR_Core_Memory; DefMemPerCPU #209
Conversation
Code Review
This pull request introduces a significant improvement by making memory a consumable resource in Slurm, which will help prevent node oversubscription. The refactoring of nodegroup-specific computations into the ohpc_nodegroup_computed variable is a great step towards better code organization and reusability.
My review includes a few suggestions:
- A high-severity suggestion to make the memory-per-CPU calculation more robust by using `ansible_processor_vcpus` and handling cases where this fact might be missing. This will prevent potential playbook failures.
- A couple of medium-severity comments to address inconsistent indentation (mixing tabs and spaces) in the Jinja2 template, which will improve code readability and maintainability.
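The robust calculation the review suggests could be sketched as follows (a Python stand-in for the role's Jinja2 logic; the fact names mirror Ansible's `ansible_memtotal_mb` and `ansible_processor_vcpus`, while the function name and fallback behaviour are assumptions, not the role's actual code):

```python
def min_mem_per_vcpu(hostvars, hosts, fallback_vcpus=1):
    """Return the minimum memory-per-vCPU ratio (MB) across hosts.

    Hosts whose memory facts have not been gathered are skipped, and a
    missing ansible_processor_vcpus fact falls back to a safe default
    instead of failing the whole computation, as the review suggests.
    """
    ratios = []
    for host in hosts:
        facts = hostvars.get(host, {})
        ram_mb = facts.get("ansible_memtotal_mb")
        if ram_mb is None:
            continue  # facts not gathered for this host
        vcpus = facts.get("ansible_processor_vcpus") or fallback_vcpus
        ratios.append(ram_mb // vcpus)
    if not ratios:
        raise ValueError("no hosts with gathered memory facts")
    return min(ratios)
```

For example, with one 192000 MB / 64 vCPU node and one 8192 MB / 4 vCPU node, the cluster-wide minimum is 2048 MB per vCPU, and a host with no gathered facts is simply ignored.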
Force-pushed 5a27e0d to 3babb40
Re: it is awkward to have the […]. I had tried a version with all fields of […]. If one would prefer to use […]: before, I tried computing the nodegroup and DefMemPerCPU for each host, but it was very awkward. Also, the naive approach would be to duplicate the whole block, iterating over […]. We could also compute all slurm.conf attributes of each nodegroup in a variable, then just serialize it in slurm.conf.j2 (we can pick some attributes to put first in the order of our choosing (NodeList, Features, State), then append the rest...)
sjpb left a comment:
I think the PR description should describe what DefMemPerCPU is set to.
sjpb left a comment:
Otherwise looks good once the merge conflict is fixed.
Yes, good point
Also create the `ohpc_nodegroups_computed` variable, to share computed values per nodegroup between multiple places of the role. If a nodegroup is in `ohpc_nodegroups_computed` it has at least one host. It stores:
- `first_host`: name of the first host in the nodegroup; use `hostvars[computed.first_host]` to access its hostvars
- `inventory_group_name`: name of the inventory group for this nodegroup; use `groups[computed.inventory_group_name]` to access the host list
- `ram_mb`: memory per node in the nodegroup
- `def_mem_per_cpu`: computed DefMemPerCPU for this nodegroup, as the RAM/vCPU ratio.
Force-pushed 3babb40 to da6d88c
Fixed it to be plural
With `SelectTypeParameters=CR_Core_Memory`, the Slurm scheduler will consider memory as a consumable resource: a job will be scheduled to a node only if the node has sufficient remaining memory for it.
With the default `openhpc_cgroup_config`, the job is constrained to not use more memory (`ConstrainRAMSpace=yes`) and swap (`ConstrainSwapSpace=yes`) than requested. This is a change from the previous `SelectTypeParameters=CR_Core`: memory was not taken into account and a job's memory use was only limited by the total memory and swap on the node at any time.
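Concretely, the resulting configuration would contain lines along these lines (an illustrative sketch; the values and file layout are assumptions, not taken from the PR):

```ini
# slurm.conf -- memory becomes a consumable, schedulable resource
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
DefMemPerCPU=2048          # example: minimum RAM/vCPU ratio across the cluster

# cgroup.conf -- enforce the requested limits (openhpc_cgroup_config defaults)
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
```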
With `SelectTypeParameters=CR_Core_Memory`, `DefMemPerCPU` must be defined to prevent a job using a fraction of a node's CPUs from taking all the node's memory. We compute a default value as the minimal memory per thread across the cluster, but it can be changed in the appliance configuration.

Also create the `ohpc_nodegroups_computed` variable, to share computed values per nodegroup between multiple places of the role. If a nodegroup is in `ohpc_nodegroups_computed` it has at least one host. It stores:
- `first_host`: name of the first host in the nodegroup; use `hostvars[computed.first_host]` to access its hostvars
- `inventory_group_name`: name of the inventory group for this nodegroup; use `groups[computed.inventory_group_name]` to access the host list
- `ram_mb`: memory per node in the nodegroup
- `def_mem_per_cpu`: computed DefMemPerCPU for this nodegroup, as the RAM/vCPU ratio.
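The computation behind these fields could be sketched in Python as follows (the inventory group naming scheme, function name, and structure are assumptions for illustration, not the role's actual Ansible code):

```python
def compute_nodegroup(name, groups, hostvars):
    """Build one ohpc_nodegroups_computed-style entry for a nodegroup.

    Returns None for a nodegroup with no hosts, mirroring the rule
    that only nodegroups with at least one host appear in the mapping.
    """
    inventory_group_name = f"openhpc_{name}"  # illustrative naming scheme
    hosts = groups.get(inventory_group_name, [])
    if not hosts:
        return None
    first_host = hosts[0]
    facts = hostvars[first_host]
    ram_mb = facts["ansible_memtotal_mb"]
    return {
        "first_host": first_host,
        "inventory_group_name": inventory_group_name,
        "ram_mb": ram_mb,
        # DefMemPerCPU for this nodegroup, as the RAM/vCPU ratio
        "def_mem_per_cpu": ram_mb // facts["ansible_processor_vcpus"],
    }
```

A caller would then look up host details via `hostvars[computed["first_host"]]` and the host list via `groups[computed["inventory_group_name"]]`, exactly as the description suggests.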