feat(api): add minSubGroup field to PodGroup and SubGroup API #1127
omer-dayan wants to merge 8 commits into main
Adds the new `MinSubGroup *int32` field to both the `PodGroupSpec` and `SubGroup` structs, enabling users to specify the minimum number of direct child SubGroups required for elastic gang scheduling.

Validation rules enforced via the validating webhook:
- `minMember` and `minSubGroup` are mutually exclusive at both the PodGroup and SubGroup level
- `minSubGroup` must be > 0 and <= the number of direct child SubGroups
- Leaf SubGroups (no children) may not use `minSubGroup`
- Mid-level SubGroups (with children) may not use `minMember`

DeepCopy functions updated to handle the new pointer field.

Refs: #20
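The validation rules listed in the description can be sketched roughly as below. This is an illustration only, not the PR's code: the `SubGroup` and `PodGroupSpec` types are simplified stand-ins (in particular, `MinMember` being a pointer and the `Parent` field shape are assumptions), and `validateSpec` and `checkMinFields` are hypothetical names.

```go
package main

import "fmt"

// SubGroup is a simplified, hypothetical stand-in for the v2alpha2 type.
type SubGroup struct {
	Name        string
	Parent      *string // nil means a direct child of the PodGroup (assumption)
	MinMember   *int32  // assumed to be a pointer for this sketch
	MinSubGroup *int32
}

// PodGroupSpec is a simplified, hypothetical stand-in for the v2alpha2 type.
type PodGroupSpec struct {
	MinMember   *int32
	MinSubGroup *int32
	SubGroups   []SubGroup
}

// checkMinFields enforces mutual exclusivity and the minSubGroup range
// (> 0 and <= number of direct children) for one owner.
func checkMinFields(owner string, minMember, minSubGroup *int32, children int) error {
	if minMember != nil && minSubGroup != nil {
		return fmt.Errorf("%s: minMember and minSubGroup are mutually exclusive", owner)
	}
	if minSubGroup == nil {
		return nil
	}
	if children == 0 {
		return fmt.Errorf("%s: minSubGroup requires child SubGroups", owner)
	}
	if *minSubGroup <= 0 || int(*minSubGroup) > children {
		return fmt.Errorf("%s: minSubGroup must be in [1, %d]", owner, children)
	}
	return nil
}

// validateSpec applies the four rules from the PR description.
func validateSpec(spec PodGroupSpec) error {
	children := map[string]int{} // direct child count per SubGroup name
	topLevel := 0                // direct children of the PodGroup itself
	for _, sg := range spec.SubGroups {
		if sg.Parent == nil {
			topLevel++
		} else {
			children[*sg.Parent]++
		}
	}
	if err := checkMinFields("podgroup", spec.MinMember, spec.MinSubGroup, topLevel); err != nil {
		return err
	}
	for _, sg := range spec.SubGroups {
		owner := fmt.Sprintf("subgroup %q", sg.Name)
		n := children[sg.Name]
		if err := checkMinFields(owner, sg.MinMember, sg.MinSubGroup, n); err != nil {
			return err
		}
		if n > 0 && sg.MinMember != nil {
			return fmt.Errorf("%s: mid-level SubGroup may not use minMember", owner)
		}
	}
	return nil
}
```

In this sketch a leaf SubGroup that sets `minSubGroup` is rejected by the zero-children branch, so the leaf rule and the range rule share one code path.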
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)

@coderabbitai review

✅ Actions performed: Review triggered.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CHANGELOG.md (1)
7-11: ⚠️ Potential issue | 🟡 Minor

Changelog entry should be under [Unreleased] and reference the actual PR number.

The entry is placed under `[v0.13.0]`, which is already released (2026-03-02), but this PR is still open. Per the Keep a Changelog format, unreleased changes should go under the `[Unreleased]` section. Also, `#TBD` should be replaced with `#1127`.

📝 Proposed fix

     ## [Unreleased]
    +### Added
    +- Added `minSubGroup` field to PodGroup and SubGroup API to support specifying the minimum number of child SubGroups required for elastic gang scheduling, along with validation to prevent simultaneous use of `minSubGroup` and `minMember` fields [`#1127`](https://github.com/NVIDIA/KAI-Scheduler/pull/1127) by [KAI Dev Agent](https://github.com/run-ai/KAI-Agents)
     ## [v0.13.0] - 2026-03-02
     ### Added
    -- Added `minSubGroup` field to PodGroup and SubGroup API to support specifying the minimum number of child SubGroups required for elastic gang scheduling, along with validation to prevent simultaneous use of `minSubGroup` and `minMember` fields (`#TBD`) by [KAI Dev Agent](https://github.com/run-ai/KAI-Agents)
     - Added `global.nodeSelector` propagation from Helm values to Config CR...

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` around lines 7 - 11, The changelog entry for the new minSubGroup field is incorrectly placed under [v0.13.0] and uses a placeholder PR number; move the bullet about "Added `minSubGroup` field to PodGroup and SubGroup API..." from the [v0.13.0] section into the [Unreleased] section, replace the `#TBD` token with the real PR number `#1127`, and keep the rest of the text unchanged (including referencing minSubGroup/minMember validation) so the Unreleased section accurately reflects this open PR.
🧹 Nitpick comments (1)
pkg/apis/scheduling/v2alpha2/podgroup_webhook.go (1)
129-158: Consider: mid-level SubGroup with neither minMember nor minSubGroup.

The validation allows a mid-level SubGroup (one with children) to have neither `minMember` nor `minSubGroup` set. This may be intentional for backward compatibility, but consider whether this should be flagged as a warning or whether documentation should clarify the expected behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/apis/scheduling/v2alpha2/podgroup_webhook.go` around lines 129 - 158, The validateSubGroupMinFields function currently allows a mid-level SubGroup (childrenMap lookup yields children, isLeaf==false) to have neither MinMember nor MinSubGroup set; to enforce explicit intent, modify validateSubGroupMinFields to return an error for non-leaf SubGroup when MinSubGroup is nil (and MinMember must already be prohibited for non-leaf), i.e. after computing isLeaf add a check like: if !isLeaf && subGroup.MinSubGroup == nil { return fmt.Errorf("subgroup %q: mid-level SubGroup must set minSubGroup", subGroup.Name) }; update any callers/tests accordingly and include the SubGroup.Name and clear message referencing MinSubGroup to make the requirement explicit.
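A minimal sketch of the stricter check this nitpick proposes is shown below. It assumes a simplified `SubGroup` type and a hypothetical helper name, `requireMinSubGroupOnParents`; the actual signature of `validateSubGroupMinFields` and the shape of `childrenMap` are not shown on this page, so they are not reproduced here.

```go
package main

import "fmt"

// SubGroup is a simplified, hypothetical stand-in for the v2alpha2 type.
type SubGroup struct {
	Name        string
	Parent      *string // name of the parent SubGroup, nil for top level (assumption)
	MinMember   *int32  // assumed to be a pointer for this sketch
	MinSubGroup *int32
}

// requireMinSubGroupOnParents returns an error for the first SubGroup that
// has children but leaves MinSubGroup unset.
func requireMinSubGroupOnParents(subGroups []SubGroup) error {
	// Record which SubGroup names are referenced as a parent.
	hasChildren := make(map[string]bool)
	for _, sg := range subGroups {
		if sg.Parent != nil {
			hasChildren[*sg.Parent] = true
		}
	}
	for _, sg := range subGroups {
		if hasChildren[sg.Name] && sg.MinSubGroup == nil {
			return fmt.Errorf("subgroup %q: mid-level SubGroup must set minSubGroup", sg.Name)
		}
	}
	return nil
}
```

Turning the missing field into a hard error would reject any existing hierarchical PodGroup that sets neither field, which is the backward-compatibility concern the comment raises; a warning or a documentation note avoids that.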
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 294c1f15-a0ac-410c-b8f0-b6170a2c224c
📒 Files selected for processing (6)
- CHANGELOG.md
- deployments/kai-scheduler/crds/scheduling.run.ai_podgroups.yaml
- pkg/apis/scheduling/v2alpha2/podgroup_types.go
- pkg/apis/scheduling/v2alpha2/podgroup_webhook.go
- pkg/apis/scheduling/v2alpha2/podgroup_webhook_test.go
- pkg/apis/scheduling/v2alpha2/zz_generated.deepcopy.go
Merging this branch will increase overall coverage
Summary

Add `minSubGroup` field to PodGroup and SubGroup API to enable hierarchical elastic gang scheduling where only a minimum number of child SubGroups need to be ready.

Changes

- Added `MinSubGroup *int32` to both `PodGroupSpec` and `SubGroup` structs in `podgroup_types.go`
- Extended the validating webhook (`podgroup_webhook.go`) with a new `validatePodGroupSpec` function that enforces mutual exclusivity between `minMember`/`minSubGroup`, prevents `minSubGroup` on leaf SubGroups, prevents `minMember` on mid-level SubGroups, and checks that `minSubGroup` doesn't exceed the actual child count
- Updated `zz_generated.deepcopy.go` to properly deep-copy the new pointer fields

Tests

- Existing `TestValidateSubGroups` tests preserved and extended with 7 new sub-cases covering `minSubGroup` on SubGroups
- New `TestValidatePodGroupSpec` test function with 9 cases covering all PodGroup-level validation scenarios

Architecture/API Changes

Added `minSubGroup *int32` optional field to `PodGroupSpec` and `SubGroup` API types; fully backward compatible, as the field defaults to nil.

Summary by CodeRabbit

Release Notes

- Added `minSubGroup` field to PodGroup and SubGroup specifications to specify the minimum number of required child SubGroups for scheduling
- Added validation to prevent simultaneous use of `minSubGroup` with `minMember` fields, ensuring mutually exclusive configuration
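For readers unfamiliar with why a new pointer field needs a DeepCopy update, the sketch below shows the usual pattern for copying an optional `*int32` field. The `PodGroupSpec` type here is a hypothetical, stripped-down stand-in carrying only the new field; the real `zz_generated.deepcopy.go` is generated and covers every field of the type.

```go
package main

// PodGroupSpec is a hypothetical, stripped-down stand-in that carries only
// the new optional field.
type PodGroupSpec struct {
	MinSubGroup *int32
}

// DeepCopyInto copies the receiver into out. The pointer field gets a fresh
// allocation so the copy never aliases the source value.
func (in *PodGroupSpec) DeepCopyInto(out *PodGroupSpec) {
	*out = *in // shallow copy first; the pointer still aliases in.MinSubGroup
	if in.MinSubGroup != nil {
		in, out := &in.MinSubGroup, &out.MinSubGroup
		*out = new(int32)
		**out = **in
	}
}
```

A nil `MinSubGroup` stays nil in the copy, which is how the field remains absent for objects that never set it.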