docs(scheduler): cross-dimensional balance scoring design#1374
0x-auth wants to merge 1 commit into kai-scheduler:main
enoodle
left a comment
I think we will need a long-overdue overhaul of the node scoring system in KAI-Scheduler for this to work, but we can design this separately in the meantime.
| ## Summary |
| Add a scoring plugin that prevents resource fragmentation by combining post-placement variance minimization with cosine alignment between the pod's resource request vector and the node's free capacity vector. The plugin measures how much *more balanced* a node becomes after placing a pod, then uses directional alignment as a tiebreaker to steer pods toward nodes where their resource shape fills the gap. |
Why not add another option to the current nodeplacement plugin instead of creating a new plugin?
Good point — integrating as an additional scoring mode within the existing nodeplacement plugin makes more architectural sense than a standalone plugin. The scoring function is stateless and fits the same interface. I'll restructure the proposal as a new scoring option within nodeplacement rather than a separate plugin.
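For readers following the thread, the cosine alignment term from the summary can be sketched as below. This is a minimal Python illustration, not the proposal's implementation: the function name `cosine_alignment`, the choice to express both vectors as fractions of node capacity (so that differing units do not dominate), and the zero-vector fallback are all assumptions made here.

```python
import math


def cosine_alignment(pod_request, node_free):
    """Cosine of the angle between a pod's request vector and a node's
    free-capacity vector.

    Both arguments are equal-length sequences of non-negative numbers, one
    entry per resource dimension (assumed here to be fractions of node
    capacity). The result lies in [0, 1]; 1 means the pod's resource shape
    points in exactly the same direction as the node's remaining capacity.
    """
    if len(pod_request) != len(node_free):
        raise ValueError("vectors must have the same number of dimensions")

    dot = sum(p * f for p, f in zip(pod_request, node_free))
    norm_pod = math.sqrt(sum(p * p for p in pod_request))
    norm_free = math.sqrt(sum(f * f for f in node_free))

    # Assumption: an empty request or a completely full node has no
    # direction, so it contributes no alignment.
    if norm_pod == 0 or norm_free == 0:
        return 0.0
    return dot / (norm_pod * norm_free)
```

With a GPU-memory-heavy pod such as `[0.1, 0.1, 0.5]`, a node whose free capacity is `[0.2, 0.2, 0.9]` scores close to 1, while a node with `[0.9, 0.9, 0.2]` free scores much lower, which is the "fills the gap" behaviour the summary describes.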
| ### Why Now |
| The vectorized resource representation landing in [#1353](https://github.com/kai-scheduler/KAI-Scheduler/issues/1353) makes a vector-based scoring plugin natural — the infrastructure for multi-dimensional resource vectors is already there. |
We could have implemented it before.
I don't think the design should be time-based; it is simply needed as is.
Agreed — removed the "Why Now" framing. The cross-dimensional balance problem exists independent of any particular PR timeline. The scoring function is useful as-is.
| | Batch small-model inference | ~30% | ~85% | ~70% | ~20% | GPU-Memory, RAM | |
| | CPU preprocessing pipeline | ~0% | ~0% | ~90% | ~25% | RAM, all GPU | |
| This is the "jagged cluster" problem described in [#1311](https://github.com/kai-scheduler/KAI-Scheduler/issues/1311), but caught at scheduling time rather than repaired after the fact. |
I am not sure that it is the same as what is described there, or that the problem there can be solved by node scoring at all.
The issue there is relevant only with whole-GPU allocation: the problem is that "holes" are created after small jobs stop running.
You're right — #1311 is about defragmenting after holes are created (descheduler-side), while this proposal prevents fragmentation at scheduling time. They're complementary but different problems. I'll decouple the reference to avoid confusion.
| ### Scope |
| This proposal covers CPU, Memory, and GPU-Memory as scoring dimensions. GPU-Compute requests are not currently supported in KAI-Scheduler (as noted by @enoodle in [#1373](https://github.com/kai-scheduler/KAI-Scheduler/issues/1373)), but the scoring function generalizes to any number of dimensions without modification. |
I am not sure we have to note it. Why not just support all the dimensions of the resource request vector?
Makes sense. The scoring function already generalizes to any number of dimensions — it operates on whatever resource vector the node exposes. I limited the scope statement to be conservative, but there's no technical reason not to support the full vector. I'll update the doc to remove the dimension restriction.
| after_frac[i] = (node.used[i] + pod.req[i]) / node.capacity[i] for each active dimension |
| mean = average(after_frac) |
| variance = average((after_frac[i] - mean)² for each i) |
| variance_score = max(0, (1.0 - variance × 4) × 100) |
Why multiply the variance by 4? I see that the k8s implementation uses the standard deviation instead (https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/plugins/noderesources/balanced_allocation.go#L248)
The × 4 scaling was chosen empirically to normalize variance into a [0, 100] score range. However, switching to standard deviation (as K8s does in BalancedAllocation) would be more consistent with the ecosystem and easier to reason about.
I'll align with the upstream stddev approach for the stability base, ensuring consistency with K8s BalancedAllocation, while retaining the Cosine Alignment component as the high-dimensional tiebreaker to address cross-dimensional fragmentation that stddev alone misses.
Reference: kubernetes/balanced_allocation.go#L248
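To make the two options in this thread concrete, here is a small Python sketch that computes both the quoted variance-based score and a standard-deviation-based score of the form `(1 - std) * 100`. It is an illustration under assumptions, not the proposal's code: the helper names are hypothetical, "active dimension" is read as "non-zero node capacity", and the 0 to 100 range is taken from the quoted pseudocode. One observation that may help with the `× 4` question: for fractions within [0, 1] the population variance cannot exceed 0.25, so multiplying by 4 maps it onto [0, 1].

```python
import math


def after_fractions(used, request, capacity):
    """Post-placement utilization per dimension.

    Assumption: an "active" dimension is one with non-zero node capacity.
    """
    return [(u + r) / c for u, r, c in zip(used, request, capacity) if c > 0]


def _variance(fracs):
    """Population variance of the utilization fractions."""
    if not fracs:
        raise ValueError("at least one active dimension is required")
    mean = sum(fracs) / len(fracs)
    return sum((f - mean) ** 2 for f in fracs) / len(fracs)


def variance_score(fracs):
    """Score from the quoted pseudocode: max(0, (1 - variance * 4) * 100)."""
    return max(0.0, (1.0 - _variance(fracs) * 4) * 100)


def stddev_score(fracs):
    """Standard-deviation variant: (1 - std) * 100."""
    return (1.0 - math.sqrt(_variance(fracs))) * 100
```

For post-placement fractions `[0.75, 0.5]` the variance form gives 93.75 and the standard-deviation form gives 87.5, so the standard deviation penalizes the same imbalance more strongly at small deviations.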
Understood, happy to align this proposal with whatever scoring architecture emerges from the overhaul. The core algorithm (variance base + alignment tiebreaker + penalties) is independent of the plugin framework, so it can adapt to whatever scoring interface KAI moves to. In the meantime I'll update the design doc based on your inline feedback.
203a72d to e541145
Add design proposal for a scoring option within the existing nodeplacement plugin. Combines post-placement stddev minimization with cosine alignment.

Address review feedback from @enoodle:
- Option within existing nodeplacement plugin (not new plugin)
- Uses stddev aligned with upstream K8s BalancedAllocation
- Operates on all resource dimensions
- Decoupled from kai-scheduler#1311

Ref: kai-scheduler#1373
Signed-off-by: 0x-auth <aa20moon@gmail.com>
e541145 to 282377f
Description
Design proposal for a cross-dimensional balance scoring plugin (Lambda-G V3) that prevents resource fragmentation by combining post-placement variance minimization (60%) with cosine alignment (20%) as a directional tiebreaker.
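The weighting stated above can be sketched as follows. This is a hedged illustration only: the 60% and 20% weights come from the description, but the function name, the assumption that the balance term is on a 0 to 100 scale and the alignment term on a 0 to 1 scale, and the omission of the remaining 20% (not specified in this description) are choices made here for the example.

```python
VARIANCE_WEIGHT = 0.60   # weight of the balance term, from the PR description
ALIGNMENT_WEIGHT = 0.20  # weight of the cosine alignment term, from the PR description


def combined_score(balance_score, alignment):
    """Weighted sum of the two terms named in the description.

    balance_score: assumed to be in [0, 100]
    alignment:     assumed to be a cosine value in [0, 1]

    The remaining 20% of the weight is not specified in the description,
    so this partial score tops out at 80.
    """
    return VARIANCE_WEIGHT * balance_score + ALIGNMENT_WEIGHT * alignment * 100
```

When two candidate nodes have the same balance score, the alignment term decides between them: `combined_score(90, 0.99)` is higher than `combined_score(90, 0.42)`.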
Scope: CPU, Memory, and GPU-Memory (GPU-Compute not currently supported in KAI, as noted by @enoodle in #1373). The scoring function generalizes to any number of dimensions without modification.
Key result: V3 wins all 5 heterogeneous scenarios vs BalancedAllocation, DominantResource, LeastAllocated, and MostAllocated. 23% fewer stranded nodes, 53% less wasted capacity.
Design doc as requested in #1373 (comment)
cc @enoodle
Related Issues
Refs #1373
Refs #1311 (cluster defragmentation)
Refs #1353 (vectorized resource representation)
Breaking Changes
None. This is a proposal-only PR — no code changes.