Skip to content

docs(scheduler): cross-dimensional balance scoring design #1374

Open
0x-auth wants to merge 1 commit into kai-scheduler:main from 0x-auth:design/cross-dimensional-balance-scoring


0x-auth commented Apr 3, 2026

Description

Design proposal for a cross-dimensional balance scoring plugin (Lambda-G V3) that prevents resource fragmentation by combining post-placement variance minimization (60%) with cosine alignment (20%) as a directional tiebreaker.

Scope: CPU, Memory, and GPU-Memory (GPU-Compute not currently supported in KAI, as noted by @enoodle in #1373). The scoring function generalizes to any number of dimensions without modification.

Key result: V3 wins all 5 heterogeneous scenarios against BalancedAllocation, DominantResource, LeastAllocated, and MostAllocated, with 23% fewer stranded nodes and 53% less wasted capacity.

| Scenario | BalancedAlloc | DominantRes | Lambda-G V3 |
|---|---|---|---|
| Mixed GPU — AI Workload | 78.7 | 79.4 | 81.8 |
| GPU — Inference Heavy | 81.3 | 76.6 | 81.9 |
| GPU — Training Heavy | 79.4 | 78.9 | 82.8 |
| CPU + Few GPUs | 72.3 | 68.8 | 74.7 |
| Scale (60n×300p) | 74.1 | 73.8 | 76.7 |
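The margin of V3 over the stronger of the two baselines in each scenario gives a quick check on the "wins all 5" claim. The table only lists BalancedAlloc and DominantRes, so this arithmetic covers those two baselines and not LeastAllocated or MostAllocated. The scenario keys are shortened ids introduced for this sketch.

```python
# Scores transcribed from the results table; keys are shortened scenario ids.
SCORES = {
    "mixed_gpu_ai_workload": {"BalancedAlloc": 78.7, "DominantRes": 79.4, "Lambda-G V3": 81.8},
    "gpu_inference_heavy": {"BalancedAlloc": 81.3, "DominantRes": 76.6, "Lambda-G V3": 81.9},
    "gpu_training_heavy": {"BalancedAlloc": 79.4, "DominantRes": 78.9, "Lambda-G V3": 82.8},
    "cpu_few_gpus": {"BalancedAlloc": 72.3, "DominantRes": 68.8, "Lambda-G V3": 74.7},
    "scale_60n_300p": {"BalancedAlloc": 74.1, "DominantRes": 73.8, "Lambda-G V3": 76.7},
}


def margin_over_best_baseline(row: dict[str, float]) -> float:
    """V3 score minus the best baseline score in the same scenario, rounded to 0.1."""
    best_baseline = max(score for name, score in row.items() if name != "Lambda-G V3")
    return round(row["Lambda-G V3"] - best_baseline, 1)


margins = {scenario: margin_over_best_baseline(row) for scenario, row in SCORES.items()}
for scenario, margin in margins.items():
    print(f"{scenario}: {margin:+.1f}")
```

By this arithmetic, the narrowest margin is in the inference-heavy scenario (+0.6 over BalancedAlloc). The widest is in the training-heavy scenario (+3.4 over BalancedAlloc).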

Design doc as requested in #1373 (comment)

cc @enoodle

Related Issues

Refs #1373
Refs #1311 (cluster defragmentation)
Refs #1353 (vectorized resource representation)

Checklist

  • Self-reviewed
  • Added/updated tests (implementation PR)
  • Updated documentation (design doc added)

Breaking Changes

None. This is a proposal-only PR — no code changes.

Additional Notes


Collaborator

enoodle left a comment


I think that we will need to do a long-overdue overhaul of the node scoring system in KAI-Scheduler for this to work, but we can design this separately in the meantime.


## Summary

Add a scoring plugin that prevents resource fragmentation by combining post-placement variance minimization with cosine alignment between the pod's resource request vector and the node's free capacity vector. The plugin measures how much *more balanced* a node becomes after placing a pod, then uses directional alignment as a tiebreaker to steer pods toward nodes where their resource shape fills the gap.
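Below is a minimal sketch of the alignment tiebreaker: cosine similarity between the pod's request vector and the node's free-capacity vector. It assumes both vectors are first normalized by node capacity, since the proposal does not say how cores, bytes, and GPU memory are put on a common scale. It also assumes dimensions a node lacks (capacity 0) are skipped. `alignment_score` and `_normalize` are hypothetical names, not KAI-Scheduler APIs.

```python
import math
from typing import Sequence


def _normalize(vec: Sequence[float], capacity: Sequence[float]) -> list[float]:
    # Assumption: express each dimension as a fraction of node capacity so that
    # cores, bytes and GPU memory are comparable; zero-capacity dims are dropped.
    return [v / c for v, c in zip(vec, capacity) if c > 0]


def alignment_score(pod_req: Sequence[float], node_free: Sequence[float],
                    capacity: Sequence[float]) -> float:
    """Cosine similarity, scaled to 0..100, between pod request and node free capacity."""
    req = _normalize(pod_req, capacity)
    free = _normalize(node_free, capacity)
    dot = sum(r * f for r, f in zip(req, free))
    norm = math.hypot(*req) * math.hypot(*free)
    if norm == 0.0:
        return 0.0  # empty request or fully used node: no direction to align with
    return 100.0 * dot / norm


capacity = [64, 256, 80]     # CPU cores, memory GiB, GPU memory GiB (illustrative)
pod = [16, 16, 0]            # CPU-heavy request
node_a_free = [48, 32, 10]   # mostly CPU left
node_b_free = [8, 192, 60]   # mostly memory and GPU memory left
print(alignment_score(pod, node_a_free, capacity))
print(alignment_score(pod, node_b_free, capacity))
```

In this example, the CPU-heavy pod aligns far better with node A, whose remaining capacity is mostly CPU, than with node B. That steering is what the proposal relies on when balance scores are close.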
Collaborator

Why not add another option to the current nodeplacement plugin instead of creating a new plugin?

Author

Good point — integrating as an additional scoring mode within the existing nodeplacement plugin makes more architectural sense than a standalone plugin. The scoring function is stateless and fits the same interface. I'll restructure the proposal as a new scoring option within nodeplacement rather than a separate plugin.


### Why Now

The vectorized resource representation landing in [#1353](https://github.com/kai-scheduler/KAI-Scheduler/issues/1353) makes a vector-based scoring plugin natural — the infrastructure for multi-dimensional resource vectors is already there.
Collaborator

We could have implemented it before.
I don't think the design should be time-based; it is needed as is.

Author

Agreed — removed the "Why Now" framing. The cross-dimensional balance problem exists independent of any particular PR timeline. The scoring function is useful as-is.

| Batch small-model inference | ~30% | ~85% | ~70% | ~20% | GPU-Memory, RAM |
| CPU preprocessing pipeline | ~0% | ~0% | ~90% | ~25% | RAM, all GPU |

This is the "jagged cluster" problem described in [#1311](https://github.com/kai-scheduler/KAI-Scheduler/issues/1311), but caught at scheduling time rather than repaired after the fact.
Collaborator

I am not sure that it is the same as what is described there, or that the problem there can be solved by node scoring at all.
That issue is only relevant to whole-GPU allocation: the problem there is that "holes" are created after small jobs stop running.

Author

You're right — #1311 is about defragmenting after holes are created (descheduler-side), while this proposal prevents fragmentation at scheduling time. They're complementary but different problems. I'll decouple the reference to avoid confusion.


### Scope

This proposal covers CPU, Memory, and GPU-Memory as scoring dimensions. GPU-Compute requests are not currently supported in KAI-Scheduler (as noted by @enoodle in [#1373](https://github.com/kai-scheduler/KAI-Scheduler/issues/1373)), but the scoring function generalizes to any number of dimensions without modification.
Collaborator

I am not sure we need to note it. Why not just support all the dimensions of the resource request vector?

Author

Makes sense. The scoring function already generalizes to any number of dimensions — it operates on whatever resource vector the node exposes. I limited the scope statement to be conservative, but there's no technical reason not to support the full vector. I'll update the doc to remove the dimension restriction.

after_frac[i] = (node.used[i] + pod.req[i]) / node.capacity[i] for each active dimension
mean = average(after_frac)
variance = average((after_frac[i] - mean)² for each i)
variance_score = max(0, (1.0 - variance × 4) × 100)
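The pseudocode above translates almost line for line into a runnable function. This sketch takes the node's `used` and `capacity` vectors and the pod's `req` vector as plain lists in the same dimension order; the parameter names mirror the pseudocode. Treating zero-capacity dimensions as "inactive" is an assumption, since the pseudocode does not define the term.

```python
from typing import Sequence


def variance_score(used: Sequence[float], req: Sequence[float],
                   capacity: Sequence[float]) -> float:
    """Post-placement variance score, following the proposal's pseudocode.

    Dimensions with zero capacity are treated as inactive (an assumption).
    """
    after_frac = [(u + r) / c for u, r, c in zip(used, req, capacity) if c > 0]
    if not after_frac:
        raise ValueError("node has no active resource dimensions")
    mean = sum(after_frac) / len(after_frac)
    # Population variance of the post-placement utilization fractions.
    variance = sum((f - mean) ** 2 for f in after_frac) / len(after_frac)
    return max(0.0, (1.0 - variance * 4) * 100)


capacity = [64, 256, 80]  # CPU cores, memory GiB, GPU memory GiB (illustrative)
used = [32, 64, 40]       # before placement: 50%, 25%, 50%
print(variance_score(used, [0, 64, 0], capacity))  # 100.0: memory-only pod evens out the node
print(variance_score(used, [32, 0, 0], capacity))  # ≈ 61.1: CPU-only pod skews it further
```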
Collaborator

Why multiply the variance by 4? I see that the k8s implementation uses the standard deviation instead (https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/plugins/noderesources/balanced_allocation.go#L248)

Author

The × 4 scaling was chosen empirically to normalize variance into a [0, 100] score range. However, switching to standard deviation (as K8s does in BalancedAllocation) would be more consistent with the ecosystem and easier to reason about.
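The factor of 4 can also be read as a bound rather than a tuned constant, provided every post-placement fraction lies in [0, 1], that is, the node is not over-committed. Here $u_i$, $r_i$, and $c_i$ stand for `node.used[i]`, `pod.req[i]`, and `node.capacity[i]` from the pseudocode, and $n$ is the number of active dimensions:

$$
f_i = \frac{u_i + r_i}{c_i} \in [0, 1], \qquad \bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i
$$

$$
\operatorname{Var}(f) = \frac{1}{n}\sum_{i=1}^{n} f_i^2 - \bar{f}^2 \le \bar{f} - \bar{f}^2 = \bar{f}\,(1 - \bar{f}) \le \frac{1}{4} \qquad (\text{since } f_i^2 \le f_i \text{ on } [0,1])
$$

$$
0 \le 100\,\bigl(1 - 4\operatorname{Var}(f)\bigr) \le 100, \qquad \sigma = \sqrt{\operatorname{Var}(f)} \le \tfrac{1}{2} \;\Rightarrow\; 50 \le 100\,(1 - \sigma) \le 100
$$

Multiplying by 4 therefore maps the worst case to exactly 0; for an even number of dimensions, that worst case is half the dimensions full and half empty. The `max(0, ...)` clamp only matters if a fraction exceeds 1. Under the same assumption, a score of the form 100·(1 − σ), as in the linked upstream scorer, only spans the upper half of the range.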

I'll align with the upstream stddev approach for the stability base, ensuring consistency with K8s BalancedAllocation, while retaining the Cosine Alignment component as the high-dimensional tiebreaker to address cross-dimensional fragmentation that stddev alone misses.

Reference: kubernetes/balanced_allocation.go#L248
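Below is a minimal sketch of the stddev-based stability base described above. It follows the shape of the linked upstream scorer as I read it:

- post-placement fractions are capped at 1;
- zero-capacity dimensions are skipped;
- fewer than two dimensions yield σ = 0;
- score = (1 − σ)·100.

Upstream truncates the score to an integer; this sketch keeps a float. The function name is hypothetical, and the cosine tiebreaker is not included. For two dimensions, upstream special-cases σ as |f1 − f2| / 2, which equals the population standard deviation. One formula therefore covers every dimension count.

```python
import statistics
from typing import Sequence


def stddev_balance_score(used: Sequence[float], req: Sequence[float],
                         capacity: Sequence[float]) -> float:
    """Post-placement balance score of the form (1 - stddev) * 100 (hypothetical name)."""
    # Utilization after placing the pod, capped at 1; nodes lacking a resource skip it.
    fractions = [min((u + r) / c, 1.0) for u, r, c in zip(used, req, capacity) if c > 0]
    # With fewer than two dimensions there is nothing to balance.
    std = statistics.pstdev(fractions) if len(fractions) > 1 else 0.0
    return (1.0 - std) * 100


capacity = [64, 256, 80]  # CPU cores, memory GiB, GPU memory GiB (illustrative)
used = [32, 64, 40]
print(stddev_balance_score(used, [0, 64, 0], capacity))  # 100.0: all dimensions at 50%
print(stddev_balance_score(used, [32, 0, 0], capacity))  # ≈ 68.8
```

The same CPU-only placement scores about 61.1 under the variance × 4 formula and about 68.8 here. This reflects the narrower [50, 100] range of the stddev form rather than any difference in ranking.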


0x-auth commented Apr 3, 2026

Understood — happy to align this proposal with whatever scoring architecture emerges from the overhaul. The core algorithm (variance base + alignment tiebreaker + penalties) is independent of the plugin framework, so it can adapt to whatever scoring interface KAI moves to. In the meantime I'll update the design doc based on your inline feedback.

0x-auth force-pushed the design/cross-dimensional-balance-scoring branch from 203a72d to e541145 on April 3, 2026 23:08
Add design proposal for a scoring option within the existing nodeplacement
plugin. Combines post-placement stddev minimization with cosine alignment.

Address review feedback from @enoodle:
- Option within existing nodeplacement plugin (not new plugin)
- Uses stddev aligned with upstream K8s BalancedAllocation
- Operates on all resource dimensions
- Decoupled from kai-scheduler#1311

Ref: kai-scheduler#1373

Signed-off-by: 0x-auth <aa20moon@gmail.com>
0x-auth force-pushed the design/cross-dimensional-balance-scoring branch from e541145 to 282377f on April 3, 2026 23:14