
proposal(scheduler): cross-dimensional resource balance scoring to prevent GPU fragmentation #1373

@0x-auth

Description

What would you like to be added?

A cross-dimensional resource balance scoring plugin that detects and prevents resource fragmentation at scheduling time, rather than fixing it after the fact.

Currently, KAI supports bin-packing (fill nodes fully) and spread (distribute evenly), but neither considers the shape of resource usage across dimensions. A node can sit at 95% GPU-Memory utilization but only 15% GPU-Compute — bin-packing sees it as "mostly full" and spread sees it as "partially used." Both are wrong: that node has stranded compute.

The proposed plugin scores nodes by measuring how much more balanced they become after placing a pod, using cosine alignment between the pod's request vector and the node's free capacity vector, combined with Shannon entropy reduction.
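As a rough illustration (the vectors and numbers here are made up, not from any KAI code), the alignment term rewards pods whose request shape matches the node's free-capacity shape — which is exactly how it catches the jagged node described above:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-negative resource vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Jagged node from the example above: GPU-Memory 95% used, GPU-Compute 15% used,
# so its free capacity (as fractions) is [compute, memory] = [0.85, 0.05].
node_free = [0.85, 0.05]

compute_heavy_pod = [0.50, 0.05]   # wants mostly GPU compute -> good shape match
memory_heavy_pod  = [0.05, 0.50]   # wants mostly GPU memory  -> poor shape match

print(cosine_similarity(compute_heavy_pod, node_free))  # close to 1.0
print(cosine_similarity(memory_heavy_pod, node_free))   # well below 0.5
```

A compute-heavy pod "points the same way" as the node's free capacity and scores near 1, while a memory-heavy pod scores low — so the scheduler steers it toward a node whose free memory can actually absorb it.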

Why is this needed?

In GPU clusters running mixed inference workloads, resource fragmentation happens across dimensions, not just within them:

  • Large model serving (LLaMA 70B, DeepSeek): VRAM full, GPU compute ~20% → compute is stranded
  • Many small models (batch inference): GPU compute saturated, VRAM ~30% used → memory is stranded
  • Mixed CPU+GPU nodes: CPU 90%, GPU 40% → GPU capacity wasted

This is the "jagged cluster" problem described in #1311, but caught at scheduling time instead of after the fact. Prevention > cure.

With the vectorized resource representation landing (#1353), adding a vector-based scoring plugin becomes natural — the infrastructure is already there.

Proposed approach

Score each node using a 6D resource vector [CPU, Memory, GPU-Compute, GPU-Memory, IOPS, Network]:

alignment     = cosine_similarity(pod_request, node_free_capacity)
exhaustion    = φ × entropy_reduction(node_before, node_after) + utilization_delta
leak_penalty  = 0.15 × count(stranded_dimensions)

score = φ × alignment + exhaustion - leak_penalty + headroom × 0.3

Where φ ≈ 1.618 (the golden ratio) provides a fixed, parameter-free weighting — no per-cluster tuning needed.
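A minimal sketch of the scoring function, assuming all vectors are 6D fractions of node capacity ([CPU, Memory, GPU-Compute, GPU-Memory, IOPS, Network]). The φ, 0.15, and 0.3 constants come from the formula above; the concrete definitions of entropy reduction, utilization delta, stranded dimensions, and headroom are illustrative assumptions, not the actual plugin:

```python
import math

PHI = 1.618  # golden ratio, as in the proposal

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def shannon_entropy(util):
    """Entropy of the utilization distribution across dimensions.
    Higher entropy = more even (balanced) usage."""
    total = sum(util)
    if total == 0:
        return 0.0
    probs = [u / total for u in util if u > 0]
    return -sum(p * math.log(p) for p in probs)

def score_node(pod_request, node_used, stranded_threshold=0.8):
    """Score one candidate node for one pod (higher is better)."""
    free = [max(0.0, 1.0 - u) for u in node_used]
    after = [u + r for u, r in zip(node_used, pod_request)]
    if any(a > 1.0 for a in after):
        return float("-inf")  # pod does not fit on this node

    alignment = cosine_similarity(pod_request, free)

    # Illustrative reading of "entropy_reduction": positive when the
    # placement makes cross-dimensional usage more even.
    entropy_gain = shannon_entropy(after) - shannon_entropy(node_used)
    utilization_delta = sum(pod_request) / len(pod_request)
    exhaustion = PHI * entropy_gain + utilization_delta

    # Assumed stranded-dimension rule: some dimension is nearly full
    # while this one is left idle.
    stranded = sum(1 for a in after
                   if a < 0.2 and max(after) > stranded_threshold)
    leak_penalty = 0.15 * stranded

    headroom = min(free)  # illustrative: worst-case remaining slack
    return PHI * alignment + exhaustion - leak_penalty + headroom * 0.3
```

Under these assumptions, a compute-heavy pod scores higher on a node with stranded GPU compute (VRAM nearly full, compute idle) than on a node whose compute is already saturated — the prevention-at-scheduling-time behavior the proposal describes.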

Benchmark (synthetic, 50 nodes × 500 pods): 13% improvement in resource balance vs LeastAllocated, zero stranded nodes in 4/5 scenarios. Scoring latency: ~115ns/op, zero allocations.

This has also been proposed as a Koordinator scheduler plugin (koordinator-sh/koordinator#2839). A companion open-source auditor that detects existing imbalance is available at github.com/0x-auth/lambda-g-auditor.

Happy to contribute the implementation if there's interest. The scoring function is stateless and fits naturally into the existing GpuOrderFn plugin pattern.

Metadata

Assignees: no one assigned
Labels: enhancement (New feature or request)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests