What you would like to be added?
A cross-dimensional resource balance scoring plugin that detects and prevents resource fragmentation at scheduling time, rather than fixing it after the fact.
Currently KAI supports bin-packing (fill nodes fully) and spread (distribute evenly), but neither considers the shape of resource usage across dimensions. A node can be 95% on GPU-Memory but 15% on GPU-Compute — bin-packing sees it as "mostly full" and spread sees it as "partially used." Both are wrong. That node has stranded compute.
The proposed plugin scores nodes by measuring how much more balanced they become after placing a pod, using cosine alignment between the pod's request vector and the node's free capacity vector, combined with Shannon entropy reduction.
Why is this needed?
In GPU clusters running mixed inference workloads, resource fragmentation happens across dimensions, not just within them:
- Large model serving (LLaMA 70B, DeepSeek): VRAM full, GPU compute ~20% → compute is stranded
- Many small models (batch inference): GPU compute saturated, VRAM ~30% used → memory is stranded
- Mixed CPU+GPU nodes: CPU 90%, GPU 40% → GPU capacity wasted
This is the "jagged cluster" problem described in #1311, but caught at scheduling time instead of after the fact. Prevention > cure.
With the vectorized resource representation landing (#1353), adding a vector-based scoring plugin becomes natural — the infrastructure is already there.
Proposed approach
Score each node using a 6D resource vector [CPU, Memory, GPU-Compute, GPU-Memory, IOPS, Network]:
alignment = cosine_similarity(pod_request, node_free_capacity)
exhaustion = φ × entropy_reduction(node_before, node_after) + utilization_delta
leak_penalty = 0.15 × count(stranded_dimensions)
score = φ × alignment + exhaustion - leak_penalty + headroom × 0.3
Where φ = 1.618 (golden ratio) provides parameter-free weighting — no per-cluster tuning needed.
Benchmark (synthetic, 50 nodes × 500 pods): 13% improvement in resource balance vs LeastAllocated, zero stranded nodes in 4/5 scenarios. Scoring latency: ~115ns/op, zero allocations.
This is already proposed as a Koordinator scheduler plugin (koordinator-sh/koordinator#2839). Open-source auditor that detects existing imbalance: github.com/0x-auth/lambda-g-auditor
Happy to contribute the implementation if there's interest. The scoring function is stateless and fits naturally into the existing GpuOrderFn plugin pattern.
What you would like to be added?
A cross-dimensional resource balance scoring plugin that detects and prevents resource fragmentation at scheduling time, rather than fixing it after the fact.
Currently KAI supports bin-packing (fill nodes fully) and spread (distribute evenly), but neither considers the shape of resource usage across dimensions. A node can be 95% on GPU-Memory but 15% on GPU-Compute — bin-packing sees it as "mostly full" and spread sees it as "partially used." Both are wrong. That node has stranded compute.
The proposed plugin scores nodes by measuring how much more balanced they become after placing a pod, using cosine alignment between the pod's request vector and the node's free capacity vector, combined with Shannon entropy reduction.
Why is this needed?
In GPU clusters running mixed inference workloads, resource fragmentation happens across dimensions, not just within them:
This is the "jagged cluster" problem described in #1311, but caught at scheduling time instead of after the fact. Prevention > cure.
With the vectorized resource representation landing (#1353), adding a vector-based scoring plugin becomes natural — the infrastructure is already there.
Proposed approach
Score each node using a 6D resource vector
[CPU, Memory, GPU-Compute, GPU-Memory, IOPS, Network]:Where φ = 1.618 (golden ratio) provides parameter-free weighting — no per-cluster tuning needed.
Benchmark (synthetic, 50 nodes × 500 pods): 13% improvement in resource balance vs LeastAllocated, zero stranded nodes in 4/5 scenarios. Scoring latency: ~115ns/op, zero allocations.
This is already proposed as a Koordinator scheduler plugin (koordinator-sh/koordinator#2839). Open-source auditor that detects existing imbalance: github.com/0x-auth/lambda-g-auditor
Happy to contribute the implementation if there's interest. The scoring function is stateless and fits naturally into the existing GpuOrderFn plugin pattern.