fix(scheduler): account for fractional GPU usage in spread node placement#1099
Conversation
The spread strategy used `NonAllocatedResource` for GPU scoring, which only counted whole idle GPUs and ignored fractional capacity on shared GPUs. This caused nodes with different actual availability to score identically, leading to nondeterministic placement instead of even spreading. Use `GetSumOfIdleGPUs` + `GetSumOfReleasingGPUs` to correctly reflect partial GPU consumption in spread scores.
📝 Walkthrough

This PR fixes the spread node placement strategy to correctly handle fractional GPU usage on shared GPUs. The core fix modifies the scoring logic to compute non-allocated resources differently for GPU versus non-GPU resources, with early returns to prevent division by zero. Changes are accompanied by unit tests and end-to-end tests for validation.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
@coderabbitai review

✅ Actions performed: review triggered.
Actionable comments posted: 3
🧹 Nitpick comments (1)
test/e2e/modules/configurations/feature_flags/plugins.go (1)
7-17: Reorder imports into std / external / internal groups.

The import block currently mixes internal packages with external dependencies (`k8s.io/utils/ptr`) in the same group.

Proposed import grouping:

```diff
 import (
 	"context"
+
+	"k8s.io/utils/ptr"
+
 	kaiv1 "github.com/NVIDIA/KAI-scheduler/pkg/apis/kai/v1"
 	"github.com/NVIDIA/KAI-scheduler/pkg/common/constants"
 	"github.com/NVIDIA/KAI-scheduler/test/e2e/modules/configurations"
 	"github.com/NVIDIA/KAI-scheduler/test/e2e/modules/constant"
 	testContext "github.com/NVIDIA/KAI-scheduler/test/e2e/modules/context"
 	"github.com/NVIDIA/KAI-scheduler/test/e2e/modules/wait"
-	"k8s.io/utils/ptr"
 )
```

As per coding guidelines, `**/*.go`: Organize imports in three groups separated by blank lines: (1) Standard library, (2) External dependencies, (3) Internal packages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/modules/configurations/feature_flags/plugins.go` around lines 7 - 17, Reorder the import block so imports are split into three groups (std lib, external, internal) separated by blank lines: move "context" into the standard library group; keep "k8s.io/utils/ptr" and any other non‑repo packages (if added) in the external group; and place internal packages like "github.com/NVIDIA/KAI-scheduler/pkg/..." and "github.com/NVIDIA/KAI-scheduler/test/..." into the internal group. Update the import block containing kaiv1, constants, configurations, constant, testContext, wait and k8s.io/utils/ptr accordingly so imports follow the std/external/internal ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/scheduler/actions/allocate/allocateFractionalGpu_test.go`:
- Around line 1666-1687: The current check only verifies per-node GPU-group
count (reservationsByNode) and misses asserting spread across nodes; update the
test around the reservationsByNode computation to also assert the total number
of distinct nodes used equals the expected count (e.g., 2) and that each touched
node has the expected number of pods (e.g., 2), using
ssn.ClusterInfo.PodGroupInfos and task.GPUGroups to compute per-node pod counts
and failing with t.Errorf including testNumber and
testMetadata.TestTopologyBasic.Name when the node count or per-node pod counts
differ from expectations.
In `@test/e2e/suites/allocate/node_order/fill_node_test.go`:
- Around line 58-67: The AfterAll block should restore whatever plugin and
placement settings were present before the suite ran instead of forcing
gpupack=false, gpuspread=false and DefaultPluginName; capture the initial values
at setup (e.g., in BeforeAll) by calling feature_flags.GetPluginEnabled or
equivalent for "gpupack" and "gpuspread" and feature_flags.GetPlacementStrategy
(store into variables like originalGpupackEnabled, originalGpuspreadEnabled,
originalPlacementStrategy) and then in AfterAll call
feature_flags.SetPluginEnabled(ctx, testCtx, "gpupack", originalGpupackEnabled),
feature_flags.SetPluginEnabled(ctx, testCtx, "gpuspread",
originalGpuspreadEnabled) and feature_flags.SetPlacementStrategy(ctx, testCtx,
originalPlacementStrategy), preserving the current error handling (Fail on
error) and referencing the existing symbols AfterAll,
feature_flags.SetPluginEnabled, feature_flags.SetPlacementStrategy,
DefaultPluginName, and testCtx.
- Around line 93-109: The test currently counts all pods in
constant.KaiReservationNamespace which can include unrelated reservations;
restrict to only pods created by this test run by adding a selector/filter: when
listing pods via testCtx.KubeClientset.CoreV1().Pods(...).List use a
metav1.ListOptions{LabelSelector: "<key>=<test-id>"} (or if a label isn’t
available, filter reservationPods.Items in-code by a unique test
label/annotation or by creationTimestamp >= the test start time stored on
testCtx). Update the reservationsByNode counting logic to only consider pods
that match that label/annotation or time, leaving the rest of the assertions
(reservationsByNode and loop over gpuNodes) unchanged.
---
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- CHANGELOG.md
- pkg/scheduler/actions/allocate/allocateFractionalGpu_test.go
- pkg/scheduler/plugins/nodeplacement/nodespread_test.go
- pkg/scheduler/plugins/nodeplacement/spread.go
- test/e2e/modules/configurations/feature_flags/plugins.go
- test/e2e/suites/allocate/node_order/fill_node_test.go
- test/e2e/suites/allocate/node_order/node_order_suite_test.go
AfterAll now deletes plugin keys entirely instead of setting enabled=false, letting the operator reconcile the correct defaults for the active strategy.
Merging this branch will increase overall coverage.
```go
idleGPUs, _ := node.GetSumOfIdleGPUs()
releasingGPUs, _ := node.GetSumOfReleasingGPUs()
nonAllocated = idleGPUs + releasingGPUs
```
Consider exporting that on the node
```go
func nodeResourceSpread(resourceName v1.ResourceName) api.NodeOrderFn {
	return func(task *pod_info.PodInfo, node *node_info.NodeInfo) (float64, error) {
		var resourceCount float64
		var nonAllocated float64
```
Should the same logic be applied to pack.go?
Description
The `spread` node placement strategy scored GPU availability using `NonAllocatedResource`, which only counted whole idle and releasing GPUs. It ignored fractional capacity remaining on shared GPUs, causing nodes with different actual availability to score identically. This led to nondeterministic pod placement instead of even spreading for fractional GPU workloads.

This PR fixes the spread scoring to use `GetSumOfIdleGPUs() + GetSumOfReleasingGPUs()`, which correctly accounts for partial GPU consumption. A node with 1.5 GPUs available now scores `1.5/2 = 0.75`, differentiating it from a fully idle node at `1.0`.

The `gpupack` plugin (GPU-level ordering) continues to handle the "pack onto the same GPU" concern independently — it operates at the `GpuOrderFn` level, which runs after node selection.

Example: Why the bug causes incorrect spreading
Setup: 2 nodes, each with 2 GPUs. 4 pods requesting 0.5 GPU each. Strategy: `spread` + `gpupack`.

Desired outcome: 2 pods per node, each pair packed onto a single GPU → 1 reservation pod per node.

- Pod 1 — both nodes empty, both score `2/2 = 1.0`. Tie → lands on node-A (GPU-0).
- Pod 2 — node-A has 1 idle GPU (`NonAllocatedResource = 1`), node-B has 2. Scores: `0.5` vs `1.0`. node-B wins. ✅
- Pod 3 — both nodes have 1 idle GPU each (`NonAllocatedResource = 1` for both — the 0.5 used on a shared GPU is invisible). Scores: `0.5` vs `0.5`. Tie → lands on either node (say node-A). `gpupack` packs it onto GPU-0. ✅
- Pod 4 — node-A now has 2 pods on GPU-0 (1.0 used), node-B has 1 pod on GPU-0 (0.5 used):

| Node | NonAllocatedResource | Score |
| --- | --- | --- |
| node-A | 1 | `1/2 = 0.5` |
| node-B | 1 | `1/2 = 0.5` |

Both nodes score identically. The scheduler doesn't see that node-B has 1.5 GPUs available while node-A only has 1.0. Pod 4 lands on either node nondeterministically. If it goes to node-A → node-A gets 3 pods and opens a second GPU (2 reservation pods instead of 1). ❌
With the fix, node-A scores `1.0/2 = 0.5` and node-B scores `1.5/2 = 0.75`, correctly directing pod 4 to node-B. ✅

Adding `gpusharingorder` doesn't help — it makes things worse

One might try enabling `gpusharingorder` to break the tie by preferring nodes with existing shared GPU groups. But this backfires on pod 2:

| Node | Spread score | `gpusharingorder` score |
| --- | --- | --- |
| node-A | `1/2 = 0.5` | `+1000` (GPU-0 has a shared group, pod fits) |
| node-B | `2/2 = 1.0` | `0` (no shared GPUs) |

Pod 2 goes to node-A instead of node-B — the +1000 score overwhelms the spread signal entirely, turning "spread across nodes" into "pack onto nodes that already have shared GPUs."
Changes
- `spread.go`: For GPU resources, compute `nonAllocated` as the sum of idle + releasing GPU fractions instead of using `NonAllocatedResource`.
- `nodespread_test.go`: Unit test verifying that a node with shared GPU consumption scores lower than a fully idle node.
- `allocateFractionalGpu_test.go`: Integration test reproducing the E2E scenario — 4 pods (0.5 GPU each) across 2 nodes (2 GPUs each) with spread + gpupack.
- `fill_node_test.go` (new): E2E test that submits `2×numNodes` fractional GPU pods under spread strategy and asserts exactly 1 reservation pod per GPU node.
- `plugins.go` (new): E2E helper to enable/disable scheduler plugins via SchedulingShard patches.
- `node_order_suite_test.go`: Register the new E2E test suite.

Related Issues
Fixes #
Checklist
Breaking Changes
None.
Additional Notes
The `gpusharingorder` plugin is intentionally not enabled in the E2E test. It adds a +1000 node-level score to nodes with existing shared GPU groups, which actively fights the spread strategy. GPU packing within a node is handled entirely by `gpupack` at the GPU-order level.
Release Notes
Bug Fixes
Tests