fix: support Prometheus duration format for windowSize in SchedulingShard CR #1371

Merged
itsomri merged 2 commits into kai-scheduler:main from JheSue:fix/windowsize-duration-parsing
Apr 3, 2026

@JheSue
Contributor

@JheSue JheSue commented Apr 1, 2026

Description

The windowSize field in SchedulingShard CR uses *metav1.Duration which internally calls Go's time.ParseDuration. This does not support Prometheus duration units like w (week) or d (day), causing both the kai-operator and kai-scheduler to crash when windowSize: 1w is set as documented:

failed to list *v1.SchedulingShard: time: unknown unit "w" in duration "1w"

Replace *metav1.Duration with *model.Duration from github.com/prometheus/common/model, which supports the full Prometheus duration format (ms, s, m, h, d, w, y) via built-in UnmarshalJSON/UnmarshalYAML.

Related Issues

Checklist

Note: Ensure your PR title follows the Conventional Commits format (e.g., feat(scheduler): add new feature)

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Breaking Changes

Additional Notes

Summary by CodeRabbit

Bug Fixes

  • Fixed windowSize field in the SchedulingShard custom resource to accept Prometheus duration format strings (e.g., 1w, 7d, 168h) instead of only Kubernetes duration format, resolving operator crash-loop errors when using unsupported duration units.

…ield

Replace *metav1.Duration with *model.Duration (github.com/prometheus/common/model)
for the UsageParams.WindowSize field.

metav1.Duration uses Go's time.ParseDuration internally which does not support
the 'w' (week) or 'd' (day) units. This caused the kai-operator to crash-loop
when windowSize was set to '1w' in the SchedulingShard CR:

  failed to list *v1.SchedulingShard: time: unknown unit \"w\" in duration \"1w\"

model.Duration supports the full Prometheus duration format (ms, s, m, h, d, w, y),
allowing users to set windowSize using human-friendly values like '1w' or '7d'
as documented.

Signed-off-by: Homer <homer.su@fongcon.com.tw>
@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fc09c67b-5861-497d-bcd8-1246a8b45b0e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR migrates the WindowSize field from Kubernetes' metav1.Duration to Prometheus' model.Duration across the scheduler codebase to support Prometheus duration string formats (e.g., 1w, 7d, 168h) instead of only the Go duration strings accepted by time.ParseDuration, which has no d or w units.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core API Type Migration: pkg/scheduler/cache/usagedb/api/interface.go | Changed UsageParams.WindowSize from *metav1.Duration to *model.Duration and added Prometheus model import. |
| Default Value Setup: pkg/scheduler/cache/usagedb/api/defaults.go | Updated SetDefaults() to instantiate default WindowSize using model.Duration instead of metav1.Duration. |
| API Tests: pkg/scheduler/cache/usagedb/api/usage_params_test.go | Updated test fixtures and assertions to use model.Duration, added JSON unmarshalling tests for Prometheus duration formats (1w, 7d, 168h, etc.). |
| Implementation Updates: pkg/scheduler/cache/usagedb/prometheus/prometheus.go, pkg/scheduler/cache/usagedb/fake/fake_with_history.go | Modified window duration calculations to dereference *model.Duration as time.Duration instead of accessing .Duration field. |
| Test Fixtures: pkg/operator/operands/scheduler/resources_test.go, pkg/scheduler/cache/usagedb/prometheus/prometheus_test.go, test/e2e/suites/timeaware/config.go, pkg/env-tests/timeaware/timeaware.go | Updated test setup to construct WindowSize using ptr.To(model.Duration(...)) and adjusted assertions for new type. |
| Documentation: CHANGELOG.md | Added entry documenting fix for windowSize accepting Prometheus duration strings. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Prometheus durations hop into place,
No more metav1's time-telling grace,
Seven days now bloom as 7d so sweet,
Duration strings make scheduling complete!
🕐✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: replacing metav1.Duration with model.Duration to support Prometheus duration formats like '1w' and '7d' for the windowSize field. |
| Description check | ✅ Passed | The description provides a clear problem statement, solution explanation, and indicates tests were added. However, the Related Issues section is empty and documentation update checkbox is unchecked, though these are not critical. |


@enoodle
Collaborator

enoodle commented Apr 1, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (7)
test/e2e/suites/timeaware/config.go (1)

6-19: Import organization does not follow coding guidelines.

The github.com/prometheus/common/model import (line 18) is placed after internal packages. It should be grouped with other external dependencies.

♻️ Suggested import reordering
 import (
 	"context"
 	"time"

+	"github.com/prometheus/common/model"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/utils/ptr"

 	kaiv1 "github.com/kai-scheduler/KAI-scheduler/pkg/apis/kai/v1"
 	kaiprometheus "github.com/kai-scheduler/KAI-scheduler/pkg/apis/kai/v1/prometheus"
 	usagedbapi "github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api"
 	"github.com/kai-scheduler/KAI-scheduler/test/e2e/modules/configurations"
 	testcontext "github.com/kai-scheduler/KAI-scheduler/test/e2e/modules/context"
-	"github.com/prometheus/common/model"
 )

As per coding guidelines: "Organize imports in three groups separated by blank lines: standard library, external dependencies, and internal packages".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/suites/timeaware/config.go` around lines 6 - 19, The import ordering
is wrong: move the external package import "github.com/prometheus/common/model"
into the external dependencies group (with other non-standard imports) so
imports are grouped as standard library, external dependencies, then internal
packages; update the import block in test/e2e/suites/timeaware/config.go (the
import section containing metav1, ptr, kaiv1, kaiprometheus, usagedbapi,
configurations, testcontext) to place the prometheus model import alongside
other external imports, separating groups with blank lines to match the
project's import organization guidelines.
pkg/scheduler/cache/usagedb/fake/fake_with_history.go (1)

6-16: Import organization does not follow coding guidelines.

Internal packages (github.com/kai-scheduler/KAI-scheduler/...) should be separated from external dependencies (k8s.io/...) with a blank line.

♻️ Suggested import reordering
 import (
 	"math"
 	"sync"
 	"time"

-	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info"
-	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/queue_info"
-	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api"
 	v1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info"
+	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/queue_info"
+	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api"
 )

As per coding guidelines: "Organize imports in three groups separated by blank lines: standard library, external dependencies, and internal packages".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/cache/usagedb/fake/fake_with_history.go` around lines 6 - 16,
The import block in fake_with_history.go is not grouped per project style: move
standard library imports (math, sync, time) first, then a blank line, then
external dependencies (v1 "k8s.io/api/core/v1", metav1
"k8s.io/apimachinery/pkg/apis/meta/v1"), then a blank line, and finally internal
packages (github.com/kai-scheduler/... common_info, queue_info, api); adjust the
import order around the existing symbols (common_info, queue_info, api, v1,
metav1) to enforce the three-group separation.
pkg/scheduler/cache/usagedb/api/interface.go (1)

6-10: Import organization does not follow coding guidelines.

Imports should be organized in three groups: standard library, external dependencies, then internal packages. Currently, the internal package is listed first.

♻️ Suggested import reordering
 import (
+	"github.com/prometheus/common/model"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
 	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/queue_info"
-	"github.com/prometheus/common/model"
-	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )

As per coding guidelines: "Organize imports in three groups separated by blank lines: standard library, external dependencies, and internal packages".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/cache/usagedb/api/interface.go` around lines 6 - 10, The import
block in interface.go is misordered: move internal package
"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/queue_info" into the
internal group and place external dependencies
(github.com/prometheus/common/model and k8s.io/apimachinery/pkg/apis/meta/v1) in
the external group, separating groups with a blank line; update the import
section around the existing import declaration so it follows the three-group
convention (standard, external, internal) and run gofmt/goimports to ensure
proper formatting.
pkg/scheduler/cache/usagedb/prometheus/prometheus.go (1)

6-22: Import organization does not follow coding guidelines.

Internal packages are mixed with external dependencies. They should be in separate groups.

♻️ Suggested import reordering
 import (
 	"context"
 	"fmt"
 	"time"

 	"github.com/aptible/supercronic/cronexpr"
-	commonconstants "github.com/kai-scheduler/KAI-scheduler/pkg/common/constants"
-	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info"
-	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/queue_info"
-	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api"
-	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/log"
 	promapi "github.com/prometheus/client_golang/api"
 	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
 	"github.com/prometheus/common/model"
 	v1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+	commonconstants "github.com/kai-scheduler/KAI-scheduler/pkg/common/constants"
+	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/common_info"
+	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/api/queue_info"
+	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api"
+	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/log"
 )

As per coding guidelines: "Organize imports in three groups separated by blank lines: standard library, external dependencies, and internal packages".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/cache/usagedb/prometheus/prometheus.go` around lines 6 - 22,
The import block in prometheus.go mixes standard library, external, and internal
packages; reorder the imports in three groups separated by blank lines: first
standard library (context, fmt, time), then external third-party packages
(github.com/aptible/supercronic/cronexpr, promapi, promv1, model, v1, metav1),
and finally internal project packages (commonconstants, common_info, queue_info,
api, log) so the import grouping follows the repository coding guidelines;
update the import section in
pkg/scheduler/cache/usagedb/prometheus/prometheus.go accordingly.
pkg/scheduler/cache/usagedb/prometheus/prometheus_test.go (1)

6-17: Import organization does not follow coding guidelines.

The internal package (github.com/kai-scheduler/KAI-scheduler/...) should be in a separate group after external dependencies.

♻️ Suggested import reordering
 import (
 	"testing"
 	"time"

 	"github.com/aptible/supercronic/cronexpr"
-	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api"
 	"github.com/prometheus/common/model"
 	"github.com/stretchr/testify/assert"
 	"github.com/stretchr/testify/require"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/utils/ptr"
+
+	"github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api"
 )

As per coding guidelines: "Organize imports in three groups separated by blank lines: standard library, external dependencies, and internal packages".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/cache/usagedb/prometheus/prometheus_test.go` around lines 6 -
17, The import block in prometheus_test.go is not grouped into standard library,
external dependencies, and internal packages; reorder the imports so the first
group contains standard packages (testing, time), the second group contains
external modules (github.com/aptible/supercronic/cronexpr,
github.com/prometheus/common/model, github.com/stretchr/testify/assert,
github.com/stretchr/testify/require, metav1
"k8s.io/apimachinery/pkg/apis/meta/v1", "k8s.io/utils/ptr"), and the final group
contains the internal package
(github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api),
separating each group with a single blank line to comply with the project's
import organization guideline.
pkg/operator/operands/scheduler/resources_test.go (1)

1166-1166: Prefer a non-nil assertion before pointer dereference in validation callback.

Add require.NotNil(t, result.UsageParams.WindowSize) before dereferencing to avoid panic-style failures.

Suggested diff
 			validate: func(t *testing.T, result *usagedbapi.UsageDBConfig) {
 				assert.NotNil(t, result)
 				assert.NotNil(t, result.UsageParams)
 				assert.Equal(t, 10*time.Minute, result.UsageParams.HalfLifePeriod.Duration)
+				require.NotNil(t, result.UsageParams.WindowSize)
 				assert.Equal(t, 20*time.Minute, time.Duration(*result.UsageParams.WindowSize))
 				assert.Equal(t, "http://prometheus:9090", result.ConnectionString)
 			},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/operator/operands/scheduler/resources_test.go` at line 1166, The test
dereferences result.UsageParams.WindowSize directly which can panic if nil; add
a non-nil assertion before dereferencing by calling require.NotNil(t,
result.UsageParams.WindowSize) (or the project's equivalent) immediately before
the assert.Equal line that reads time.Duration(*result.UsageParams.WindowSize),
so the test fails cleanly if WindowSize is nil; update the validation callback
in resources_test.go where result and UsageParams.WindowSize are checked.
pkg/scheduler/cache/usagedb/api/usage_params_test.go (1)

381-382: Add explicit nil guard for UsageParams before dereference.

This test can fail with a panic instead of a clear assertion if usageParams is unexpectedly absent; add require.NotNil(t, config.UsageParams) before accessing WindowSize.

Suggested diff
 			require.NoError(t, err)
+			require.NotNil(t, config.UsageParams)
 			require.NotNil(t, config.UsageParams.WindowSize)
 			assert.Equal(t, tt.expected, *config.UsageParams.WindowSize)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/scheduler/cache/usagedb/api/usage_params_test.go` around lines 381 - 382,
Add an explicit nil guard for the UsageParams struct in the test before
dereferencing WindowSize: in the test in usage_params_test.go assert that
config.UsageParams is not nil (use require.NotNil(t, config.UsageParams))
immediately before the existing require.NotNil(t, config.UsageParams.WindowSize)
/ assert.Equal(t, tt.expected, *config.UsageParams.WindowSize) lines so the test
fails with a clear assertion rather than panicking when UsageParams is absent.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9d2d0a55-f582-49b6-9581-ea5926777be5

📥 Commits

Reviewing files that changed from the base of the PR and between 6d098c5 and 75a7229.

📒 Files selected for processing (10)
  • CHANGELOG.md
  • pkg/env-tests/timeaware/timeaware.go
  • pkg/operator/operands/scheduler/resources_test.go
  • pkg/scheduler/cache/usagedb/api/defaults.go
  • pkg/scheduler/cache/usagedb/api/interface.go
  • pkg/scheduler/cache/usagedb/api/usage_params_test.go
  • pkg/scheduler/cache/usagedb/fake/fake_with_history.go
  • pkg/scheduler/cache/usagedb/prometheus/prometheus.go
  • pkg/scheduler/cache/usagedb/prometheus/prometheus_test.go
  • test/e2e/suites/timeaware/config.go

@enoodle enoodle requested a review from itsomri April 1, 2026 12:04
@github-actions

github-actions bot commented Apr 1, 2026

📊 Performance Benchmark Results

Comparing PR (fix/windowsize-duration-parsing) vs main branch
goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
                                    │ main-bench.txt │           pr-bench.txt            │
                                    │     sec/op     │   sec/op     vs base              │
AllocateAction_SmallCluster-4            107.5m ± 0%   107.8m ± 0%       ~ (p=0.132 n=6)
AllocateAction_MediumCluster-4           134.8m ± 1%   134.7m ± 1%       ~ (p=0.937 n=6)
AllocateAction_LargeCluster-4            207.6m ± 6%   209.6m ± 7%       ~ (p=0.485 n=6)
ReclaimAction_SmallCluster-4             102.6m ± 0%   102.7m ± 0%       ~ (p=0.310 n=6)
ReclaimAction_MediumCluster-4            105.5m ± 0%   105.4m ± 0%       ~ (p=0.818 n=6)
PreemptAction_SmallCluster-4             103.4m ± 0%   103.4m ± 0%       ~ (p=0.589 n=6)
PreemptAction_MediumCluster-4            112.7m ± 0%   113.0m ± 0%  +0.24% (p=0.041 n=6)
ConsolidationAction_SmallCluster-4       113.7m ± 0%   113.7m ± 1%       ~ (p=1.000 n=6)
ConsolidationAction_MediumCluster-4      201.2m ± 4%   200.9m ± 1%       ~ (p=0.937 n=6)
FullSchedulingCycle_SmallCluster-4       105.0m ± 0%   105.0m ± 0%       ~ (p=0.240 n=6)
FullSchedulingCycle_MediumCluster-4      120.9m ± 1%   120.9m ± 1%       ~ (p=0.937 n=6)
FullSchedulingCycle_LargeCluster-4       162.8m ± 3%   162.3m ± 2%       ~ (p=0.937 n=6)
ManyQueues_MediumCluster-4               138.1m ± 2%   138.4m ± 1%       ~ (p=1.000 n=6)
GangScheduling_MediumCluster-4           155.2m ± 1%   158.3m ± 1%  +2.02% (p=0.009 n=6)
geomean                                  129.8m        130.1m       +0.23%

                                    │ main-bench.txt │            pr-bench.txt            │
                                    │      B/op      │     B/op      vs base              │
AllocateAction_SmallCluster-4           2.259Mi ± 0%   2.255Mi ± 0%       ~ (p=0.485 n=6)
AllocateAction_MediumCluster-4          12.20Mi ± 0%   12.20Mi ± 0%       ~ (p=0.394 n=6)
AllocateAction_LargeCluster-4           42.05Mi ± 0%   42.05Mi ± 0%       ~ (p=0.818 n=6)
ReclaimAction_SmallCluster-4            960.3Ki ± 1%   961.9Ki ± 1%       ~ (p=1.000 n=6)
ReclaimAction_MediumCluster-4           3.173Mi ± 0%   3.173Mi ± 0%       ~ (p=0.485 n=6)
PreemptAction_SmallCluster-4            1.065Mi ± 0%   1.063Mi ± 0%       ~ (p=0.394 n=6)
PreemptAction_MediumCluster-4           4.320Mi ± 0%   4.322Mi ± 0%       ~ (p=0.310 n=6)
ConsolidationAction_SmallCluster-4      5.557Mi ± 0%   5.558Mi ± 0%       ~ (p=0.699 n=6)
ConsolidationAction_MediumCluster-4     46.74Mi ± 0%   46.73Mi ± 0%       ~ (p=0.310 n=6)
FullSchedulingCycle_SmallCluster-4      1.463Mi ± 1%   1.457Mi ± 1%       ~ (p=0.699 n=6)
FullSchedulingCycle_MediumCluster-4     7.186Mi ± 0%   7.185Mi ± 0%       ~ (p=0.485 n=6)
FullSchedulingCycle_LargeCluster-4      23.51Mi ± 0%   23.51Mi ± 0%       ~ (p=0.180 n=6)
ManyQueues_MediumCluster-4              16.66Mi ± 0%   16.66Mi ± 0%       ~ (p=0.937 n=6)
GangScheduling_MediumCluster-4          17.78Mi ± 0%   17.78Mi ± 0%       ~ (p=0.180 n=6)
geomean                                 6.606Mi        6.604Mi       -0.04%

                                    │ main-bench.txt │           pr-bench.txt            │
                                    │   allocs/op    │  allocs/op   vs base              │
AllocateAction_SmallCluster-4            36.31k ± 0%   36.32k ± 0%       ~ (p=0.844 n=6)
AllocateAction_MediumCluster-4           317.9k ± 0%   317.9k ± 0%       ~ (p=0.784 n=6)
AllocateAction_LargeCluster-4            1.351M ± 0%   1.351M ± 0%       ~ (p=0.297 n=6)
ReclaimAction_SmallCluster-4             8.960k ± 0%   8.960k ± 0%       ~ (p=0.996 n=6)
ReclaimAction_MediumCluster-4            29.06k ± 0%   29.06k ± 0%       ~ (p=0.470 n=6)
PreemptAction_SmallCluster-4             11.68k ± 0%   11.68k ± 0%       ~ (p=0.396 n=6)
PreemptAction_MediumCluster-4            40.99k ± 0%   41.00k ± 0%       ~ (p=0.240 n=6)
ConsolidationAction_SmallCluster-4       71.62k ± 0%   71.63k ± 0%       ~ (p=0.485 n=6)
ConsolidationAction_MediumCluster-4      675.8k ± 0%   675.8k ± 0%       ~ (p=0.937 n=6)
FullSchedulingCycle_SmallCluster-4       21.70k ± 0%   21.70k ± 0%       ~ (p=0.558 n=6)
FullSchedulingCycle_MediumCluster-4      172.3k ± 0%   172.3k ± 0%       ~ (p=0.764 n=6)
FullSchedulingCycle_LargeCluster-4       708.8k ± 0%   708.8k ± 0%       ~ (p=0.472 n=6)
ManyQueues_MediumCluster-4               355.8k ± 0%   355.8k ± 0%       ~ (p=0.981 n=6)
GangScheduling_MediumCluster-4           581.7k ± 0%   581.7k ± 0%       ~ (p=0.576 n=6)
geomean                                  112.4k        112.4k       +0.00%

pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions/integration_tests/reclaim
                            │ main-bench.txt │           pr-bench.txt            │
                            │     sec/op     │   sec/op     vs base              │
ReclaimLargeJobs_10Node-4        104.4m ± 0%   104.5m ± 0%       ~ (p=1.000 n=6)
ReclaimLargeJobs_50Node-4        139.7m ± 1%   139.3m ± 1%       ~ (p=0.699 n=6)
ReclaimLargeJobs_100Node-4       271.1m ± 0%   270.9m ± 1%       ~ (p=0.240 n=6)
ReclaimLargeJobs_200Node-4        1.065 ± 6%    1.056 ± 0%       ~ (p=0.132 n=6)
ReclaimLargeJobs_500Node-4        12.68 ± 1%    12.66 ± 1%       ~ (p=0.699 n=6)
ReclaimLargeJobs_1000Node-4       109.7 ± 0%    109.7 ± 0%       ~ (p=0.818 n=6)
geomean                           1.343         1.339       -0.24%

                            │ main-bench.txt │            pr-bench.txt            │
                            │      B/op      │     B/op      vs base              │
ReclaimLargeJobs_10Node-4       1.875Mi ± 2%   1.877Mi ± 2%       ~ (p=0.818 n=6)
ReclaimLargeJobs_50Node-4       17.62Mi ± 0%   17.63Mi ± 0%       ~ (p=0.937 n=6)
ReclaimLargeJobs_100Node-4      59.71Mi ± 0%   59.71Mi ± 0%       ~ (p=0.699 n=6)
ReclaimLargeJobs_200Node-4      233.9Mi ± 0%   233.8Mi ± 0%       ~ (p=0.093 n=6)
ReclaimLargeJobs_500Node-4      1.658Gi ± 0%   1.658Gi ± 0%       ~ (p=0.699 n=6)
ReclaimLargeJobs_1000Node-4     8.526Gi ± 0%   8.526Gi ± 0%       ~ (p=1.000 n=6)
geomean                         137.8Mi        137.8Mi       +0.01%

                            │ main-bench.txt │           pr-bench.txt            │
                            │   allocs/op    │  allocs/op   vs base              │
ReclaimLargeJobs_10Node-4        20.34k ± 2%   20.34k ± 2%       ~ (p=0.364 n=6)
ReclaimLargeJobs_50Node-4        234.3k ± 0%   234.3k ± 0%       ~ (p=0.208 n=6)
ReclaimLargeJobs_100Node-4       872.5k ± 0%   872.5k ± 0%       ~ (p=0.974 n=6)
ReclaimLargeJobs_200Node-4       3.690M ± 0%   3.690M ± 0%       ~ (p=0.900 n=6)
ReclaimLargeJobs_500Node-4       29.66M ± 0%   29.66M ± 0%       ~ (p=0.310 n=6)
ReclaimLargeJobs_1000Node-4      165.8M ± 0%   165.8M ± 0%       ~ (p=0.738 n=6)
geomean                          2.056M        2.056M       +0.00%

Legend

  • 📉 Negative delta = Performance improvement (faster)
  • 📈 Positive delta = Performance regression (slower)
  • p-value < 0.05 indicates statistically significant change
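
As a rough cross-check of how the summary figures relate to the raw runs, the sketch below computes a median and a percentage delta. It assumes the sec/op column is the median of the six runs, which the PR-branch raw numbers appear to support. The function names median and deltaPercent are made up for this illustration. The delta uses the rounded table values, so it can differ slightly from the reported figure.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of the samples (mean of the two middle
// values for an even count). The input slice is not modified.
func median(samples []float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	n := len(s)
	if n == 0 {
		return 0
	}
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// deltaPercent is the relative change of pr against base, in percent.
// A positive value means the PR is slower.
func deltaPercent(base, pr float64) float64 {
	return (pr - base) / base * 100
}

func main() {
	// ns/op of the six PR-branch runs of
	// BenchmarkAllocateAction_SmallCluster-4, copied from the raw data.
	prRuns := []float64{
		107495356, 107428133, 107881954,
		107893825, 107693134, 108193924,
	}
	fmt.Printf("PR median: %.1f ms\n", median(prRuns)/1e6)

	// Rounded sec/op values (in ms) for GangScheduling_MediumCluster-4
	// from the summary table: main vs PR.
	fmt.Printf("delta: %+.2f%%\n", deltaPercent(155.2, 158.3))
}
```

The median comes out at about 107.8 ms, in line with the table entry for that benchmark, and the delta at roughly +2%, close to the +2.02% that the table reports.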
Raw benchmark data

PR branch:

goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkAllocateAction_SmallCluster-4         	      10	 107495356 ns/op	 2365240 B/op	   36320 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107428133 ns/op	 2364933 B/op	   36318 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107881954 ns/op	 2361916 B/op	   36311 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107893825 ns/op	 2362544 B/op	   36312 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107693134 ns/op	 2365316 B/op	   36316 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 108193924 ns/op	 2364183 B/op	   36316 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 134121483 ns/op	12792318 B/op	  317911 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 134180220 ns/op	12792746 B/op	  317899 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135092026 ns/op	12790773 B/op	  317902 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 134405205 ns/op	12792007 B/op	  317907 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 134982566 ns/op	12790165 B/op	  317897 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135813817 ns/op	12790963 B/op	  317902 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 209143961 ns/op	44101254 B/op	 1351091 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 206961309 ns/op	44095131 B/op	 1351087 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 223725912 ns/op	44092868 B/op	 1351072 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 215714187 ns/op	44108139 B/op	 1351070 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 210100715 ns/op	44094904 B/op	 1351083 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 207365745 ns/op	44099244 B/op	 1351072 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102756802 ns/op	  972268 B/op	    8927 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102718885 ns/op	  977794 B/op	    8953 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102686062 ns/op	  984741 B/op	    8960 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102645854 ns/op	  985692 B/op	    8963 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102661520 ns/op	  985168 B/op	    8961 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102675651 ns/op	  985365 B/op	    8961 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105928890 ns/op	 3323844 B/op	   29063 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105472964 ns/op	 3332370 B/op	   29064 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105404890 ns/op	 3327453 B/op	   29064 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105397557 ns/op	 3323656 B/op	   29062 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105402796 ns/op	 3327632 B/op	   29064 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105299138 ns/op	 3327287 B/op	   29064 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103333385 ns/op	 1117036 B/op	   11678 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103337514 ns/op	 1116856 B/op	   11677 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103502366 ns/op	 1112975 B/op	   11676 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103406796 ns/op	 1112666 B/op	   11674 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103458029 ns/op	 1116572 B/op	   11677 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103408464 ns/op	 1112653 B/op	   11674 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113513239 ns/op	 4532144 B/op	   40997 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113074900 ns/op	 4531880 B/op	   40995 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112968265 ns/op	 4538885 B/op	   40994 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112962104 ns/op	 4531790 B/op	   40996 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112965651 ns/op	 4527368 B/op	   40994 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112992412 ns/op	 4532115 B/op	   40997 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113291616 ns/op	 5828527 B/op	   71632 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114119014 ns/op	 5828626 B/op	   71642 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114301005 ns/op	 5828377 B/op	   71626 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113238359 ns/op	 5826361 B/op	   71628 allocs/op

Main branch:

goos: linux
goarch: amd64
pkg: github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/actions
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkAllocateAction_SmallCluster-4         	      10	 107301697 ns/op	 2372171 B/op	   36319 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107272128 ns/op	 2376919 B/op	   36323 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107427700 ns/op	 2376680 B/op	   36313 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107841360 ns/op	 2362346 B/op	   36313 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107639163 ns/op	 2364778 B/op	   36317 allocs/op
BenchmarkAllocateAction_SmallCluster-4         	      10	 107710535 ns/op	 2362164 B/op	   36311 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 134698281 ns/op	12792932 B/op	  317914 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 133663259 ns/op	12792550 B/op	  317896 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 134500874 ns/op	12790903 B/op	  317902 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 134859441 ns/op	12790871 B/op	  317903 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135256643 ns/op	12800623 B/op	  317906 allocs/op
BenchmarkAllocateAction_MediumCluster-4        	       8	 135617460 ns/op	12791021 B/op	  317902 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 208068678 ns/op	44100297 B/op	 1351087 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 207228756 ns/op	44100110 B/op	 1351071 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 212342266 ns/op	44093560 B/op	 1351070 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 220729786 ns/op	44095310 B/op	 1351082 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 206788014 ns/op	44093523 B/op	 1351071 allocs/op
BenchmarkAllocateAction_LargeCluster-4         	       5	 207164842 ns/op	44099238 B/op	 1351071 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102574598 ns/op	  972453 B/op	    8928 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102595031 ns/op	  980706 B/op	    8950 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102702386 ns/op	  985288 B/op	    8961 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102616801 ns/op	  985356 B/op	    8962 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102677453 ns/op	  985452 B/op	    8961 allocs/op
BenchmarkReclaimAction_SmallCluster-4          	      10	 102690499 ns/op	  981381 B/op	    8960 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105727394 ns/op	 3327481 B/op	   29064 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105472331 ns/op	 3328301 B/op	   29062 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105455556 ns/op	 3323607 B/op	   29063 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105482777 ns/op	 3327256 B/op	   29064 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105396626 ns/op	 3323469 B/op	   29062 allocs/op
BenchmarkReclaimAction_MediumCluster-4         	      10	 105308284 ns/op	 3327224 B/op	   29063 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103339876 ns/op	 1116815 B/op	   11677 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103361133 ns/op	 1116893 B/op	   11677 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103451078 ns/op	 1116844 B/op	   11677 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103325314 ns/op	 1112872 B/op	   11676 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103452853 ns/op	 1117002 B/op	   11678 allocs/op
BenchmarkPreemptAction_SmallCluster-4          	      10	 103389230 ns/op	 1116712 B/op	   11677 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 113256309 ns/op	 4527839 B/op	   40995 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112447362 ns/op	 4527452 B/op	   40993 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112744500 ns/op	 4534757 B/op	   40993 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112668665 ns/op	 4532064 B/op	   40997 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112672732 ns/op	 4531754 B/op	   40995 allocs/op
BenchmarkPreemptAction_MediumCluster-4         	       9	 112834891 ns/op	 4527116 B/op	   40992 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 114099541 ns/op	 5829842 B/op	   71641 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113833786 ns/op	 5824648 B/op	   71617 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113890730 ns/op	 5824976 B/op	   71613 allocs/op
BenchmarkConsolidationAction_SmallCluster-4    	       9	 113278418 ns/op	 5823087 B/op	   71599 allocs/op

@enoodle
Collaborator

enoodle commented Apr 1, 2026

@JheSue After updating the API, you need to run make generate manifests to regenerate the affected files.

Replace *model.Duration with *monitoringv1.Duration for the UsageParams.WindowSize field.

model.Duration is backed by int64, which caused the CRD schema to be generated as an integer instead of string.

monitoringv1.Duration is a string-based type with kubebuilder validation that ensures the CRD schema is generated as string and enforces valid Prometheus duration format (ms, s, m, h, d, w, y) at the API server level.

Signed-off-by: Homer <homer.su@fongcon.com.tw>
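
The integer-versus-string outcome described in this commit can be illustrated with a small sketch. It assumes, as a simplification, that a schema generator picks the OpenAPI type from the underlying kind of a Go type. The names intBackedDuration, stringBackedDuration, and schemaTypeFor are hypothetical stand-ins for illustration only; they are not the real model.Duration, monitoringv1.Duration, or the generator's actual logic.

```go
package main

import (
	"fmt"
	"reflect"
)

// intBackedDuration is a hypothetical stand-in for a duration type
// whose underlying type is int64.
type intBackedDuration int64

// stringBackedDuration is a hypothetical stand-in for a duration type
// whose underlying type is string.
type stringBackedDuration string

// schemaTypeFor is a simplified, assumed mapping from a Go kind to an
// OpenAPI schema type. Real generators handle many more cases.
func schemaTypeFor(v any) string {
	switch reflect.TypeOf(v).Kind() {
	case reflect.Int, reflect.Int32, reflect.Int64:
		return "integer"
	case reflect.String:
		return "string"
	default:
		return "object"
	}
}

func main() {
	// A custom JSON unmarshaller on the type does not change its kind,
	// so an int64-backed type still maps to "integer" here.
	fmt.Println(schemaTypeFor(intBackedDuration(0)))
	fmt.Println(schemaTypeFor(stringBackedDuration("1w")))
}
```

Under this assumption, a value such as 1w can only pass schema validation when the field's schema type is string, which is the motivation given above for the string-based type.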
@JheSue
Contributor Author

JheSue commented Apr 2, 2026

@enoodle Thanks! After running make generate manifests, I noticed that the CRD schema for windowSize was generated as an integer, because model.Duration is backed by int64.

This breaks support for Prometheus duration strings, so I switched it to monitoringv1.Duration to ensure it’s handled as a validated string.

@github-actions

github-actions bot commented Apr 2, 2026

Merging this branch will increase overall coverage

Impacted Packages                                                              Coverage Δ       🤖
github.com/kai-scheduler/KAI-scheduler/pkg/env-tests/timeaware                 0.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/operator/operands/scheduler         83.33% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api         31.51% (+0.95%)  👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/fake        0.00% (ø)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/prometheus  30.25% (+0.69%)  👍
github.com/kai-scheduler/KAI-scheduler/test/e2e/suites/timeaware               0.00% (ø)

Coverage by file

Changed files (no unit tests)

Changed File                                                                                  Coverage Δ       Total     Covered   Missed   🤖
github.com/kai-scheduler/KAI-scheduler/pkg/env-tests/timeaware/timeaware.go                   0.00% (ø)        150       0         150
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api/defaults.go            54.55% (+1.42%)  33 (+1)   18 (+1)   15       👍
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api/interface.go           12.50% (ø)       40        5         35
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/fake/fake_with_history.go  0.00% (ø)        54 (+3)   0         54 (+3)
github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/prometheus/prometheus.go   30.25% (+0.69%)  119 (+4)  36 (+2)   83 (+2)  👍
github.com/kai-scheduler/KAI-scheduler/test/e2e/suites/timeaware/config.go                    0.00% (ø)        0         0         0

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/kai-scheduler/KAI-scheduler/pkg/operator/operands/scheduler/resources_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/api/usage_params_test.go
  • github.com/kai-scheduler/KAI-scheduler/pkg/scheduler/cache/usagedb/prometheus/prometheus_test.go

@itsomri
Collaborator

itsomri commented Apr 3, 2026

This is great! Thanks for the contribution!

@itsomri itsomri added this pull request to the merge queue Apr 3, 2026
Merged via the queue into kai-scheduler:main with commit 778b4c2 Apr 3, 2026
12 of 13 checks passed

3 participants