Releases: kai-scheduler/KAI-Scheduler
v0.14.0
What's Changed
Added
- Added queue validation webhook to queuecontroller with optional quota validation for parent-child relationships - AdheipSingh
- Added support for VPA configuration for the different components of the KAI Scheduler - jrosenboimnvidia
  - Users who have VPA installed on their cluster can now use it for proper vertical autoscaling
- Added FOSSA scanning for the repository context. Scans will also be performed for submitted PRs. The results can be found here. #1178 - davidLif
- Added support for Ray subgroup topology-aware scheduling by specifying kai.scheduler/topology, kai.scheduler/topology-required-placement, and kai.scheduler/topology-preferred-placement annotations.
- Allow subgroups to have a 0 value for "minAvailable". This means that all pods in this subgroup are "elastic extra pods". #1216 davidLif
Changed
- Auto-enable leader election when operator.replicaCount > 1 to prevent concurrent reconciliation #1218
- Update Go version to v1.26.1, with appropriate upgrades to the base Docker images, linter, and controller generator. #1222 - davidLif
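The leader-election change above amounts to a default that depends on the replica count. A minimal sketch of that rule, assuming a hypothetical helper leaderElectionEnabled and a hypothetical explicit opt-in flag (neither name comes from the KAI Scheduler code base):

```go
package main

import "fmt"

// leaderElectionEnabled is a hypothetical helper, not the operator's real API.
// It turns leader election on when the user asked for it explicitly, or when
// more than one replica would otherwise reconcile concurrently.
func leaderElectionEnabled(replicaCount int, explicitlyEnabled bool) bool {
	return explicitlyEnabled || replicaCount > 1
}

func main() {
	fmt.Println(leaderElectionEnabled(1, false)) // false: single replica, no opt-in
	fmt.Println(leaderElectionEnabled(3, false)) // true: auto-enabled for 3 replicas
	fmt.Println(leaderElectionEnabled(1, true))  // true: explicit opt-in is kept
}
```

With a rule like this, a single-replica install keeps its previous behavior, while scaling the operator up no longer requires remembering to enable leader election by hand.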
Fixed
- Updated resource enumeration logic to exclude resources with count of 0. #1120
- Fixed scheduler on k8s < 1.34 with DRA disabled.
- Fixed pod group controller failing to track DRA GPU resources on Kubernetes 1.32-1.33 clusters. #1214
- Fixed scheduling-constraints signature hashing for Priority and container HostPort by encoding full int32 values, preventing byte-truncation collisions and flaky signature tests.
- Fixed rollback in scheduling simulations with DRA #1168 itsomri
- Fixed a potential state corruption in DRA scheduling simulations #1219 itsomri
- Fixed operator reconcile loop caused by status-only updates triggering re-reconciliation. #1229 cypres
- Fixed scheduler not starting on k8s clusters with DRA disabled, due to the ResourceSliceTracker not syncing. #1241 cypres
- Fixed webhook reconcile loop on AKS, by retaining the cloud-provider-injected namespaceSelector rules during reconciliation. #1292 cypres
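The signature-hashing fix in the list above concerns int32 fields being reduced to a single byte before hashing. The sketch below illustrates that class of bug and the full-width encoding that avoids it. It is an illustration under assumptions: the function names are hypothetical, and SHA-256 stands in for whatever hash the scheduler actually uses, which the release notes do not state.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// truncatedSignature hashes only the low byte of v. This is a hypothetical
// reconstruction of a byte-truncation bug, not the scheduler's original code.
func truncatedSignature(v int32) [32]byte {
	return sha256.Sum256([]byte{byte(v)})
}

// fullSignature hashes all four bytes of v in big-endian order, so values
// that differ only in their upper bytes produce different inputs.
func fullSignature(v int32) [32]byte {
	var buf [4]byte
	binary.BigEndian.PutUint32(buf[:], uint32(v))
	return sha256.Sum256(buf[:])
}

func main() {
	a, b := int32(1), int32(257) // 257 is 0x0101, so it shares its low byte with 1

	fmt.Println(truncatedSignature(a) == truncatedSignature(b)) // true: collision
	fmt.Println(fullSignature(a) == fullSignature(b))           // false: distinct
}
```

Any two priorities or host ports that are equal modulo 256 collide under the truncated encoding, which is consistent with the flaky signature tests mentioned in the fix.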
New Contributors
- @rich7420 made their first contribution in #816
- @Ronkahn21 made their first contribution in #821
- @faizan-exe made their first contribution in #913
- @lalitadithya made their first contribution in #954
- @steved made their first contribution in #972
- @yuanchen8911 made their first contribution in #1035
- @Hagay-RunAI made their first contribution in #1115
- @dougnd made their first contribution in #1123
- @rueian made their first contribution in #1125
- @JRosenboimNVIDIA made their first contribution in #1119
- @itayvallach made their first contribution in #1176
- @david-gang made their first contribution in #1223
- @cypres made their first contribution in #1241
- @AdheipSingh made their first contribution in #857
Full Changelog: v0.13.4...v0.14.0
v0.6.18
What's Changed
Fixed
- podGroup status update loop on conflict by @SiorMeir in #1313
- bind plugin server to localhost by @gshaibi in #997
- Do not include resources with a count of 0. by @KaiPilotBot in https://github.com/kai-scheduler/KAI-
Changed
- build: upgrade Go to 1.25.6, golangci-lint to v2.11.3, controller-gen to v0.20.1, mockgen to v0.6.0 - v0.6 by @davidLif in #1281
- ci: add approval gatekeeper workflow for external contributor PRs by @KaiPilotBot in #1004
Added
- Add dco github action by @KaiPilotBot in #1268
Full Changelog: v0.6.17...v0.6.18
v0.13.4
v0.13.3
v0.13.2
Fixed
- Fixed rollback in scheduling simulations with DRA #1168 itsomri
- Allow subgroups to have a 0 value for "minAvailable". This means that all pods in this subgroup are "elastic extra pods". #1216 davidLif
- Fixed pod group controller failing to track DRA GPU resources on Kubernetes 1.32-1.33 clusters. #1214
- Fixed a potential state corruption in DRA scheduling simulations #1225 itsomri
Full Changelog: v0.13.1...v0.13.2
v0.12.18
What's Changed
- ci: migrate container registry to ghcr.io/kai-scheduler/kai-scheduler by @SiorMeir in #1175
- fix(deps): upgrade opentelemetry SDK to v1.40.0 (GHSA-9h8m-3fm2-qjrq) by @enoodle in #1181
Full Changelog: v0.12.17...v0.12.18
v0.13.1
v0.9.15
v0.12.17
v0.9.14
What's Changed
- refactor: Represent podreferences as strings v0.9 by @itsomri in #985
- fix(scheduler): bind plugin server to localhost by @gshaibi in #996
- ci: add approval gatekeeper workflow for external contributor PRs by @KaiPilotBot in #1003
- fix(queue-controller): use Spec.Queue field indexer for resource aggregation (#1049) by @gshaibi in #1053
- chore: auto-resolve CHANGELOG.md merge conflicts with union strategy by @KaiPilotBot in #1054
- fix: skip runtimeClassName injection when gpuPodRuntimeClassName is e… by @enoodle in #1131
Full Changelog: v0.9.13...v0.9.14