fix(scheduler): Do not include resources with a count of 0. #1123
enoodle merged 4 commits into NVIDIA:main
Kubelet does not remove resources, even when they are no longer advertised from a device plugin (see: kubernetes/kubernetes#92396). This change prevents removed devices from impacting KAI's scheduling logic. Fixes NVIDIA#1120
Merging this branch will decrease overall coverage
enoodle left a comment:
Can you add a unit test for this please?
Merging this branch will increase overall coverage
Thanks for the review! Let me know if there is anything else I should change. I noticed the performance benchmarks seem to fail on my PR. In the benchmark workflow (.github/workflows/benchmark.yaml), it looks like it:
This fails with Perhaps this line could be changed to: I believe this would then run the following instead: Or is this failure expected for PRs from forks?
Thanks. We should fix this action in another PR.
(cherry picked from commit 78131cf)
Successfully created backport PR.
Backport failed. Please cherry-pick the changes locally and resolve any conflicts:
git fetch origin v0.10
git worktree add -d .worktree/backport-1123-to-v0.10 origin/v0.10
cd .worktree/backport-1123-to-v0.10
git switch --create backport-1123-to-v0.10
git cherry-pick -x 78131cf44194efae54c95431d0bd52fa8490eab8
Description
Kubelet does not remove resources, even when they are no longer advertised from a device plugin (see: kubernetes/kubernetes#92396).
This change prevents node resources with count 0 from impacting KAI's scheduling logic.
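For readers unfamiliar with the problem, here is a minimal sketch of the idea, not the code from this PR. It assumes a simplified `ResourceList` map in place of the Kubernetes resource types, and the helper name `dropZeroCount` is hypothetical. Entries that the kubelet still reports with a count of 0 are skipped, so they do not appear in the set of resources the scheduler reasons about.

```go
package main

import (
	"fmt"
	"sort"
)

// ResourceList is a simplified stand-in (an assumption for this sketch)
// for a node's advertised resources: resource name -> count.
type ResourceList map[string]int64

// dropZeroCount is a hypothetical helper. It returns a copy of in that
// omits every resource whose count is 0, leaving the input unchanged.
func dropZeroCount(in ResourceList) ResourceList {
	out := make(ResourceList, len(in))
	for name, count := range in {
		if count == 0 {
			// A stale entry left behind after a device plugin stopped
			// advertising the resource; ignore it.
			continue
		}
		out[name] = count
	}
	return out
}

func main() {
	// Illustrative values only.
	node := ResourceList{
		"cpu":              8,
		"nvidia.com/gpu":   0,
		"example.com/fpga": 2,
	}

	filtered := dropZeroCount(node)

	// Sort names so the printed order is deterministic.
	names := make([]string, 0, len(filtered))
	for name := range filtered {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		fmt.Printf("%s=%d\n", name, filtered[name])
	}
}
```

Returning a filtered copy rather than mutating the input keeps the node's reported state intact for any other consumer; whether the actual change filters at this point or elsewhere is not something this sketch claims.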
Related Issues
Fixes #1120.