Skip to content

fix(scheduler): Do not include resources with a count of 0.#1123

Merged
enoodle merged 4 commits intoNVIDIA:mainfrom
dougnd:fix-mig-detection
Mar 4, 2026
Merged

fix(scheduler): Do not include resources with a count of 0.#1123
enoodle merged 4 commits intoNVIDIA:mainfrom
dougnd:fix-mig-detection

Conversation

@dougnd
Copy link
Contributor

@dougnd dougnd commented Mar 2, 2026

Description

Kubelet does not remove resources, even when they are no longer advertised from a device plugin (see: kubernetes/kubernetes#92396).

This change prevents node resources with count 0 from impacting KAI's scheduling logic.

Related Issues

Fixes #1120.

Checklist

Note: Ensure your PR title follows the Conventional Commits format (e.g., feat(scheduler): add new feature)

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated documentation (if needed)

Additional Notes

dougnd added 2 commits March 2, 2026 08:39
Kubelet does not remove resources, even when they are no longer advertised from
a device plugin (see: kubernetes/kubernetes#92396).
This change prevents removed devices from impacting KAI's scheduling logic.

Fixes NVIDIA#1120
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 2, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Merging this branch will decrease overall coverage

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/resource_info 50.52% (-0.21%) 👎

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/resource_info/resource_info.go 2.41% (-0.06%) 83 (+2) 2 81 (+2) 👎

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Copy link
Collaborator

@enoodle enoodle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a unit test for this please?

@github-actions
Copy link

github-actions bot commented Mar 4, 2026

Merging this branch will increase overall coverage

Impacted Packages Coverage Δ 🤖
github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/resource_info 52.62% (+1.88%) 👍

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/resource_info/resource_info.go 13.25% (+10.78%) 83 (+2) 11 (+9) 72 (-7) 🎉

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/NVIDIA/KAI-scheduler/pkg/scheduler/api/resource_info/resource_info_test.go

@dougnd
Copy link
Contributor Author

dougnd commented Mar 4, 2026

Thanks for the review! Let me know if there is anything else I should change.

I noticed the Performance benchmarks seem to fail on my PR. In the benckmark workflow (.github/workflows/benchmark.yaml), it looks like it:

  1. Checks out PR branch using:
/usr/bin/git checkout --progress --force refs/remotes/pull/1123/merge
  1. Runs benchmark
  2. Checks out main
  3. Runs benchmark
  4. Checkout PR branch (restore), this time using:
git checkout fix-mig-detection

This fails with error: pathspec 'fix-mig-detection' did not match any file(s) known to git.

Perhaps this line could be changed to:

run: git checkout ${{ github.ref || github.sha }}

I believe this would then run the following instead:

git checkout  refs/remotes/pull/1123/merge

Or is this failure is expected for PRs from forks?

@enoodle
Copy link
Collaborator

enoodle commented Mar 4, 2026

Thanks, We should fix this action on another PR.

@enoodle enoodle added this pull request to the merge queue Mar 4, 2026
Merged via the queue into NVIDIA:main with commit 78131cf Mar 4, 2026
15 of 19 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 4, 2026
@KaiPilotBot
Copy link

Successfully created backport PR for v0.9:

github-actions bot pushed a commit that referenced this pull request Mar 4, 2026
@KaiPilotBot
Copy link

Successfully created backport PR for v0.6:

@KaiPilotBot
Copy link

Successfully created backport PR for v0.4:

github-actions bot pushed a commit that referenced this pull request Mar 4, 2026
@KaiPilotBot
Copy link

Backport failed for v0.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin v0.10
git worktree add -d .worktree/backport-1123-to-v0.10 origin/v0.10
cd .worktree/backport-1123-to-v0.10
git switch --create backport-1123-to-v0.10
git cherry-pick -x 78131cf44194efae54c95431d0bd52fa8490eab8

github-actions bot pushed a commit that referenced this pull request Mar 4, 2026
@KaiPilotBot
Copy link

Successfully created backport PR for v0.12:

@KaiPilotBot
Copy link

Successfully created backport PR for v0.13:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow non-MIG jobs to run on node where MIG is disabed, but set to MigStrategy='mixed'

4 participants