fix(binder): check active BindRequests before deleting reservation pods #1104
Conversation
Create an integration test using envtest that starts the full binder controller and reproduces the race condition where SyncForGpuGroup prematurely deletes reservation pods when the informer cache lags behind (fraction pods lack GPU group labels). Setup: 4 nodes x 8 GPUs = 32 GPU groups with pre-created reservation pods and BindRequests. The binder's SyncForNode sees reservation pods with no matching labeled fraction pods and deletes them, confirming the bug exists before the fix.
…fraction binding

Creates many concurrent 0.5-fraction GPU pods under binpack mode across multiple rounds to reproduce the race where SyncForGpuGroup prematurely deletes reservation pods due to informer cache lag. Verified to fail reliably on the current main branch (5/64 pods stuck on the second run).
…ondition

Add reservation_race_test.go, which reproduces the race where SyncForGpuGroup prematurely deletes reservation pods when the informer cache hasn't propagated GPU group labels on recently-bound fraction pods. The test creates the exact preconditions (reservation pod + active BindRequest, no labeled fraction pod) and calls SyncForGpuGroup to verify the behavior. Also exports the resource reservation service in suite_test.go so the integration test can call SyncForGpuGroup directly.
Before deleting a reservation pod during SyncForGpuGroup, check if any non-terminal BindRequests reference the GPU group. This prevents premature deletion caused by informer cache lag where the GPU group label on a recently-bound fraction pod has not yet propagated to the cache. The BindRequest serves as a durable intent signal that survives the cache lag window, making the sync logic idempotent against concurrent binding operations.
Force-pushed from ee29169 to 97a03f4
Merging this branch will decrease overall coverage.

Coverage by file: changed files (no unit tests).

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements rather than lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files
davidLif left a comment:

Do we need tests both on the env-test and the e2e levels? How long do they take?
```go
	br.Status.Phase == schedulingv1alpha2.BindRequestPhaseFailed {
	continue
}
if slices.Contains(br.Spec.SelectedGPUGroups, gpuGroup) {
```
Maybe you should add an index to easily get all binding requests for the gpu group?
📊 Performance Benchmark Results: comparing the PR branch against main (legend and raw benchmark data omitted).
Description
Fixes a race condition where the binder's resource reservation sync logic (`syncForPods`) prematurely deletes GPU reservation pods during concurrent fractional GPU binding. The race occurs because the informer cache may show a newly-created reservation pod before the corresponding user pod's `runai-gpu-group` label patch has propagated, causing the sync to conclude that no fraction pods exist for the GPU group and delete the reservation.

The fix adds a check for active (non-succeeded, non-failed) BindRequests referencing the GPU group before deleting a reservation pod. Since BindRequests are created before binding starts and persist until completion, they serve as a durable intent signal that prevents premature cleanup during the cache lag window.
Related Issues
Fixes #1103
Checklist
Breaking Changes
None.
Additional Notes
Design Document
See `docs/developer/designs/fix-reservation-pod-race/design.md` for the full analysis.

Verification

On `origin/main` (before the fix): 31/32 reservation pods survive; 1 is prematurely deleted, confirming the race.

Commits
aafe0f2d366b3be266b41b87ee29169f