Fix max block size computation in forall
#744
base: main
Conversation
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.
/ok to test
Greptile Summary

This PR fixes a critical bug where the wrong object was passed to the maximum block size computation used by forall, leading to CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES on certain GPUs.

Key Changes:
- Removes the numba-cuda native maximum threads per block computation machinery and routes through cuda-python APIs instead.
Trade-off: The new approach uses the maximum allowable threads per block rather than calculating an optimal block size for occupancy. While this may not achieve optimal occupancy in all cases, it fixes the immediate bug and simplifies the codebase by routing through cuda-python APIs.

Confidence Score: 4/5
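As a rough illustration of the trade-off described above (the function name and structure here are hypothetical, not the actual numba-cuda helpers): with the occupancy calculation gone, a forall-style launch can simply cap its block size at the maximum allowable threads per block and size the grid to cover all tasks.

```python
import math


def forall_launch_config(ntasks: int, max_threads_per_block: int = 1024):
    """Hypothetical sketch of a 1D launch configuration for a forall-style
    elementwise launch.

    Rather than asking the occupancy API for an "optimal" block size,
    cap the block size at the maximum allowable threads per block and
    derive the grid size from the task count.
    """
    if ntasks <= 0:
        raise ValueError("ntasks must be positive")
    blocksize = min(ntasks, max_threads_per_block)
    gridsize = math.ceil(ntasks / blocksize)  # enough blocks to cover every task
    return gridsize, blocksize
```

For example, `forall_launch_config(1_000_000)` yields `(977, 1024)`: 977 blocks of 1024 threads cover one million tasks. An occupancy-tuned choice might pick a smaller block size on some kernels, but this cap is always launchable.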
Important Files Changed
2 files reviewed, 1 comment
/ok to test
cpcloud left a comment
Nice!
It looks like this is still in use in cudf. Perhaps we can just fix it as-is and keep it around until it can be adjusted downstream in cudf?
PR #609 made some changes to the way modules were loaded that result in the wrong object being passed to cuOccupancyMaxPotentialBlockSize (previously a CUFunction and now a CUKernel). This causes the max block size calculation to fail and leads to a CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES on certain GPUs. This is observable on a V100 with a resource-hungry kernel:

This PR removes the numba-cuda native maximum threads per block computation machinery and routes through cuda-python APIs to get the same information.
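To see why a resource-hungry kernel hits CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: a kernel's per-thread register usage caps how many threads can actually fit in one block, so launching the hardware maximum of 1024 threads can fail even though the launch configuration looks valid. The sketch below is illustrative arithmetic only, with V100-like limits hard-coded rather than queried from a device; in practice the correct per-kernel cap is what the driver reports as the kernel's maximum threads per block.

```python
def register_limited_block_size(regs_per_thread: int,
                                regs_per_block: int = 65536,
                                hw_max_threads: int = 1024) -> int:
    """Illustrative only: estimate the largest launchable block size for a
    kernel, given its register usage (V100-like limits, hypothetical values).

    A "resource hungry" kernel using many registers per thread cannot run
    the hardware maximum of threads per block; launching above this limit
    is what surfaces as CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES.
    """
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    # Threads per block are limited by both the hardware cap and the
    # per-block register file budget.
    return min(hw_max_threads, regs_per_block // regs_per_thread)
```

For instance, a kernel using 128 registers per thread fits only 512 threads per block under these limits, so a blanket 1024-thread launch fails; querying the actual per-kernel maximum (as the cuda-python-based path does) avoids this.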