
Conversation

@CodersAcademy006

CUDA: add config tests

This PR adds kernel-based tests for device-side read-only access to CUDA config values in Numba-CUDA:

  • cuda.config.WARP_SIZE
  • cuda.config.MAX_THREADS_PER_BLOCK
  • Use of config values in kernel control flow

Key features:

  • Tests are skipped under cudasim due to backend-specific semantics
  • NumPy is used as the reference oracle
  • Scope is intentionally limited to safe, well-defined CUDA config semantics (no mutation or non-CUDA targets)

This continues the systematic porting of CPU-side tests to CUDA, directly contributing to issue #515.
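As a rough sketch, one of these tests might look like the following. This is illustrative only: cuda.config.WARP_SIZE is the proposed attribute under test (not yet implemented), and the expected warp size of 32 is an assumption that holds on current NVIDIA hardware.

import numpy as np
from numba import cuda
from numba.cuda.testing import CUDATestCase, skip_on_cudasim


@skip_on_cudasim("config constants have backend-specific semantics in cudasim")
class TestCudaConfig(CUDATestCase):
    def test_warp_size_broadcast(self):
        @cuda.jit
        def warp_size_kernel(out):
            i = cuda.grid(1)
            if i < out.size:
                out[i] = cuda.config.WARP_SIZE  # proposed device-side constant

        out = np.zeros(128, dtype=np.int32)
        warp_size_kernel[4, 32](out)
        # NumPy as the reference oracle; 32 assumed for current NVIDIA GPUs
        np.testing.assert_array_equal(out, np.full(128, 32, dtype=np.int32))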

@copy-pr-bot

copy-pr-bot bot commented Jan 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps bot commented Jan 20, 2026

Greptile Summary

This PR adds tests for device-side CUDA config constants (cuda.config.WARP_SIZE and cuda.config.MAX_THREADS_PER_BLOCK) that do not exist in the codebase. The implementation is missing.

Critical issues:

  • cuda.config.WARP_SIZE and cuda.config.MAX_THREADS_PER_BLOCK are not defined anywhere
  • No resolve_config method in CudaModuleTemplate (cudadecl.py:461)
  • The only existing references are CU_DEVICE_ATTRIBUTE_WARP_SIZE in enums.py (driver constant) and driver.get_device().MAX_THREADS_PER_BLOCK in transpose.py (host-side API)
  • All three tests will fail immediately with AttributeError when trying to access these non-existent attributes

What needs to happen:
The implementation PR must be merged first, or this PR needs to include the implementation alongside the tests.
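For context, device-visible module attributes in Numba are typed through resolve_<attr> hooks on an AttributeTemplate. The sketch below shows the shape of the missing piece; CudaConfigTemplate and the cuda.config namespace are hypothetical, and a matching lowering that emits the constant value in device code would also be required.

from numba import cuda
from numba.core import types
from numba.core.typing.templates import AttributeTemplate, Registry

registry = Registry()


@registry.register_attr
class CudaConfigTemplate(AttributeTemplate):
    # Hypothetical: assumes a cuda.config namespace object exists to key on
    key = types.Module(cuda.config)

    def resolve_WARP_SIZE(self, mod):
        # Typing half only: declares the attribute as a 32-bit integer
        return types.int32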

Confidence Score: 0/5

  • This PR cannot be merged - it tests functionality that doesn't exist and will immediately fail
  • The PR adds tests for cuda.config.WARP_SIZE and cuda.config.MAX_THREADS_PER_BLOCK, but these constants are not implemented anywhere in the codebase. Comprehensive search confirms no device-side or host-side implementation exists. Tests will fail with AttributeError.
  • The test file requires a corresponding implementation PR to be merged first, or this PR must include the implementation

Important Files Changed

Filename: numba_cuda/numba/cuda/tests/test_config.py
Overview: Tests reference the non-existent cuda.config.WARP_SIZE and cuda.config.MAX_THREADS_PER_BLOCK; these device-side constants are not implemented anywhere in the codebase.

@greptile-apps bot left a comment

1 file reviewed, 3 comments

def warp_size_kernel(out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = cuda.config.WARP_SIZE

logic: cuda.config.WARP_SIZE is not implemented in the codebase. Searched the entire repository and found no device-side constant definition for this attribute. The cuda module's CudaModuleTemplate in cudadecl.py has no resolve_config method, and there's no mechanism to expose config values inside kernels.

def max_threads_kernel(out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = cuda.config.MAX_THREADS_PER_BLOCK

logic: cuda.config.MAX_THREADS_PER_BLOCK is not implemented. While driver.get_device().MAX_THREADS_PER_BLOCK exists for host-side access (see kernels/transpose.py:35), there's no device-side constant accessible within kernels.
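For comparison, the host-side query cited above looks roughly like this (a minimal sketch; the import path is an assumption based on kernels/transpose.py):

from numba.cuda.cudadrv.driver import driver

# Host-side only: queries the device through the driver; not usable in kernels
max_threads = driver.get_device().MAX_THREADS_PER_BLOCK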

    (d_inp, d_out),
    inp.size,
)
expected = inp * 2 if cuda.config.WARP_SIZE >= 32 else inp

logic: This line attempts host-side access to cuda.config.WARP_SIZE, but this also doesn't exist. The test references a non-existent API on both device and host sides.
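If the test needs a host-side value today, a hedged workaround (assuming Numba's standard device-attribute lookup rather than the proposed cuda.config API) would be:

from numba import cuda

# Assumption: Device objects expose CUDA device attributes such as WARP_SIZE
warp = cuda.get_current_device().WARP_SIZE
expected = inp * 2 if warp >= 32 else inp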

@CodersAcademy006
Author

@atmnp Could you please advise on the preferred approach for exposing these constants to CUDA kernels, or provide guidance on the implementation plan? Once the config attributes are available, I can update and validate these tests accordingly.
Thank you!
