CUDA: add config tests\n\nAdds kernel-based tests for device-side rea… #736
…d-only access to CUDA config values: WARP_SIZE, MAX_THREADS_PER_BLOCK, and use in kernel control flow. Skips under cudasim. NumPy is used as the reference. Scope is intentionally limited to safe, well-defined CUDA config semantics.
Greptile Summary
This PR adds tests for device-side CUDA config constants ( Critical issues:
What needs to happen: Confidence Score: 0/5
1 file reviewed, 3 comments
def warp_size_kernel(out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = cuda.config.WARP_SIZE
logic: cuda.config.WARP_SIZE is not implemented in the codebase. Searched the entire repository and found no device-side constant definition for this attribute. The cuda module's CudaModuleTemplate in cudadecl.py has no resolve_config method, and there's no mechanism to expose config values inside kernels.
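One way to keep such tests from failing outright while the attribute is missing is to probe for it before running. The sketch below is only an illustration: `has_dotted_attr` is a hypothetical helper, and `fake_cuda` is a stand-in object, not part of Numba.

```python
from types import SimpleNamespace


def has_dotted_attr(root, dotted_name):
    """Return True if every component of dotted_name resolves on root."""
    obj = root
    for part in dotted_name.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


# Stand-in namespace used only to illustrate the check (hypothetical).
fake_cuda = SimpleNamespace(config=SimpleNamespace(WARP_SIZE=32))

print(has_dotted_attr(fake_cuda, "config.WARP_SIZE"))
print(has_dotted_attr(fake_cuda, "config.MAX_THREADS_PER_BLOCK"))
```

A test module could pass the real `cuda` module as `root` and feed the result to `unittest.skipUnless`, so the tests skip rather than error until the API exists.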
def max_threads_kernel(out):
    i = cuda.grid(1)
    if i < out.size:
        out[i] = cuda.config.MAX_THREADS_PER_BLOCK
logic: cuda.config.MAX_THREADS_PER_BLOCK is not implemented. While driver.get_device().MAX_THREADS_PER_BLOCK exists for host-side access (see kernels/transpose.py:35), there's no device-side constant accessible within kernels.
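For comparison, a host-side query might look like the sketch below. It assumes Numba's public `cuda.get_current_device()` and its `WARP_SIZE` and `MAX_THREADS_PER_BLOCK` device attributes apply to this codebase in the same way as the `driver.get_device()` call cited above. `query_host_device_limits` is a hypothetical name, and the function returns `None` when Numba or a usable device is unavailable.

```python
def query_host_device_limits():
    """Return warp size and max threads per block as seen from the host.

    Returns None when Numba or a usable CUDA device is not available.
    """
    try:
        from numba import cuda  # assumption: Numba with CUDA support is installed

        if not cuda.is_available():
            return None
        dev = cuda.get_current_device()
        return {
            "WARP_SIZE": int(dev.WARP_SIZE),
            "MAX_THREADS_PER_BLOCK": int(dev.MAX_THREADS_PER_BLOCK),
        }
    except Exception:
        # Missing package, missing driver, or a device object without
        # these attributes (for example under a simulator).
        return None


limits = query_host_device_limits()
print(limits)
```

These values are read on the host, so they do not by themselves make anything available inside a kernel.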
    (d_inp, d_out),
    inp.size,
)
expected = inp * 2 if cuda.config.WARP_SIZE >= 32 else inp
logic: This line attempts host-side access to cuda.config.WARP_SIZE, but this also doesn't exist. The test references a non-existent API on both device and host sides.
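The reference value on the quoted line could instead take the warp size as an explicit argument, so the host-side expectation does not depend on a `cuda.config` namespace. A minimal sketch, using plain lists in place of NumPy arrays to stay dependency-free; `expected_output` is a hypothetical name.

```python
def expected_output(inp, warp_size):
    """Mirror the quoted line's branch: double each element when warp_size >= 32."""
    if warp_size >= 32:
        return [x * 2 for x in inp]
    return list(inp)


print(expected_output([1, 2, 3], 32))  # [2, 4, 6]
print(expected_output([1, 2, 3], 16))  # [1, 2, 3]
```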
@atmnp Could you please advise on the preferred approach for exposing these constants to CUDA kernels, or provide guidance on the implementation plan? Once the config attributes are available, I can update and validate these tests accordingly.
CUDA: add config tests
This PR adds kernel-based tests for device-side read-only access to CUDA config values in Numba-CUDA:
- cuda.config.WARP_SIZE
- cuda.config.MAX_THREADS_PER_BLOCK

Key features:
This continues the systematic porting of CPU-side tests to CUDA, directly contributing to issue #515.