Move CallConv from CUDAContext to FunctionDescriptor
#717
base: main
Conversation
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.
Greptile Summary
This PR successfully refactors the calling convention architecture by moving it from a global context-level property to a per-function property stored on the FunctionDescriptor.
Key Changes:
Impact:
Confidence Score: 5/5
Important Files Changed
4 files reviewed, 4 comments
    @skip_under_cuda_memcheck("Hangs cuda-memcheck")
    @skip_on_cudasim("disabled for cudasim")
    @unittest.skip
Test skipped without explanation - check if the refactoring broke this test
Some of the multithread / multi-GPU tests are suspected not to be exercised properly by the current CI system, since the runners are all single-GPU. They show up as failures in my local environment, but that is unrelated to this PR.
    )
    resty = cres.signature.return_type

    if resty and not device and resty != types.void:
        raise TypeError("CUDA kernel must have void return type.")
Removed cabi_wrap_function call for device functions with C ABI. Verify this was intentional and that device functions with abi="c" still work correctly without the wrapper.
This is indeed intentional: functions and kernels can now carry their own calling convention rather than using a target-wide one. cabi_wrap_function was meant to wrap a function with the Numba ABI for callers that expect a C-ABI function. The fix in this PR simply gives the kernel a CUDACABICallConv, so it should work as expected.
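To make the distinction concrete, here is a minimal sketch of what a per-function calling convention looks like; the class and attribute names below are illustrative only and are not the actual numba-cuda implementation:

    # Illustrative sketch only: names are assumptions, not numba-cuda's API.
    class NumbaCallConv:
        """Numba-internal ABI: status code return, result passed by pointer."""

    class CUDACABICallConv:
        """Plain C ABI: the result is returned directly, no status handling."""

    class FunctionDescriptor:
        def __init__(self, qualname, argtypes, call_conv):
            self.qualname = qualname
            self.argtypes = argtypes
            # The calling convention lives on the descriptor itself rather
            # than being a single target-wide property of the context.
            self.call_conv = call_conv

    # A kernel (or a device function exposed to C callers) carries a C-ABI
    # convention directly, so no wrapper around a Numba-ABI function is needed...
    kernel_desc = FunctionDescriptor("my_kernel", ("float32[:]",), CUDACABICallConv())
    # ...while an internal Numba-compiled function keeps the Numba ABI.
    helper_desc = FunctionDescriptor("helper", ("int64",), NumbaCallConv())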
1 file reviewed, 1 comment
1 file reviewed, 1 comment
/ok to test
/ok to test 855b7de
numba_cuda/numba/cuda/target.py (Outdated)

    @property
    def call_conv(self):
    -    return CUDACallConv(self)
    +    return self.fndesc.call_conv
Hopefully we can delete this property now, and all uses of call_conv should come from fndesc directly?
Yes, this is moved in 684ad4f, though in many cases it's still a pass-through via context.fndesc.call_conv.
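For clarity, the pass-through mentioned here presumably has roughly this shape (a sketch with assumed names, not the code from 684ad4f):

    # Sketch of a context property delegating to the function descriptor.
    class CUDATargetContextSketch:
        def __init__(self, fndesc):
            self.fndesc = fndesc

        @property
        def call_conv(self):
            # Callers that still go through the context transparently get
            # the per-function calling convention from the descriptor.
            return self.fndesc.call_conv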
    subtargetoptions["fastmath"] = flags.fastmath
    error_model = callconv.create_error_model(flags.error_model, targetctx)

    # FIXME: should update everywhere uses error_model to use callconv from fundesc
Presumably if we delete the callconv property from the context, that should surface any places where we're not using the fndesc for the callconv, and then it should be OK to remove this comment.
I'm missing the connection here a bit. I can see the first half: once we remove direct uses of the context property, we can see all the places that don't go through fndesc. But for error_model, how does that connect to fndesc at its call site?
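For reference, the FIXME in the quoted diff seems to point toward something like the sketch below: deriving the error model from the function descriptor's own calling convention rather than a module-level helper. This assumes the per-function call convention exposes an equivalent create_error_model factory; the names are assumptions, not code from this PR.

    def make_error_model(fndesc, flags, targetctx):
        # Hypothetical: look up the error model via the function's own
        # calling convention instead of the module-level callconv helper.
        return fndesc.call_conv.create_error_model(flags.error_model, targetctx)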
gmarkall left a comment
In terms of the design, this is pretty much 90% of the way there. I made a couple of comments on how to resolve the XXX / FIXME items, and had one other suggestion for where the mangler should live - generally I'd think of mangling as part of the calling convention anyway.
I think with the resolution of the items noted on the diff, the design of this should be good to merge (give or take any minor tweaks following the changes).
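As a sketch of the "mangling as part of the calling convention" idea (assumed names, extending the illustrative classes above rather than numba-cuda's actual ones):

    # Each convention produces the symbol name for functions it owns.
    class CUDACABICallConvSketch:
        def mangle(self, fndesc):
            # C ABI: use the plain function name, no Numba-style mangling.
            return fndesc.qualname

    class NumbaCallConvSketch:
        def mangle(self, fndesc):
            # Numba ABI: encode argument types into the symbol (simplified).
            return fndesc.qualname + "__" + "_".join(str(t) for t in fndesc.argtypes)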
@gmarkall I believe I addressed most of the review comments above. A few things still stand out to me:
/ok to test 864a40c
Today, calling conventions are defined globally per compilation context. This makes it hard to switch flexibly between the Numba ABI and the C ABI when declaring external functions. It also explains the need for the kernel “fixup” logic: CUDA kernels are fundamentally C-ABI, but have historically been forced through the Numba ABI path.
This PR moves calling-convention selection to a more granular level, removing these limitations and eliminating the kernel fixup workaround. It also lays the groundwork for users to plug in additional calling-convention implementations in the future.
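As a concrete illustration of the motivation, per-function ABI selection is already visible at the user level through the abi= option of numba.cuda.compile_ptx; the sketch below assumes that existing interface and is not new API added by this PR.

    from numba import cuda, float32

    def add(a, b):
        return a + b

    sig = float32(float32, float32)

    # Device function compiled with the Numba-internal ABI (the default).
    ptx_numba, _ = cuda.compile_ptx(add, sig, device=True)

    # The same function compiled with the plain C ABI; with per-function
    # calling conventions this no longer requires wrapping a Numba-ABI
    # function for C callers.
    ptx_c, _ = cuda.compile_ptx(add, sig, device=True, abi="c")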