Conversation

@isVoid
Contributor

@isVoid isVoid commented Jan 13, 2026

Today, calling conventions are defined globally per compilation context. This makes it hard to switch flexibly between the Numba ABI and the C ABI when declaring external functions. It also explains the need for the kernel “fixup” logic: CUDA kernels are fundamentally C-ABI, but have historically been forced through the Numba ABI path.

This PR moves calling-convention selection to a more granular level, removing these limitations and eliminating the kernel fixup workaround. It also lays the groundwork for users to plug in additional calling-convention implementations in the future.
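
As a rough illustration of the idea, here is a toy Python model (not the numba-cuda implementation) in which each function descriptor carries its own calling convention. The class bodies, the `lower_signature` method, and the string-based "declarations" are assumptions made for this sketch. Only the names `FunctionDescriptor`, `call_conv`, and `declare_function` come from the PR.

```python
from dataclasses import dataclass


class NumbaCallConv:
    """Toy Numba-style convention: status code returned, result via pointer."""

    def lower_signature(self, restype, argtypes):
        # The i32 return carries the status; the result is written through
        # a leading pointer argument.
        return "i32", [f"{restype}*", *argtypes]


class CABICallConv:
    """Toy C ABI convention: the result is returned directly."""

    def lower_signature(self, restype, argtypes):
        return restype, list(argtypes)


@dataclass(frozen=True)
class FunctionDescriptor:
    name: str
    restype: str
    argtypes: tuple
    call_conv: object  # chosen per function, not per compilation context

    def declare_function(self):
        ret, params = self.call_conv.lower_signature(self.restype, self.argtypes)
        return f"declare {ret} @{self.name}({', '.join(params)})"


# Two functions in the same "context" with different conventions.
device_fn = FunctionDescriptor("add", "float", ("float", "float"), NumbaCallConv())
extern_fn = FunctionDescriptor("add_c", "float", ("float", "float"), CABICallConv())

print(device_fn.declare_function())
print(extern_fn.declare_function())
```

Because the convention travels with the descriptor, a caller can declare either function without consulting any global state.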

@copy-pr-bot

copy-pr-bot bot commented Jan 13, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps bot commented Jan 13, 2026

Greptile Summary

This PR successfully refactors the calling convention architecture by moving it from a global context-level property to a per-function property stored in FunctionDescriptor. This enables flexible selection between Numba ABI and C ABI at the function level rather than globally per compilation context.

Key Changes:

  • Moved CUDACallConv and CUDACABICallConv from target.py to core/callconv.py
  • Added call_conv and abi_info fields to FunctionDescriptor with a new declare_function method that uses the per-function calling convention
  • Removed the global call_conv property from CUDATargetContext
  • Eliminated the cabi_wrap_function workaround - functions now directly use their specified calling convention
  • Updated all call sites throughout the codebase to access call_conv through context.fndesc.call_conv instead of context.call_conv
  • Added comprehensive test coverage for C ABI device functions with various signatures (0-arg, 1-arg, 2-arg, void return, pointer arguments)
  • Added abi parameter to declare_device to support declaring external C ABI functions

Impact:
This architectural change removes the need for kernel fixup logic and provides a cleaner, more flexible way to handle different calling conventions. The changes are systematic and thorough, updating all references consistently across the codebase.

Confidence Score: 5/5

  • This PR is safe to merge with high confidence
  • The refactoring is well-architected, systematically updates all references, includes comprehensive test coverage for the new C ABI functionality, and successfully eliminates technical debt (the cabi_wrap_function workaround). All changes are consistent and follow clear architectural principles.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| numba_cuda/numba/cuda/core/funcdesc.py | Added call_conv and abi_info fields to FunctionDescriptor, implemented declare_function method that uses the per-function calling convention |
| numba_cuda/numba/cuda/core/callconv.py | Moved CUDACallConv and CUDACABICallConv from target.py to callconv.py, added C ABI mangler implementation |
| numba_cuda/numba/cuda/target.py | Removed call_conv property and calling convention classes (moved to callconv.py) |
| numba_cuda/numba/cuda/compiler.py | Removed cabi_wrap_function, added abi and abi_info parameters to compilation functions, set call_conv in flags for C ABI |
| numba_cuda/numba/cuda/flags.py | Added call_conv and abi_info options to CUDAFlags |
| numba_cuda/numba/cuda/core/typed_passes.py | Updated to get call_conv from flags, pass it to FunctionDescriptor creation |
| numba_cuda/numba/cuda/lowering.py | Changed call_conv property to use fndesc.call_conv, updated function declaration to use fndesc.declare_function |
| numba_cuda/numba/cuda/core/imputils.py | Updated user_function and user_generator to use fndesc.call_conv and fndesc.declare_function, added null check for status |

Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 4 comments


@skip_under_cuda_memcheck("Hangs cuda-memcheck")
@skip_on_cudasim("disabled for cudasim")
@unittest.skip
Contributor

Test skipped without explanation - check if the refactoring broke this test

Contributor Author

Some of the multithreaded / multi-GPU tests are suspected of not being exercised properly in the current CI system, since all of its runners are single-GPU. They show up as failures in my local environment, but those failures are not related to this PR.

Comment on lines 1129 to 1133
)
resty = cres.signature.return_type

if resty and not device and resty != types.void:
    raise TypeError("CUDA kernel must have void return type.")
Contributor

Removed cabi_wrap_function call for device functions with C ABI. Verify this was intentional and that device functions with abi="c" still work correctly without the wrapper.

Contributor Author

This is indeed intentional, because functions and kernels can now bear their own calling convention rather than using a target-wide one. cabi_wrap_function was meant to adapt a function with the Numba ABI to a caller that expects a C ABI function. The fix in this PR simply switches the kernel to bear a CUDACABICallConv, and should work as expected.
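
To make the difference concrete, here is a toy Python model of the two approaches, not the actual numba-cuda code. The names `add_numba_abi`, `cabi_wrap`, and `add_c_abi` are hypothetical, and the assumption that the adapter discards the status code is a simplification for this sketch.

```python
def add_numba_abi(out, a, b):
    """Numba-style body: write the result through `out`, return a status code."""
    out[0] = a + b
    return 0  # 0 stands for "OK" in this toy model


def cabi_wrap(numba_fn):
    """Adapter in the spirit of cabi_wrap_function: expose a direct-return entry."""

    def wrapper(*args):
        out = [None]
        numba_fn(out, *args)  # status ignored here; a C caller cannot receive it
        return out[0]

    return wrapper


def add_c_abi(a, b):
    """Function that bears the C ABI itself: no adapter layer is needed."""
    return a + b


wrapped_result = cabi_wrap(add_numba_abi)(1, 2)
direct_result = add_c_abi(1, 2)
print(wrapped_result, direct_result)
```

Both paths give a C-style caller the same result. The second one simply has no intermediate wrapper to generate or maintain.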

@greptile-apps
Contributor

greptile-apps bot commented Jan 13, 2026

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
Bug: status can be None when using C ABI calling convention (see line 222 check and CUDACABICallConv.call_function which returns status = None), but this line tries to access status.is_none, which will raise AttributeError.

        if status is not None and builder.not_(status.is_none):
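
A minimal sketch of why the guard matters, using plain Python objects in place of LLVM IR values. The helpers `call_numba_abi`, `call_c_abi`, and `returned_something` are hypothetical and only mimic the shape of the status handling described above.

```python
from types import SimpleNamespace


def call_numba_abi(fn, *args):
    """Toy Numba-ABI call: returns a status object alongside the value."""
    value = fn(*args)
    return SimpleNamespace(is_none=value is None), value


def call_c_abi(fn, *args):
    """Toy C-ABI call: there is no status, so None is returned in its place."""
    return None, fn(*args)


def returned_something(status):
    """Guarded check: only inspect status.is_none when a status exists."""
    if status is not None and status.is_none:
        return False
    return True


numba_status, _ = call_numba_abi(lambda: None)
c_status, _ = call_c_abi(lambda x: x + 1, 41)

numba_result = returned_something(numba_status)
c_result = returned_something(c_status)

# Without the guard, the C ABI path fails on attribute access.
try:
    c_status.is_none
    unguarded_error = None
except AttributeError as exc:
    unguarded_error = type(exc).__name__
```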

Contributor

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

@greptile-apps
Contributor

greptile-apps bot commented Jan 13, 2026

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
status can be None when using C ABI (see line 222 check and CUDACABICallConv.call_function returns status = None), causing AttributeError when accessing status.is_none

        if status is not None and builder.not_(status.is_none):

@isVoid isVoid changed the title Move CallConv from CUDAContext to FunctionDescriptor Move CallConv from CUDAContext to FunctionDescriptor Jan 13, 2026
Contributor

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment

@greptile-apps
Contributor

greptile-apps bot commented Jan 13, 2026

Additional Comments (1)

numba_cuda/numba/cuda/core/imputils.py
status can be None when using C ABI calling convention (see line 222 check and CUDACABICallConv.call_function which returns status = None in callconv.py line 417), causing AttributeError when accessing status.is_none

        if status is not None and builder.not_(status.is_none):

@gmarkall
Contributor

/ok to test

@gmarkall gmarkall added the 3 - Ready for Review Ready for review by team label Jan 15, 2026
@isVoid
Contributor Author

isVoid commented Jan 20, 2026

/ok to test 855b7de

Comment on lines 268 to 270
  @property
  def call_conv(self):
-     return CUDACallConv(self)
+     return self.fndesc.call_conv
Contributor

Hopefully we can delete this property now, and all uses of call_conv should come from fndesc directly?

Contributor Author

@isVoid isVoid Jan 22, 2026

Yes, this is moved in 684ad4f. Though in many cases it's still a pass through via context.fndesc.call_conv.

subtargetoptions["fastmath"] = flags.fastmath
error_model = callconv.create_error_model(flags.error_model, targetctx)

# FIXME: should update everywhere uses error_model to use callconv from fundesc
Contributor

Presumably if we delete the callconv property from the context, that should show up any places where we're not using the fndesc for the callconv, and then it should be OK to remove this comment.

Contributor Author

I'm missing part of the connection here. I can see the first half: once we stop using the context property directly, we can find every place that doesn't go through fndesc. But how does error_model connect to fndesc's call sites?

Contributor

@gmarkall gmarkall left a comment

In terms of the design, this is pretty much 90% of the way there. I made a couple of comments on how to resolve the XXX / FIXME items, and had one other suggestion for where the mangler should live - generally I'd think of mangling as part of the calling convention anyway.

I think with the resolution of the items noted on the diff, the design of this should be good to merge (give or take any minor tweaks following the changes).

@gmarkall gmarkall added 4 - Waiting on author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team labels Jan 20, 2026
@isVoid
Contributor Author

isVoid commented Jan 22, 2026

@gmarkall I believe I addressed most of the review comments above. A few things still stand out to me:

  • The handling of the error model is still delegated to CUDACallConv, not CUDACABICallConv.
  • A few other functions in BaseContext, such as call_internal and call_internal_no_propagate, follow a similar pattern to declare_function. Should they also be moved to FunctionDescriptor?

@isVoid isVoid requested a review from gmarkall January 22, 2026 21:50
@isVoid
Contributor Author

isVoid commented Jan 22, 2026

/ok to test 864a40c

Labels

4 - Waiting on author Waiting for author to respond to review

2 participants