Deprecate the DeviceNDArray class and public APIs that return instances
#546
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
```python
@functools.wraps(func)
def wrapper(*args, **kwargs):
    warnings.warn(
        f"{func.__name__} api is deprecated. Please prefer cupy for array functions",
```
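As a self-contained sketch of the decorator pattern in the diff above (the warning-class body and the sample `device_array` function are illustrative stand-ins, not the PR's actual code):

```python
import functools
import warnings


class DeprecatedDeviceArrayApiWarning(FutureWarning):
    """Warning class named in this PR; the body here is a guess."""


def deprecated_array_api(func):
    # Wrap a public API so every call emits a deprecation warning
    # before delegating to the original implementation.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} api is deprecated. Please prefer cupy for array functions",
            DeprecatedDeviceArrayApiWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)

    return wrapper


@deprecated_array_api
def device_array(n):
    return list(range(n))  # placeholder body for illustration
```

`functools.wraps` preserves the wrapped function's name and docstring, so the warning message and introspection both see `device_array`, not `wrapper`.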
cupy arrays are much slower than DeviceNDArray because they require creating an external (i.e., non-numba-cuda-created) stream, so I'm not sure a recommendation for that is what we should do right now.
I was thinking that we can keep the top-level APIs (device_array etc.) and replace their internals with StridedMemoryView or something similar, in an effort to allow folks to construct arrays as cheaply as possible.
I concur that a lightweight device-array-like container should exist; I'm just not sure that numba-cuda should necessarily be the library providing it publicly. I think we should nudge users away from using numba-cuda as such, e.g. for moving data from host to device. That said, I'm open to suggestions on what we should recommend.
/ok to test
Additional Comments (1)
- numba_cuda/numba/cuda/kernels/reduction.py, lines 262-264: logic: unnecessary conversion here; `res` is already a device array with slicing support. The original `res[:1].copy_to_device(partials[:1], stream=stream)` was simpler and more efficient.
74 files reviewed, 1 comment
cpcloud
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
```python
class TestPinned(CUDATestCase):
    # TODO
```
non-blocking: is there a specific todo here?
/ok to test
Greptile Overview
Greptile Summary
This PR deprecates DeviceNDArray and related public APIs in favor of CuPy. The implementation introduces:
- New deprecation infrastructure: the `DeprecatedDeviceArrayApiWarning` warning class and the `@deprecated_array_api` decorator
- API separation: public deprecated APIs in `api.py` delegate to internal non-warning versions in the new `_api.py` module
- Internal factory method: `DeviceNDArray._create_nowarn()` for internal use without warnings
- Test infrastructure: the `DeprecatedDeviceArrayApiTest` base class for tests using deprecated APIs
- Widespread updates: 70+ test files updated to either use CuPy, suppress warnings, or call internal APIs

Critical Issues:
- `vectorizers.py` references the non-existent `_api._is_cuda_ndarray` function (will cause `AttributeError`)
- `reduction.py` unnecessarily converts device arrays through `__cuda_array_interface__` when direct slicing would work

The deprecation strategy is sound, but the implementation has a blocking bug that will break runtime functionality.
Confidence Score: 0/5
- A critical runtime error will occur in vectorizers.py due to a missing function
- The code references `cuda._api._is_cuda_ndarray`, which doesn't exist in the `_api` module, causing an `AttributeError` at runtime whenever vectorizers.py is used. This is a blocking bug that will break existing functionality.
- numba_cuda/numba/cuda/vectorizers.py contains a reference to a non-existent function; numba_cuda/numba/cuda/kernels/reduction.py contains an unnecessarily complex code path
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| numba_cuda/numba/cuda/kernels/reduction.py | 2/5 | Introduces unnecessary complexity by converting device arrays through cuda_array_interface when direct slicing would work. This creates an inefficient code path. |
| numba_cuda/numba/cuda/vectorizers.py | 0/5 | References non-existent function cuda._api._is_cuda_ndarray which will cause AttributeError at runtime. Should use devicearray.is_cuda_ndarray instead. |
| numba_cuda/numba/cuda/cudadrv/devicearray.py | 5/5 | Adds deprecation warnings to DeviceNDArray class and related APIs, introduces _create_nowarn factory method for internal use. Well-structured deprecation implementation. |
| numba_cuda/numba/cuda/api.py | 5/5 | Properly wraps public APIs with deprecation warnings and delegates to internal _api module implementations. Clean separation of concerns. |
| numba_cuda/numba/cuda/_api.py | 5/5 | New internal API module with non-warning versions of deprecated functions. Provides _is_cuda_array but missing _is_cuda_ndarray needed by vectorizers.py. |
| numba_cuda/numba/cuda/testing.py | 5/5 | Adds DeprecatedDeviceArrayApiTest base class for tests that need to suppress deprecation warnings. Clean test infrastructure update. |
| numba_cuda/numba/cuda/random.py | 5/5 | Updates to use internal _api._device_array instead of public cuda.device_array. Correct usage of internal non-warning API. |
| numba_cuda/numba/cuda/cudadrv/driver.py | 5/5 | Updates IpcHandle.open_array to use _create_nowarn factory method. Correct internal usage avoiding deprecation warnings. |
numba_cuda/numba/cuda/vectorizers.py
Outdated
```diff
  # synchronization on the array's stream (if it has one) by default.
  # When we have a Numba device array, we can simply return it.
- if cuda.cudadrv.devicearray.is_cuda_ndarray(obj):
+ if cuda._api._is_cuda_ndarray(obj):
```
cuda._api._is_cuda_ndarray does not exist. The _api module only provides _is_cuda_array (checks for __cuda_array_interface__), not _is_cuda_ndarray (checks for __cuda_ndarray__). This will raise AttributeError at runtime. Use cuda.cudadrv.devicearray.is_cuda_ndarray(obj) instead.
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
res is already a device array that supports slicing. Converting through __cuda_array_interface__ is unnecessary and adds complexity. Simply use res[:1].copy_to_device(partials[:1], stream=stream) as before.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
/ok to test
Greptile Overview
Greptile Summary
Overview
This PR successfully deprecates the DeviceNDArray class and associated public APIs in favor of CuPy, addressing issue #471. The implementation follows a clean separation pattern:
- Public deprecated APIs in `api.py` emit `DeprecatedDeviceArrayApiWarning` and delegate to internal implementations
- Internal non-deprecated implementations in `_api.py` allow the codebase to continue using these functions without triggering warnings
- The factory method `DeviceNDArray._create_nowarn()` enables internal code to construct instances without warnings
- Test infrastructure via the `DeprecatedDeviceArrayApiTest` base class suppresses warnings for tests of deprecated APIs
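One way such a `_create_nowarn` factory can bypass a warning emitted in `__init__` is to allocate the instance directly and skip `__init__` entirely. A minimal sketch with a hypothetical `Array` stand-in, not the actual DeviceNDArray implementation:

```python
import warnings


class DeprecatedDeviceArrayApiWarning(FutureWarning):
    """Stand-in for the warning class introduced by this PR."""


class Array:
    """Hypothetical stand-in for DeviceNDArray."""

    def __init__(self, shape):
        # Public construction warns...
        warnings.warn(
            "Array is deprecated", DeprecatedDeviceArrayApiWarning, stacklevel=2
        )
        self._setup(shape)

    def _setup(self, shape):
        # ...but the actual initialization lives in a shared helper.
        self.shape = shape

    @classmethod
    def _create_nowarn(cls, shape):
        # Allocate without running __init__, so internal callers stay silent.
        obj = object.__new__(cls)
        obj._setup(shape)
        return obj
```

This keeps a single initialization path (`_setup`) while letting the warning live only on the public constructor.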
Key Changes
- Deprecated APIs: `to_device`, `device_array`, `managed_array`, `pinned_array`, `mapped_array`, `as_cuda_array`, `is_cuda_array`, `from_cuda_array_interface`, and the `DeviceNDArray` constructor
- Updated 50+ test files to use `DeprecatedDeviceArrayApiTest` or call internal `_api` functions
- Added deprecation notices to documentation recommending CuPy
- Systematically replaced direct `DeviceNDArray()` constructor calls with `_create_nowarn()` throughout internal code
Issues Found
- Missing decorator (logic error): `_device_array` in `api.py` lacks the `@require_context` decorator present in `_api.py`, which could cause runtime errors
- Unnecessary complexity (style): `reduction.py:262-264` converts a device array through the CUDA Array Interface unnecessarily when direct slicing would suffice
The deprecation strategy is sound and well-executed overall, with proper separation of concerns between public/internal APIs and comprehensive test coverage updates.
Confidence Score: 4/5
- This PR is mostly safe to merge with one logic issue that needs fixing
- Score of 4 reflects solid deprecation implementation with proper API separation and test coverage, but requires fixing the missing @require_context decorator which could cause runtime failures. The unnecessary complexity in reduction.py is a style issue that doesn't affect correctness.
- Pay special attention to numba/cuda/api.py (missing decorator on line 148) and numba/cuda/kernels/reduction.py (unnecessary complexity on lines 262-264)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| numba/cuda/api.py | 3/5 | Added deprecation warnings to public APIs; _device_array missing @require_context decorator |
| numba/cuda/_api.py | 5/5 | New internal non-deprecated API implementations - correctly implemented |
| numba/cuda/cudadrv/devicearray.py | 5/5 | Added deprecation warning to DeviceNDArray constructor and _create_nowarn factory method |
| numba/cuda/kernels/reduction.py | 3/5 | Lines 262-264 use unnecessary CAI conversion; res is already sliceable device array |
| numba/cuda/testing.py | 5/5 | Added DeprecatedDeviceArrayApiTest base class for suppressing warnings in tests |
Additional Comments (2)
(The inline suggestion snippets here did not survive extraction.) This aligns with the previous reviewer's observation about unnecessary complexity.
/ok to test

/ok to test
2 files reviewed, 2 comments
```diff
  def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
      shape, strides, dtype = prepare_shape_strides_dtype(
          shape, strides, dtype, order
      )
-     return devicearray.DeviceNDArray(
+     return devicearray.DeviceNDArray._create_nowarn(
          shape=shape, strides=strides, dtype=dtype, stream=stream
      )
```
[P0] The internal _device_array function is missing the @require_context decorator. This is inconsistent with the same function in _api.py (line 143) which has the decorator.
Since DeviceNDArray._create_nowarn requires an active CUDA context (it calls devices.get_context().memalloc() when allocating memory), calling this function without an active context will cause a runtime error. This function is used internally by the public device_array function which has the decorator via the outer function, but _device_array itself can be called directly from other modules (e.g., from cuda.kernels.reduction, cuda.vectorizers, cuda.random, etc.).
```diff
+ @require_context
  def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
      shape, strides, dtype = prepare_shape_strides_dtype(
          shape, strides, dtype, order
      )
      return devicearray.DeviceNDArray._create_nowarn(
          shape=shape, strides=strides, dtype=dtype, stream=stream
      )
```
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
[P2] Unnecessary complexity: converting res through __cuda_array_interface__ when it's already sliceable.
The res parameter is a device array that already supports slicing operations. The old code simply used res[:1] which worked correctly. Converting it through _from_cuda_array_interface(res.__cuda_array_interface__) adds unnecessary overhead and complexity without any functional benefit.
Since res implements the CUDA array interface, it can be sliced directly. The [:1] operation will work on any object that implements __getitem__ properly, including DeviceNDArray and other CUDA-compatible arrays.
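As a host-side analogy (NumPy here, not a device array), basic slicing returns a view of the same buffer with no copy and no interface round-trip, which is why `res[:1]` is the cheap path:

```python
import numpy as np

res = np.arange(4.0)
first = res[:1]   # a view into res, not a copy
first[0] = 99.0   # writes through to the original buffer
```

Device arrays that implement `__getitem__` behave analogously for this purpose: the slice references the same device memory, so no reconstruction through `__cuda_array_interface__` is needed.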
```diff
- cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
-     :1
- ].copy_to_device(partials[:1], stream=stream)
+ res[:1].copy_to_device(partials[:1], stream=stream)
```
2 files reviewed, 2 comments
```python
    )


def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
```
[P0] Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which internally calls devices.get_context().memalloc() (line 123 in devicearray.py), requiring an active CUDA context. Without this decorator, the function will fail with a context error when called without an active context. The equivalent function in _api.py (line 143) correctly has this decorator.
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
[P2] Unnecessary complexity. res is already a sliceable device array (CuPy array as shown in tests), so res[:1] works directly. Converting through __cuda_array_interface__ and _from_cuda_array_interface() adds overhead without benefit. The original code res[:1].copy_to_device(partials[:1], stream=stream) was simpler and equivalent.
/ok to test
75 files reviewed, 1 comment
```diff
  def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
      shape, strides, dtype = prepare_shape_strides_dtype(
          shape, strides, dtype, order
      )
-     return devicearray.DeviceNDArray(
+     return devicearray.DeviceNDArray._create_nowarn(
          shape=shape, strides=strides, dtype=dtype, stream=stream
      )
```
logic: Missing @require_context decorator. The function calls DeviceNDArray._create_nowarn() which requires an active CUDA context (allocates memory via devices.get_context().memalloc()). The equivalent function in _api.py (line 143) has this decorator.
/ok to test

/ok to test
79 files reviewed, 2 comments
```python
    )


def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
```
syntax: Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which allocates memory via devices.get_context().memalloc() (devicearray.py:123), requiring an active CUDA context. The equivalent function in _api.py (line 143) correctly has this decorator.
```diff
- def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
+ @require_context
+ def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
```
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
style: Unnecessary complexity. res is already a sliceable device array (supports __cuda_array_interface__), so converting through _from_cuda_array_interface() adds overhead without benefit. Direct slicing works:
```diff
- cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
-     :1
- ].copy_to_device(partials[:1], stream=stream)
+ res[:1].copy_to_device(partials[:1], stream=stream)
```
Part of #471
- `DeprecatedNDArrayAPIWarning` emitted from all user-facing functions for moving data around (`cuda.to_device`, `driver.host_to_device`, `device_to_host`, also `as_cuda_array`, `is_cuda_array`, etc.) and from the `DeviceNDArray` ctor
- Internal call sites construct instances via `DeviceNDArray._create_nowarn`