Deprecate the DeviceNDArray class and public APIs that return instances
#546
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
```python
@functools.wraps(func)
def wrapper(*args, **kwargs):
    warnings.warn(
        f"{func.__name__} api is deprecated. Please prefer cupy for array functions",
```
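As a self-contained sketch of the decorator pattern in the diff above (the warning-class body and the sample `device_array` function are illustrative stand-ins, not the PR's actual code):

```python
import functools
import warnings


class DeprecatedDeviceArrayApiWarning(FutureWarning):
    """Warning class named in this PR; the body here is a guess."""


def deprecated_array_api(func):
    # Wrap a public API so every call emits a deprecation warning
    # before delegating to the original implementation.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__name__} api is deprecated. Please prefer cupy for array functions",
            DeprecatedDeviceArrayApiWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)

    return wrapper


@deprecated_array_api
def device_array(n):
    return list(range(n))  # placeholder body for illustration
```

`functools.wraps` preserves the wrapped function's name and docstring, so the warning message and introspection both see `device_array`, not `wrapper`.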
cupy arrays are much slower than DeviceNDArray because they require creating an external (i.e., non-numba-cuda-created) stream, so I'm not sure a recommendation for that is what we should do right now.
I was thinking that we can keep the top-level APIs (device_array etc.) and replace their internals with StridedMemoryView or something similar, in an effort to allow folks to construct arrays as cheaply as possible.
I concur that a lightweight device-array-like container should exist; I'm just not sure that numba-cuda should necessarily be the library providing it publicly. I think we should nudge users away from using numba-cuda as such, e.g. for moving data from host to device. That said, I'm open to suggestions on what we should recommend.
/ok to test
Additional Comments (1)
- numba_cuda/numba/cuda/kernels/reduction.py, lines 262-264: logic: unnecessary conversion here; `res` is already a device array with slicing support. The original `res[:1].copy_to_device(partials[:1], stream=stream)` was simpler and more efficient.
74 files reviewed, 1 comment
cpcloud
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
```python
class TestPinned(CUDATestCase):
    # TODO
```
non-blocking: is there a specific todo here?
/ok to test
Greptile Overview
Greptile Summary
This PR deprecates DeviceNDArray and related public APIs in favor of CuPy. The implementation introduces:
- New deprecation infrastructure: the `DeprecatedDeviceArrayApiWarning` warning class and the `@deprecated_array_api` decorator
- API separation: public deprecated APIs in `api.py` delegate to internal non-warning versions in the new `_api.py` module
- Internal factory method: `DeviceNDArray._create_nowarn()` for internal use without warnings
- Test infrastructure: the `DeprecatedDeviceArrayApiTest` base class for tests using deprecated APIs
- Widespread updates: 70+ test files updated to either use CuPy, suppress warnings, or call internal APIs

Critical Issues:
- `vectorizers.py` references the non-existent `_api._is_cuda_ndarray` function (will cause `AttributeError`)
- `reduction.py` unnecessarily converts device arrays through `__cuda_array_interface__` when direct slicing would work

The deprecation strategy is sound, but the implementation has a blocking bug that will break runtime functionality.
Confidence Score: 0/5
- A critical runtime error will occur in vectorizers.py due to a missing function
- The code references `cuda._api._is_cuda_ndarray`, which doesn't exist in the `_api` module, causing an `AttributeError` at runtime whenever vectorizers.py is used. This is a blocking bug that will break existing functionality.
- numba_cuda/numba/cuda/vectorizers.py contains a reference to a non-existent function; numba_cuda/numba/cuda/kernels/reduction.py contains an unnecessarily complex code path
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| numba_cuda/numba/cuda/kernels/reduction.py | 2/5 | Introduces unnecessary complexity by converting device arrays through cuda_array_interface when direct slicing would work. This creates an inefficient code path. |
| numba_cuda/numba/cuda/vectorizers.py | 0/5 | References non-existent function cuda._api._is_cuda_ndarray which will cause AttributeError at runtime. Should use devicearray.is_cuda_ndarray instead. |
| numba_cuda/numba/cuda/cudadrv/devicearray.py | 5/5 | Adds deprecation warnings to DeviceNDArray class and related APIs, introduces _create_nowarn factory method for internal use. Well-structured deprecation implementation. |
| numba_cuda/numba/cuda/api.py | 5/5 | Properly wraps public APIs with deprecation warnings and delegates to internal _api module implementations. Clean separation of concerns. |
| numba_cuda/numba/cuda/_api.py | 5/5 | New internal API module with non-warning versions of deprecated functions. Provides _is_cuda_array but missing _is_cuda_ndarray needed by vectorizers.py. |
| numba_cuda/numba/cuda/testing.py | 5/5 | Adds DeprecatedDeviceArrayApiTest base class for tests that need to suppress deprecation warnings. Clean test infrastructure update. |
| numba_cuda/numba/cuda/random.py | 5/5 | Updates to use internal _api._device_array instead of public cuda.device_array. Correct usage of internal non-warning API. |
| numba_cuda/numba/cuda/cudadrv/driver.py | 5/5 | Updates IpcHandle.open_array to use _create_nowarn factory method. Correct internal usage avoiding deprecation warnings. |
numba_cuda/numba/cuda/vectorizers.py
Outdated
```diff
  # synchronization on the array's stream (if it has one) by default.
  # When we have a Numba device array, we can simply return it.
- if cuda.cudadrv.devicearray.is_cuda_ndarray(obj):
+ if cuda._api._is_cuda_ndarray(obj):
```
cuda._api._is_cuda_ndarray does not exist. The _api module only provides _is_cuda_array (checks for __cuda_array_interface__), not _is_cuda_ndarray (checks for __cuda_ndarray__). This will raise AttributeError at runtime. Use cuda.cudadrv.devicearray.is_cuda_ndarray(obj) instead.
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
res is already a device array that supports slicing. Converting through __cuda_array_interface__ is unnecessary and adds complexity. Simply use res[:1].copy_to_device(partials[:1], stream=stream) as before.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
/ok to test
Greptile Overview
Greptile Summary
Overview
This PR successfully deprecates the DeviceNDArray class and associated public APIs in favor of CuPy, addressing issue #471. The implementation follows a clean separation pattern:
- Public deprecated APIs in `api.py` emit `DeprecatedDeviceArrayApiWarning` and delegate to internal implementations
- Internal non-deprecated implementations in `_api.py` allow the codebase to continue using these functions without triggering warnings
- The factory method `DeviceNDArray._create_nowarn()` enables internal code to construct instances without warnings
- Test infrastructure via the `DeprecatedDeviceArrayApiTest` base class suppresses warnings for tests of deprecated APIs
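One way such a `_create_nowarn` factory can bypass a warning emitted in `__init__` is to allocate the instance directly and skip `__init__` entirely. A minimal sketch with a hypothetical `Array` stand-in, not the actual DeviceNDArray implementation:

```python
import warnings


class DeprecatedDeviceArrayApiWarning(FutureWarning):
    """Stand-in for the warning class introduced by this PR."""


class Array:
    """Hypothetical stand-in for DeviceNDArray."""

    def __init__(self, shape):
        # Public construction warns...
        warnings.warn(
            "Array is deprecated", DeprecatedDeviceArrayApiWarning, stacklevel=2
        )
        self._setup(shape)

    def _setup(self, shape):
        # ...but the actual initialization lives in a shared helper.
        self.shape = shape

    @classmethod
    def _create_nowarn(cls, shape):
        # Allocate without running __init__, so internal callers stay silent.
        obj = object.__new__(cls)
        obj._setup(shape)
        return obj
```

This keeps a single initialization path (`_setup`) while letting the warning live only on the public constructor.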
Key Changes
- Deprecated APIs: `to_device`, `device_array`, `managed_array`, `pinned_array`, `mapped_array`, `as_cuda_array`, `is_cuda_array`, `from_cuda_array_interface`, and the `DeviceNDArray` constructor
- Updated 50+ test files to use `DeprecatedDeviceArrayApiTest` or call internal `_api` functions
- Added deprecation notices to documentation recommending CuPy
- Systematically replaced direct `DeviceNDArray()` constructor calls with `_create_nowarn()` throughout internal code
Issues Found
- Missing decorator (logic error): `_device_array` in `api.py` lacks the `@require_context` decorator present in `_api.py`, which could cause runtime errors
- Unnecessary complexity (style): `reduction.py:262-264` converts a device array through the CUDA Array Interface unnecessarily when direct slicing would suffice
The deprecation strategy is sound and well-executed overall, with proper separation of concerns between public/internal APIs and comprehensive test coverage updates.
Confidence Score: 4/5
- This PR is mostly safe to merge with one logic issue that needs fixing
- Score of 4 reflects solid deprecation implementation with proper API separation and test coverage, but requires fixing the missing @require_context decorator which could cause runtime failures. The unnecessary complexity in reduction.py is a style issue that doesn't affect correctness.
- Pay special attention to numba/cuda/api.py (missing decorator on line 148) and numba/cuda/kernels/reduction.py (unnecessary complexity on lines 262-264)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| numba/cuda/api.py | 3/5 | Added deprecation warnings to public APIs; _device_array missing @require_context decorator |
| numba/cuda/_api.py | 5/5 | New internal non-deprecated API implementations - correctly implemented |
| numba/cuda/cudadrv/devicearray.py | 5/5 | Added deprecation warning to DeviceNDArray constructor and _create_nowarn factory method |
| numba/cuda/kernels/reduction.py | 3/5 | Lines 262-264 use unnecessary CAI conversion; res is already sliceable device array |
| numba/cuda/testing.py | 5/5 | Added DeprecatedDeviceArrayApiTest base class for suppressing warnings in tests |
Additional Comments (2)
(The inline suggestion snippets here did not survive extraction.) This aligns with the previous reviewer's observation about unnecessary complexity.
/ok to test

/ok to test
2 files reviewed, 2 comments
```diff
  def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
      shape, strides, dtype = prepare_shape_strides_dtype(
          shape, strides, dtype, order
      )
-     return devicearray.DeviceNDArray(
+     return devicearray.DeviceNDArray._create_nowarn(
          shape=shape, strides=strides, dtype=dtype, stream=stream
      )
```
[P0] The internal _device_array function is missing the @require_context decorator. This is inconsistent with the same function in _api.py (line 143) which has the decorator.
Since DeviceNDArray._create_nowarn requires an active CUDA context (it calls devices.get_context().memalloc() when allocating memory), calling this function without an active context will cause a runtime error. This function is used internally by the public device_array function which has the decorator via the outer function, but _device_array itself can be called directly from other modules (e.g., from cuda.kernels.reduction, cuda.vectorizers, cuda.random, etc.).
```diff
+ @require_context
  def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
      shape, strides, dtype = prepare_shape_strides_dtype(
          shape, strides, dtype, order
      )
      return devicearray.DeviceNDArray._create_nowarn(
          shape=shape, strides=strides, dtype=dtype, stream=stream
      )
```
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
[P2] Unnecessary complexity: converting res through __cuda_array_interface__ when it's already sliceable.
The res parameter is a device array that already supports slicing operations. The old code simply used res[:1] which worked correctly. Converting it through _from_cuda_array_interface(res.__cuda_array_interface__) adds unnecessary overhead and complexity without any functional benefit.
Since res implements the CUDA array interface, it can be sliced directly. The [:1] operation will work on any object that implements __getitem__ properly, including DeviceNDArray and other CUDA-compatible arrays.
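As a host-side analogy (NumPy here, not a device array), basic slicing returns a view of the same buffer with no copy and no interface round-trip, which is why `res[:1]` is the cheap path:

```python
import numpy as np

res = np.arange(4.0)
first = res[:1]   # a view into res, not a copy
first[0] = 99.0   # writes through to the original buffer
```

Device arrays that implement `__getitem__` behave analogously for this purpose: the slice references the same device memory, so no reconstruction through `__cuda_array_interface__` is needed.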
```diff
- cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
-     :1
- ].copy_to_device(partials[:1], stream=stream)
+ res[:1].copy_to_device(partials[:1], stream=stream)
```
2 files reviewed, 2 comments
```python
    )


def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
```
[P0] Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which internally calls devices.get_context().memalloc() (line 123 in devicearray.py), requiring an active CUDA context. Without this decorator, the function will fail with a context error when called without an active context. The equivalent function in _api.py (line 143) correctly has this decorator.
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
[P2] Unnecessary complexity. res is already a sliceable device array (CuPy array as shown in tests), so res[:1] works directly. Converting through __cuda_array_interface__ and _from_cuda_array_interface() adds overhead without benefit. The original code res[:1].copy_to_device(partials[:1], stream=stream) was simpler and equivalent.
/ok to test
75 files reviewed, 1 comment
```diff
  def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
      shape, strides, dtype = prepare_shape_strides_dtype(
          shape, strides, dtype, order
      )
-     return devicearray.DeviceNDArray(
+     return devicearray.DeviceNDArray._create_nowarn(
          shape=shape, strides=strides, dtype=dtype, stream=stream
      )
```
logic: Missing @require_context decorator. The function calls DeviceNDArray._create_nowarn() which requires an active CUDA context (allocates memory via devices.get_context().memalloc()). The equivalent function in _api.py (line 143) has this decorator.
/ok to test

/ok to test
79 files reviewed, 2 comments
```python
    )


def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
```
syntax: Missing @require_context decorator. This function calls DeviceNDArray._create_nowarn() which allocates memory via devices.get_context().memalloc() (devicearray.py:123), requiring an active CUDA context. The equivalent function in _api.py (line 143) correctly has this decorator.
```diff
- def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
+ @require_context
+ def _device_array(shape, dtype=np.float64, strides=None, order="C", stream=0):
```
```python
cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
    :1
].copy_to_device(partials[:1], stream=stream)
```
style: Unnecessary complexity. res is already a sliceable device array (supports __cuda_array_interface__), so converting through _from_cuda_array_interface() adds overhead without benefit. Direct slicing works:
```diff
- cuda._api._from_cuda_array_interface(res.__cuda_array_interface__)[
-     :1
- ].copy_to_device(partials[:1], stream=stream)
+ res[:1].copy_to_device(partials[:1], stream=stream)
```
Part of #471
- `DeprecatedNDArrayAPIWarning` emitted from all user-facing functions for moving data around (`cuda.to_device`, `driver.host_to_device`, `device_to_host`, also `as_cuda_array`, `is_cuda_array`, etc.) and from the `DeviceNDArray` ctor
- Internal call sites construct instances via `DeviceNDArray._create_nowarn`