Skip to content

Conversation

@brandon-b-miller
Copy link
Contributor

Follow up #523

@copy-pr-bot
Copy link

copy-pr-bot bot commented Oct 31, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@brandon-b-miller brandon-b-miller changed the title Vendor numpy median, percentile, and quantile reductions Vendor numpy median, percentile, and quantile reduction tests Oct 31, 2025
@gmarkall gmarkall added the 2 - In Progress Currently a work in progress label Nov 18, 2025
@brandon-b-miller brandon-b-miller added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jan 21, 2026
@brandon-b-miller brandon-b-miller marked this pull request as ready for review January 21, 2026 17:03
@copy-pr-bot
Copy link

copy-pr-bot bot commented Jan 21, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@brandon-b-miller
Copy link
Contributor Author

/ok to test

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 21, 2026

Greptile Summary

Vendors comprehensive tests for np.median, np.percentile, np.quantile, np.nanpercentile, and np.nanquantile from upstream NumPy test suite. Adds necessary infrastructure including make_constant_list method and ufunc implementations for np.isfinite and np.multiply.

Key changes:

  • Added NRTEnablingCUDATestCase base class to centralize NRT config management across test files
  • Implemented make_constant_list to create constant lists in CUDA constant address space
  • Added 260+ lines of vendored reduction tests covering edge cases, empty arrays, finite/infinite values
  • Refactored test_array_methods.py to use new base class
  • Removed memcpy fast path optimization in _array_copy function (performance regression)
  • Added new test file test_array_methods.py with array copy verification

Issues found:

  • Performance regression in arrayobj.py:5506-5531 where memcpy fast path was removed
  • Redundant NRT config management in test_array_reductions.py:25-36 conflicts with parent class

Confidence Score: 3/5

  • This PR adds valuable test coverage but introduces a performance regression and config management issues
  • Score reflects significant performance regression from removed memcpy optimization and potential config conflicts from redundant NRT state management. Test additions are comprehensive but the infrastructure changes need fixes.
  • Pay close attention to numba_cuda/numba/cuda/np/arrayobj.py (memcpy fast path removed) and numba_cuda/numba/cuda/tests/cudapy/test_array_reductions.py (redundant config management)

Important Files Changed

Filename Overview
numba_cuda/numba/cuda/np/arrayobj.py Removed memcpy fast path in _array_copy, causing performance regression; added constant_list lowering
numba_cuda/numba/cuda/tests/cudapy/test_array_methods.py Refactored to use NRTEnablingCUDATestCase, but redundant config management remains in setUp/tearDown
numba_cuda/numba/cuda/tests/cudapy/test_array_reductions.py Added comprehensive tests for median, percentile, and quantile; redundant NRT config management

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (1)

  1. numba_cuda/numba/cuda/tests/cudapy/test_array_reductions.py, line 25-36 (link)

    logic: redundant config management - NRTEnablingCUDATestCase already manages self.old_nrt_setting and config.CUDA_ENABLE_NRT in its own setUp/tearDown

    this creates duplicate state and could cause issues. remove the NRT-related lines:

7 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines 5506 to +5531
assert rettype.layout in "CF"
if arytype.layout == rettype.layout:
# Fast path: memcpy
cgutils.raw_memcpy(
builder, dest_data, src_data, ary.nitems, ary.itemsize, align=1
)

else:
src_strides = cgutils.unpack_tuple(builder, ary.strides)
dest_strides = cgutils.unpack_tuple(builder, ret.strides)
intp_t = context.get_value_type(types.intp)
src_strides = cgutils.unpack_tuple(builder, ary.strides)
dest_strides = cgutils.unpack_tuple(builder, ret.strides)
intp_t = context.get_value_type(types.intp)

with cgutils.loop_nest(builder, shapes, intp_t) as indices:
src_ptr = cgutils.get_item_pointer2(
context,
builder,
src_data,
shapes,
src_strides,
arytype.layout,
indices,
)
dest_ptr = cgutils.get_item_pointer2(
context,
builder,
dest_data,
shapes,
dest_strides,
rettype.layout,
indices,
)
builder.store(builder.load(src_ptr), dest_ptr)
with cgutils.loop_nest(builder, shapes, intp_t) as indices:
src_ptr = cgutils.get_item_pointer2(
context,
builder,
src_data,
shapes,
src_strides,
arytype.layout,
indices,
)
dest_ptr = cgutils.get_item_pointer2(
context,
builder,
dest_data,
shapes,
dest_strides,
rettype.layout,
indices,
)
builder.store(builder.load(src_ptr), dest_ptr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: removed memcpy fast path optimization when source and destination layouts match - this causes performance regression for same-layout copies

the original code had:

if arytype.layout == rettype.layout:
    # Fast path: memcpy
    cgutils.raw_memcpy(builder, dest_data, src_data, ary.nitems, ary.itemsize, align=1)

consider restoring the fast path for performance

@brandon-b-miller brandon-b-miller added 2 - In Progress Currently a work in progress and removed 3 - Ready for Review Ready for review by team labels Jan 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2 - In Progress Currently a work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants