Decouple from cudf::detail::make_counting_transform_iterator #4306

Open
mythrocks wants to merge 6 commits into NVIDIA:main from mythrocks:counting-transform-iterator
@mythrocks (Collaborator) commented Feb 24, 2026

This commit introduces utility iterators to be used in place of the cudf::detail iterators. This further reduces dependencies on cudf::detail APIs, which are now deemed private to the cuDF project.

make_counting_transform_iterator

This change introduces a version of make_counting_transform_iterator that is specific to Spark RAPIDS JNI.

The previous version of this function is from cudf::detail, which is now deemed private to cuDF. This commit should allow Spark RAPIDS JNI to be insulated from changes to interfaces in cudf::detail.

Note that this version does not use thrust::transform_iterator. It relies on cuda::make_transform_iterator instead.
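To illustrate the idea behind a counting transform iterator, here is a minimal host-only sketch in standard C++. It is not the code in this PR: the class `counting_transform_iterator` and the helper `sum_of_first_squares` are hypothetical stand-ins, and the actual utility is described above as building on `cuda::make_transform_iterator`. The sketch only shows the concept, namely that element `i` of the sequence is computed as `fn(start + i)` on dereference, so no intermediate buffer is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <utility>

// Hypothetical host-only stand-in for a counting transform iterator.
// Dereferencing position i yields fn(start + i); nothing is stored.
template <typename Index, typename Fn>
class counting_transform_iterator {
 public:
  using value_type        = decltype(std::declval<Fn const&>()(std::declval<Index>()));
  using difference_type   = std::ptrdiff_t;
  using reference         = value_type;
  using pointer           = void;
  using iterator_category = std::input_iterator_tag;

  counting_transform_iterator(Index start, Fn fn) : index_{start}, fn_{std::move(fn)} {}

  // Values are computed on demand from the current index.
  value_type operator*() const { return fn_(index_); }
  value_type operator[](difference_type n) const { return fn_(index_ + static_cast<Index>(n)); }

  counting_transform_iterator& operator++()
  {
    ++index_;
    return *this;
  }

  counting_transform_iterator operator+(difference_type n) const
  {
    return counting_transform_iterator(index_ + static_cast<Index>(n), fn_);
  }

  // Two iterators are equal when they point at the same index.
  friend bool operator==(counting_transform_iterator const& a, counting_transform_iterator const& b)
  {
    return a.index_ == b.index_;
  }
  friend bool operator!=(counting_transform_iterator const& a, counting_transform_iterator const& b)
  {
    return !(a == b);
  }

 private:
  Index index_;
  Fn fn_;
};

// Factory mirroring the name used in the PR; the signature here is an assumption.
template <typename Index, typename Fn>
auto make_counting_transform_iterator(Index start, Fn fn)
{
  return counting_transform_iterator<Index, Fn>(start, std::move(fn));
}

// Usage: sums i*i for i in [0, n) without materializing the squares.
inline int sum_of_first_squares(int n)
{
  assert(n >= 0);
  auto begin = make_counting_transform_iterator(0, [](int i) { return i * i; });
  return std::accumulate(begin, begin + n, 0);
}
```

In device code the same pattern would typically be fed to an algorithm such as a reduction or scan; the host loop above is only there to keep the sketch self-contained.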

make_pair_iterator

This commit also introduces make_pair_iterator(column_device_view const&) and make_pair_iterator(scalar const&). Much like their counterparts in cudf::detail, these functions produce pair iterators: each element pairs a value with a bool indicating whether that value is valid (i.e. non-null).
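As a rough illustration of what a pair iterator yields, the following host-only sketch pairs each row's value with its validity flag. Everything in it is hypothetical: `toy_column` stands in for a nullable column, and `make_pair_accessor` and `sum_valid_rows` are names invented for this example. A real pair iterator would wrap such an accessor in a counting transform iterator and read from a device view instead of a `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a nullable column: values plus per-row validity.
struct toy_column {
  std::vector<int> values;
  std::vector<bool> valid;  // true means the row is non-null
};

// Returns a callable that maps a row index to a (value, is_valid) pair.
// The column must outlive the returned accessor, which holds a reference.
inline auto make_pair_accessor(toy_column const& col)
{
  return [&col](std::size_t i) {
    return std::make_pair(col.values[i], static_cast<bool>(col.valid[i]));
  };
}

// Usage: sums only the non-null rows by inspecting the bool in each pair.
inline int sum_valid_rows(toy_column const& col)
{
  assert(col.values.size() == col.valid.size());
  auto row  = make_pair_accessor(col);
  int total = 0;
  for (std::size_t i = 0; i < col.values.size(); ++i) {
    auto const p = row(i);
    if (p.second) { total += p.first; }
  }
  return total;
}
```

A scalar variant would follow the same shape, returning the same (value, is_valid) pair for every index.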

Signed-off-by: MithunR <mithunr@nvidia.com>
@mythrocks mythrocks self-assigned this Feb 24, 2026
@mythrocks mythrocks changed the title from "Decouple from cudf::detail::make_counting_transform_iterator" to "[WIP] Decouple from cudf::detail::make_counting_transform_iterator" Feb 24, 2026
@greptile-apps bot (Contributor) commented Feb 24, 2026

Greptile Summary

This PR decouples spark-rapids-jni from cudf::detail iterator APIs by introducing local utility functions in src/main/cpp/src/utilities/iterator.cuh. The new utilities mirror their cudf::detail counterparts but use cuda::counting_iterator instead of thrust::counting_iterator for make_counting_transform_iterator.

Key changes:

  • Added new spark_rapids_jni::util::make_counting_transform_iterator and spark_rapids_jni::util::make_pair_iterator functions
  • Updated all source files to use the new local iterators instead of cudf::detail versions
  • In row_conversion.cu, qualified unqualified util:: calls with cudf::util:: for namespace clarity after removing the cudf/detail/iterator.cuh include
  • Updated test files accordingly

The implementation correctly replicates the cuDF detail functionality while insulating the codebase from future cuDF internal API changes.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are mechanical replacements of iterator utilities with functionally equivalent local implementations. The new iterator utilities are well-documented, correctly replicate the cuDF detail APIs, and all usage sites have been updated consistently. The namespace qualification changes in row_conversion.cu improve code clarity.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| src/main/cpp/src/utilities/iterator.cuh | New file introducing local iterator utilities to replace cudf::detail APIs; includes the make_counting_transform_iterator and make_pair_iterator functions |
| src/main/cpp/src/hyper_log_log_plus_plus.cu | Replaced 7 usages of cudf::detail::make_counting_transform_iterator with the local utility version; added necessary includes |
| src/main/cpp/src/multiply.cu | Replaced all cudf::detail::make_pair_iterator calls with spark_rapids_jni::util::make_pair_iterator for both column and scalar versions |
| src/main/cpp/src/row_conversion.cu | Replaced iterator includes and qualified all util:: calls with the cudf::util:: namespace for clarity after removing cudf/detail/iterator.cuh |
| src/main/cpp/src/shuffle_split.cu | Replaced 7 usages of cudf::detail::make_counting_transform_iterator with the local utility version |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[cudf::detail::make_counting_transform_iterator] -->|replaced by| B[spark_rapids_jni::util::make_counting_transform_iterator]
    C[cudf::detail::make_pair_iterator] -->|replaced by| D[spark_rapids_jni::util::make_pair_iterator]
    E[thrust::counting_iterator] -->|changed to| F[cuda::counting_iterator]
    
    B --> G[from_json_to_structs.cu]
    B --> H[shuffle_split.cu]
    B --> I[hyper_log_log_plus_plus.cu]
    B --> J[Other source files]
    
    D --> K[multiply.cu]
    
    L[cudf/detail/iterator.cuh] -->|removed| M[utilities/iterator.cuh]
    M -->|new include| N[All modified files]
    
    style B fill:#90EE90
    style D fill:#90EE90
    style M fill:#90EE90
    style F fill:#FFD700

Last reviewed commit: e10722a

@greptile-apps greptile-apps bot left a comment

11 files reviewed, no comments

@mythrocks mythrocks changed the title from "[WIP] Decouple from cudf::detail::make_counting_transform_iterator" to "Decouple from cudf::detail::make_counting_transform_iterator" Feb 24, 2026
@mythrocks (Collaborator, Author)

Build

Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: MithunR <mithunr@nvidia.com>
@greptile-apps greptile-apps bot left a comment

13 files reviewed, 1 comment

@greptile-apps greptile-apps bot left a comment

13 files reviewed, 1 comment

Signed-off-by: MithunR <mithunr@nvidia.com>
@greptile-apps greptile-apps bot left a comment

13 files reviewed, no comments

@mythrocks (Collaborator, Author)

Build

@greptile-apps greptile-apps bot left a comment

13 files reviewed, no comments

@mythrocks (Collaborator, Author)

Build

@sameerz sameerz requested a review from a team February 28, 2026 00:15
@ttnghia (Collaborator) left a comment

Please hold off for a bit. We need to discuss how to mitigate the code duplication and the unavoidable dependency on the cudf::detail namespace.

@mythrocks (Collaborator, Author)

I think it's fine for us to have our own version of make_counting_transform_iterator.

The pair iterators, however, should probably remain in cuDF. I'll check whether libcudf is open to exposing those utilities.

: dscalar(cudf::get_scalar_device_view(
static_cast<ScalarType&>(const_cast<cudf::scalar&>(scalar_value))))
{
CUDF_EXPECTS(type_id_matches_device_storage_type<Element>(scalar_value.type().id()),
@mythrocks (Collaborator, Author) commented on this line:

Can't use CUDF_EXPECTS here: it throws cuDF-specific errors rather than spark-rapids-jni ones.
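One possible direction, sketched here under assumptions rather than taken from the PR, is a small project-local check that throws a project-owned exception type instead of the `cudf::logic_error` that `CUDF_EXPECTS` raises. The namespace `example`, the type `jni_logic_error`, and the helpers `expects` and `violation_is_reported` are all hypothetical names; the actual project may already have its own facility for this.

```cpp
#include <cassert>
#include <stdexcept>

namespace example {

// Hypothetical project-owned exception type, distinct from cudf::logic_error.
struct jni_logic_error : std::logic_error {
  using std::logic_error::logic_error;
};

// Throws the project-owned exception when the precondition does not hold.
inline void expects(bool condition, char const* message)
{
  assert(message != nullptr);
  if (!condition) { throw jni_logic_error(message); }
}

// Usage: returns true when a failed check surfaced as the project-owned type.
inline bool violation_is_reported(bool condition)
{
  try {
    expects(condition, "Scalar type does not match the device storage type");
  } catch (jni_logic_error const&) {
    return true;
  }
  return false;
}

}  // namespace example
```

Callers that catch the project-owned type then no longer need to know about cuDF's exception hierarchy for this particular check.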
