
Replace cudf::detail::valid_if with cudf::bools_to_mask #4301

Open
mythrocks wants to merge 6 commits into NVIDIA:main from mythrocks:valid_if-to-bools_to_mask

@mythrocks
Collaborator

This commit is part of the continuing effort to reduce the dependency of spark-rapids-jni on cudf::detail APIs. In this commit, some of the references to cudf::detail::valid_if are replaced with cudf::bools_to_mask.

The functionality should not be altered. Existing tests ought to cover the changes.
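For readers less familiar with the target API, the result that `bools_to_mask` returns (a packed validity bitmask plus a null count) can be modeled on the host. This is a minimal sketch, not the cudf implementation: it assumes cudf's layout of one bit per row in 32-bit words, least-significant bit first, with a set bit meaning "valid", and it ignores GPU execution, `rmm::device_buffer` ownership, and cudf's allocation padding. The names `bitmask_word` and `bools_to_mask_model` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical alias modeling cudf's 32-bit bitmask word.
using bitmask_word = std::uint32_t;

// Host-side model: packs one bool per row into a bitmask (bit i lives in
// word i / 32 at position i % 32) and counts the false entries as nulls.
std::pair<std::vector<bitmask_word>, int> bools_to_mask_model(std::vector<bool> const& bools)
{
  std::size_t const n = bools.size();
  std::vector<bitmask_word> mask((n + 31) / 32, 0);
  int null_count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (bools[i]) {
      mask[i / 32] |= bitmask_word{1} << (i % 32);  // valid row: set its bit
    } else {
      ++null_count;  // null row: bit stays 0
    }
  }
  return {std::move(mask), null_count};
}
```

For example, the rows `{true, false, true, true}` pack into the single word `0b1101` with one null.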

Signed-off-by: MithunR <mithunr@nvidia.com>
@mythrocks mythrocks self-assigned this Feb 20, 2026
@mythrocks mythrocks marked this pull request as draft February 20, 2026 23:48
@greptile-apps
Contributor

greptile-apps bot commented Feb 20, 2026

Greptile Summary

This PR migrates four files away from the cudf::detail::valid_if internal API to the public cudf::bools_to_mask API, continuing the effort to reduce reliance on cuDF detail headers.

  • generate_input.cu: Five valid_if call sites are replaced. Because valid_if accepts an arbitrary iterator range and predicate, whereas bools_to_mask requires a device_span<bool const>, two call sites now first materialize booleans into intermediate rmm::device_uvector<bool> buffers (create_random_null_mask uses thrust::tabulate; the struct column path already produced a bool buffer). The return type changes from rmm::device_buffer to std::unique_ptr<rmm::device_buffer>, and the buffer is extracted with std::move(*ptr.release()).
  • from_json_to_raw_map.cu: The valid_if call that applied thrust::logical_not inline is replaced by a thrust::transform into a temporary device_uvector<bool> followed by bools_to_mask.
  • get_json_object.cu: Same pattern. Inside an IIFE, thrust::transform materializes validity booleans into a temporary uvector, then bools_to_mask is called. Freeing the temporary when the IIFE returns is safe because its deallocation is stream-ordered after the bools_to_mask work.
  • histogram.cu: Removes an unnecessary static_cast<int8_t> when writing a bool expression to a bool array; the file already used bools_to_mask.
  • One remaining valid_if usage exists in from_json_to_structs.cu, not addressed in this PR.
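The materialize-then-pack pattern described for from_json_to_raw_map.cu can be sketched on the host as two steps: a transform that applies the negation (the work valid_if's predicate used to do inline) into a temporary bool buffer, followed by a pack. This is a sketch under assumptions: `std::transform` and `std::vector<bool>` stand in for thrust::transform and rmm::device_uvector<bool>, and `pack_bools` / `mask_from_null_flags` are hypothetical names, not functions from the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using bitmask_word = std::uint32_t;  // models cudf's 32-bit bitmask word

// Step 2: pack an already-materialized bool buffer into bits (LSB first),
// counting false entries as nulls.
std::pair<std::vector<bitmask_word>, int> pack_bools(std::vector<bool> const& bools)
{
  std::vector<bitmask_word> mask((bools.size() + 31) / 32, 0);
  int null_count = 0;
  for (std::size_t i = 0; i < bools.size(); ++i) {
    if (bools[i]) {
      mask[i / 32] |= bitmask_word{1} << (i % 32);
    } else {
      ++null_count;
    }
  }
  return {std::move(mask), null_count};
}

// Host-side model of the migrated path: valid = !should_be_nullified is
// written to a temporary buffer (the extra O(n) allocation) and then packed.
std::pair<std::vector<bitmask_word>, int> mask_from_null_flags(
  std::vector<bool> const& should_be_nullified)
{
  // Step 1: materialize the negated flags.
  std::vector<bool> valid(should_be_nullified.size());
  std::transform(should_be_nullified.begin(),
                 should_be_nullified.end(),
                 valid.begin(),
                 std::logical_not<>{});
  return pack_bools(valid);
}
```

With flags `{false, true, false, false, true}`, rows 0, 2 and 3 stay valid, giving the word `0b01101` and two nulls.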

Confidence Score: 4/5

  • This PR is a straightforward API migration with no functional changes; safe to merge after verifying tests pass.
  • All replacements follow the same mechanical pattern (valid_if → bools_to_mask), the return type differences are correctly handled throughout, iterator ranges are preserved, and the changes are in benchmark/utility code with existing test coverage. The one minor formatting inconsistency (alignment in generate_input.cu) is non-functional.
  • src/main/cpp/benchmarks/common/generate_input.cu has the most changes (5 call sites) and deserves the closest review.

Important Files Changed

| Filename | Overview |
|---|---|
| src/main/cpp/benchmarks/common/generate_input.cu | Replaces 5 cudf::detail::valid_if call sites with cudf::bools_to_mask. Adapts return types from rmm::device_buffer to std::unique_ptr<rmm::device_buffer>, using std::move(*ptr.release()) consistently. In create_random_null_mask, materializes booleans into a uvector before calling bools_to_mask. |
| src/main/cpp/src/from_json_to_raw_map.cu | Replaces valid_if with an intermediate thrust::transform + bools_to_mask pattern. The logical negation that was previously inline in valid_if's predicate is now applied via thrust::transform into a temporary device_uvector<bool>. |
| src/main/cpp/src/get_json_object.cu | Replaces valid_if with an IIFE that materializes validity booleans via thrust::transform into a temporary device_uvector<bool>, then calls bools_to_mask. The IIFE pattern is safe because the uvector's deallocation is stream-ordered after the kernel that reads it. |
| src/main/cpp/src/histogram.cu | Minimal change: removes an unnecessary static_cast<int8_t> when assigning a bool expression to a bool element. The file already used bools_to_mask; no other functional changes. |
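On the return-type adaptation: `std::unique_ptr::release()` gives up ownership of the heap-allocated object, so in `std::move(*ptr.release())` the payload is moved out but the emptied `rmm::device_buffer` object itself is never deleted (a small host-side leak, assuming the code matches the summary's description). Moving out of `*ptr` while the `unique_ptr` keeps ownership avoids that. The sketch below uses a hypothetical counting stand-in, `buffer_stub`, rather than `rmm::device_buffer`; only the leak-free variant is exercised.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a move-only owner such as rmm::device_buffer.
// It counts live instances so ownership of the object itself is observable.
struct buffer_stub {
  static inline int live = 0;
  int* data              = nullptr;

  explicit buffer_stub(int value) : data(new int(value)) { ++live; }
  buffer_stub(buffer_stub&& other) noexcept : data(std::exchange(other.data, nullptr)) { ++live; }
  buffer_stub(buffer_stub const&)            = delete;
  buffer_stub& operator=(buffer_stub const&) = delete;
  ~buffer_stub()
  {
    delete data;
    --live;
  }
};

// Moves the payload out of *ptr; the unique_ptr still owns the emptied shell
// and deletes it when ptr goes out of scope.
buffer_stub take_contents(std::unique_ptr<buffer_stub> ptr) { return std::move(*ptr); }

// release() variant: the payload moves out the same way, but the shell is
// no longer owned by anyone and is never deleted.
buffer_stub take_contents_via_release(std::unique_ptr<buffer_stub> ptr)
{
  return std::move(*ptr.release());
}
```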

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["cudf::detail::valid_if\n(iterator + predicate → bitmask)"] -->|"Replaced by"| B["cudf::bools_to_mask\n(device_span&lt;bool&gt; → bitmask)"]
    
    subgraph Old["Old Pattern (detail API)"]
        A1["Iterator range + predicate functor"] --> A2["valid_if computes bools &\nproduces bitmask in one step"]
        A2 --> A3["Returns pair&lt;rmm::device_buffer, size_type&gt;"]
    end
    
    subgraph New["New Pattern (public API)"]
        B1["Materialize bools into\ndevice_uvector&lt;bool&gt;"] --> B2["Construct device_span&lt;bool const&gt;"]
        B2 --> B3["bools_to_mask converts\nbools → bitmask"]
        B3 --> B4["Returns pair&lt;unique_ptr&lt;device_buffer&gt;, size_type&gt;"]
        B4 --> B5["Extract via *ptr.release()"]
    end
```

Last reviewed commit: 505b3b6

Contributor

@greptile-apps greptile-apps bot left a comment

2 files reviewed, no comments

@mythrocks
Collaborator Author

Build

This change is more controversial.  The only way to stop using
`cudf::detail::valid_if` in the files modified here is to materialize
a temporary bool vector, which is then packed into a bitmask.

Signed-off-by: MithunR <mithunr@nvidia.com>
@mythrocks
Collaborator Author

Build

@mythrocks
Collaborator Author

c3f2550 is slightly controversial; the only way to stop using cudf::detail::valid_if is to materialize the boolean vector before packing it down to a null mask.

If the performance hit is too steep, it might be worth requesting a public cudf::valid_if for this case.
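The cost being weighed is the intermediate buffer: the materialized form writes one bool (typically one byte) per row to a temporary and reads it back before packing, whereas a fused helper evaluates the predicate and sets each bit directly. Below is a host-side sketch of such a fused helper in the spirit of `cudf::detail::valid_if`; the name `valid_if_model` and its signature are hypothetical and do not describe any existing or proposed cudf API.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

using bitmask_word = std::uint32_t;  // models cudf's 32-bit bitmask word

// Hypothetical fused helper: one pass over [first, last), writing bits
// straight from pred(*it), so no per-row bool buffer is allocated.
template <typename ForwardIt, typename Predicate>
std::pair<std::vector<bitmask_word>, int> valid_if_model(ForwardIt first,
                                                         ForwardIt last,
                                                         Predicate pred)
{
  auto const n = static_cast<std::size_t>(std::distance(first, last));
  std::vector<bitmask_word> mask((n + 31) / 32, 0);
  int null_count = 0;
  std::size_t i  = 0;
  for (; first != last; ++first, ++i) {
    if (pred(*first)) {
      mask[i / 32] |= bitmask_word{1} << (i % 32);  // row i is valid
    } else {
      ++null_count;
    }
  }
  return {std::move(mask), null_count};
}
```

Usage: `valid_if_model(v.begin(), v.end(), [](int x) { return x > 0; })` over `{5, -1, 0, 8}` marks rows 0 and 3 valid (`0b1001`) with two nulls, without ever holding a bool per row.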

@mythrocks mythrocks marked this pull request as ready for review February 24, 2026 18:30
@mythrocks mythrocks changed the title [WIP] Replace cudf::detail::valid_if with cudf::bools_to_mask Replace cudf::detail::valid_if with cudf::bools_to_mask Feb 24, 2026
Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, no comments

Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, 1 comment

Signed-off-by: MithunR <mithunr@nvidia.com>
Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, no comments

@mythrocks
Collaborator Author

Build

@mythrocks
Collaborator Author

Build

Contributor

@greptile-apps greptile-apps bot left a comment

4 files reviewed, no comments

@sameerz sameerz requested a review from a team February 28, 2026 00:15
Collaborator

@ttnghia ttnghia left a comment

Please hold off a little bit. We need to discuss how to mitigate code duplication and the unavoidable dependency on the cudf detail namespace. We should also avoid any performance impact from this change.


```cpp
std::pair<rmm::device_buffer, cudf::size_type> create_null_mask(
  cudf::size_type num_rows,
  std::unique_ptr<cudf::column> const& should_be_nullified,
```
Collaborator Author

Would it be viable to change this to should_be_valid and forgo the logical-not?

Labels

None yet

Projects

None yet

3 participants