3647: fix: Support on all-literal RLIKE expression by martin-augment · Pull Request #45 · martin-augment/datafusion-comet

martin-augment · 2026-03-11T08:59:48Z

3647: To review by AI

coderabbitai · 2026-03-11T09:00:02Z

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 4f485a3 and 8e33b42.

📒 Files selected for processing (2)

native/spark-expr/src/predicate_funcs/rlike.rs
spark/src/test/resources/sql-tests/expressions/string/rlike_enabled.sql

Walkthrough

The RLike predicate function in the native Spark expression library was extended to handle scalar inputs, which were previously unsupported. Scalar null values return a null boolean, string scalars (UTF-8, LargeUtf8, or Utf8View variants) are evaluated against the regex pattern and return a boolean result, and non-string scalar types raise an error. A corresponding test for scalar UTF-8 RLIKE operations that was previously skipped has been enabled in the test suite.
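The behaviour described in the walkthrough can be sketched with standard-library types only. This is a minimal model, not the code from the PR: `Scalar` is a hypothetical stand-in for DataFusion's `ScalarValue`, `rlike_scalar` is an invented helper name, the `is_match` closure stands in for the compiled regex, and the error is a plain `String` rather than a DataFusion error.

```rust
/// Hypothetical stand-in for the scalar variants discussed in the walkthrough.
/// This is NOT DataFusion's `ScalarValue`; it only mirrors the shapes involved.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Utf8View(Option<String>),
    Int32(Option<i32>),
}

impl Scalar {
    /// True when the variant carries no value, whatever its type.
    fn is_null(&self) -> bool {
        match self {
            Scalar::Utf8(v) | Scalar::LargeUtf8(v) | Scalar::Utf8View(v) => v.is_none(),
            Scalar::Int32(v) => v.is_none(),
        }
    }
}

/// Hypothetical helper: evaluates `value RLIKE pattern` for a single scalar.
/// `is_match` is an assumption standing in for the compiled regex.
fn rlike_scalar(
    value: &Scalar,
    is_match: impl Fn(&str) -> bool,
) -> Result<Option<bool>, String> {
    // Null check first, so every null variant maps to SQL NULL.
    // In this sketch that includes a null of a non-string type.
    if value.is_null() {
        return Ok(None);
    }
    match value {
        // All three string variants share the same matching logic.
        Scalar::Utf8(Some(s)) | Scalar::LargeUtf8(Some(s)) | Scalar::Utf8View(Some(s)) => {
            Ok(Some(is_match(s.as_str())))
        }
        // Non-null, non-string scalars are rejected with the offending value.
        other => Err(format!("RLike requires a string scalar, got {other:?}")),
    }
}
```

Under these assumptions, a matching string yields `Ok(Some(true))`, a non-matching string yields `Ok(Some(false))`, any null scalar yields `Ok(None)`, and a non-null `Int32` yields an `Err`.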

augmentcode · 2026-03-11T09:02:22Z

🤖 Augment PR Summary

Summary: This PR fixes support for RLIKE when the input expression is a scalar/literal.

Changes:

- Extends RLike::evaluate to handle ColumnarValue::Scalar inputs by running the compiled regex against UTF-8 scalar values
- Returns SQL-like NULL semantics for NULL RLIKE 'pattern' by producing a NULL boolean scalar
- Adds a Rust unit test validating scalar/literal evaluation behavior
- Enables the previously-ignored SQL test case for all-literal RLIKE expressions

Technical Notes: Scalar evaluation supports Utf8, LargeUtf8, and Utf8View scalar types and errors for non-string scalar inputs.

augmentcode

Review completed. 1 suggestion posted.

augmentcode · 2026-03-11T09:02:23Z

native/spark-expr/src/predicate_funcs/rlike.rs

+            panic!("expected scalar result");
+        };
+
+        assert_eq!(result, ScalarValue::Boolean(Some(true)));


This test validates the happy-path match, but it doesn’t cover the new NULL/non-match behavior in the scalar branch (e.g., ScalarValue::Utf8(None) or a literal that doesn’t match). Consider extending coverage so regressions in scalar.is_null() handling and false matches get caught.

Severity: low

value:useful; category:bug; feedback: The Augment AI reviewer is correct. The new unit test covers only the happy path with Utf8(Some). The other string types (LargeUtf8 and Utf8View) are not covered, and neither is the NULL case (Utf8(None)). Testing with a non-string type would also cover the error-handling logic.

claude · 2026-03-11T09:05:20Z

Code Review: Fix Support for All-Literal RLIKE Expressions

This is a clean, focused fix that correctly handles the case where an RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally.

What is good

- Correct null handling: The scalar.is_null() check is placed before the match, correctly catching all null variants (Utf8(None), LargeUtf8(None), Utf8View(None)) and returning ScalarValue::Boolean(None) as expected for Spark semantics.
- Complete type coverage: All three Arrow string scalar variants (Utf8, LargeUtf8, Utf8View) are handled.
- Good error message: The internal_err! fallback for non-string scalars includes the actual data type, which aids debugging.
- SQL test promotion: Changing query ignore(...) to a plain query is the right move now that this code path works correctly.

Suggestions

- Test coverage gaps: The added unit test covers only the UTF-8 matching case (true result). Suggested additions: test_rlike_scalar_null_input (pass ScalarValue::Utf8(None), expect ScalarValue::Boolean(None)) and test_rlike_scalar_no_match (pass a non-matching string, expect ScalarValue::Boolean(Some(false))).
- Nit: s.as_str() is redundant in self.pattern.is_match(s.as_str()); passing &s achieves the same via deref coercion.

Summary

The fix is correct and well-scoped. The null-before-match ordering is sound, type coverage is complete, and the SQL test promotion is appropriate. Main suggestion: expand unit tests to explicitly cover null input and non-matching scalar cases.
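The nit about s.as_str() comes down to deref coercion: a `&String` passed where a `&str` parameter is expected is converted automatically. A minimal sketch, assuming a hypothetical `contains_digit` function in place of the regex's `is_match` (which also takes `&str`); the two wrapper functions are invented for illustration:

```rust
/// Hypothetical stand-in for an `is_match`-style function that takes `&str`.
fn contains_digit(haystack: &str) -> bool {
    haystack.chars().any(|c| c.is_ascii_digit())
}

/// Calls the matcher with an explicit `as_str()` conversion.
fn check_explicit(s: &String) -> bool {
    contains_digit(s.as_str())
}

/// Passes the `&String` directly; deref coercion turns it into `&str`.
fn check_coerced(s: &String) -> bool {
    contains_digit(s)
}
```

Both wrappers compile to the same call and return the same result for any input, so the choice between them is purely stylistic.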

claude · 2026-03-11T09:08:12Z

Code Review: Fix Support for All-Literal RLIKE Expressions

This is a clean, focused fix that correctly handles the case where an RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally.

What is good

- Correct null handling

Summary

The fix is correct.

gemini-code-assist · 2026-03-11T09:08:54Z

Summary of Changes

This pull request addresses a limitation in the RLike physical expression by enabling it to process scalar (literal) input values. Previously, such operations would result in an internal error. The changes introduce logic to correctly evaluate RLike against constant string inputs, enhancing the robustness and completeness of the regular expression matching functionality within the system.

Highlights

- RLike Scalar Input Support: The RLike physical expression now correctly handles scalar (literal) input values, allowing RLike operations on constant strings.
- Error Handling Improvement: Improved error handling for RLike when encountering non-string scalar types, providing a more informative error message.
- SQL Test Case Enabled: An existing SQL test case for literal RLike expressions, previously ignored due to a known issue, has been re-enabled and is now expected to pass.
- Unit Test Coverage: A new unit test was added to specifically verify the functionality of RLike with UTF8 scalar literals.

Changelog

native/spark-expr/src/predicate_funcs/rlike.rs
- Imported ScalarValue for use in the evaluate method.
- Modified the evaluate method to handle ColumnarValue::Scalar inputs, performing RLike matching directly on scalar string values.
- Added null handling for scalar inputs, returning a null boolean result if the scalar is null.
- Introduced a new unit test test_rlike_scalar_utf8_literal to validate scalar RLike functionality.
spark/src/test/resources/sql-tests/expressions/string/rlike_enabled.sql
- Removed the ignore directive from a SQL test query that uses literal arguments with RLike, indicating the fix for issue apache/datafusion-comet#3343 ("Native engine crashes on all-literal RLIKE expression").

Activity

The pull request description indicates it is 'To review by AI', suggesting it's ready for initial automated review or processing.

claude · 2026-03-11T09:09:15Z

Review: Fix Support for All-Literal RLIKE Expressions

This is a clean, focused fix that correctly handles the case where an RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally.

What is good

- Correct null handling: the scalar.is_null() check before the match correctly catches all null variants and returns ScalarValue::Boolean(None).
- Complete type coverage: all three Arrow string scalar variants (Utf8, LargeUtf8, Utf8View) are handled.
- Good error message: the internal_err! fallback includes the actual data type.
- SQL test promotion: changing query ignore(...) to a plain query is appropriate now that this code path works.

Suggestions

- Test coverage: the unit test only covers the UTF-8 matching case. Consider adding test_rlike_scalar_null_input (ScalarValue::Utf8(None) -> ScalarValue::Boolean(None)) and test_rlike_scalar_no_match (non-matching string -> ScalarValue::Boolean(Some(false))). These cases are covered by SQL integration tests, but explicit unit tests would catch regressions faster.
- Nit: s.as_str() is redundant since s is a String; passing &s works via deref coercion.

Summary

The fix is correct and well-scoped. The null-before-match ordering is sound, type coverage is complete, and the SQL test promotion is appropriate. The main suggestion is to add unit tests for null input and non-matching scalar cases.

gemini-code-assist

Code Review

This pull request successfully addresses the issue of supporting RLIKE expressions with all-literal scalar inputs. The changes correctly implement the evaluation logic for scalar string values and handle null inputs appropriately. The removal of the ignore directive in the SQL test confirms the fix. To further enhance the robustness of the RLike implementation, consider adding additional unit tests for edge cases, specifically for null scalar inputs and non-string scalar inputs.

gemini-code-assist · 2026-03-11T09:10:26Z

native/spark-expr/src/predicate_funcs/rlike.rs

+        };
+
+        assert_eq!(result, ScalarValue::Boolean(Some(true)));
+    }


The added unit test test_rlike_scalar_utf8_literal is a good start for covering the new scalar RLIKE functionality. To ensure comprehensive test coverage, it would be beneficial to add tests for the following scenarios:

- Null input: A test case where the input is a null scalar (e.g., ScalarValue::Utf8(None)) to verify that ScalarValue::Boolean(None) is returned.
- Non-string input: A test case with a ScalarValue of a non-string type (e.g., ScalarValue::Int32) to confirm that the internal_err! is correctly triggered, or that it returns null if that's the desired behavior for non-string types.

value:useful; category:bug; feedback: The Gemini AI reviewer is correct. The new unit test covers only the happy path with Utf8(Some). The other string types (LargeUtf8 and Utf8View) are not covered, and neither is the NULL case (Utf8(None)). Testing with a non-string type would also cover the error-handling logic.

martin-augment · 2026-03-11T09:21:23Z

(1) Test coverage gaps: The added unit test covers only the UTF-8 matching case (true result). Suggested additions: test_rlike_scalar_null_input (pass ScalarValue::Utf8(None), expect ScalarValue::Boolean(None)) and test_rlike_scalar_no_match (pass a non-matching string, expect ScalarValue::Boolean(Some(false))).

value:useful; category:bug; feedback: The Claude AI reviewer is correct. The new unit test covers only the happy path with Utf8(Some). The other string types (LargeUtf8 and Utf8View) are not covered, and neither is the NULL case (Utf8(None)). Testing with a non-string type would also cover the error-handling logic.

0lai0 and others added 4 commits March 8, 2026 01:32

Native engine crashes on all-literal RLIKE expression

052add6

Merge branch 'apache:main' into native_engine_literal_RLIKE

ddaa0de

add test

778e3f7

address comment

8e33b42
