3647: fix: Support on all-literal RLIKE expression #45

Open
martin-augment wants to merge 4 commits into main from pr-3647-2026-03-11-08-59-45

Conversation

@martin-augment
Owner

3647: To review by AI

@coderabbitai

coderabbitai bot commented Mar 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e319900c-8047-455e-a62c-c62ac94670dd

📥 Commits

Reviewing files that changed from the base of the PR and between 4f485a3 and 8e33b42.

📒 Files selected for processing (2)
  • native/spark-expr/src/predicate_funcs/rlike.rs
  • spark/src/test/resources/sql-tests/expressions/string/rlike_enabled.sql

Walkthrough

The RLike predicate function in the native Spark expression library was extended to handle scalar inputs, which were previously unsupported. Scalar null values return a null boolean, string scalars (UTF-8, LargeUtf8, or Utf8View variants) are evaluated against the regex pattern and return a boolean result, and non-string scalar types raise an error. A corresponding test for scalar UTF-8 RLIKE operations that was previously skipped has been enabled in the test suite.
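The scalar branch described in this walkthrough can be sketched as follows. This is a minimal, dependency-free model rather than the PR's actual code: `Scalar`, `Pattern`, and `rlike_scalar` are hypothetical stand-ins for DataFusion's `ScalarValue`, the compiled regex, and `RLike::evaluate`, and the regex match is replaced by a plain substring search so the example compiles with the standard library alone.

```rust
// Hypothetical stand-in for DataFusion's ScalarValue; only the variants
// needed for this sketch are modelled.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Utf8View(Option<String>),
    Int32(Option<i32>),
    Boolean(Option<bool>),
}

impl Scalar {
    // True when the scalar carries no value, whatever its type.
    fn is_null(&self) -> bool {
        match self {
            Scalar::Utf8(v) | Scalar::LargeUtf8(v) | Scalar::Utf8View(v) => v.is_none(),
            Scalar::Int32(v) => v.is_none(),
            Scalar::Boolean(v) => v.is_none(),
        }
    }
}

// Stand-in for the compiled regex: a substring search, NOT a real regex.
struct Pattern(String);

impl Pattern {
    fn is_match(&self, haystack: &str) -> bool {
        haystack.contains(self.0.as_str())
    }
}

// Assumed shape of the scalar branch: null check first, then the three
// string variants, then an error for every other type.
fn rlike_scalar(pattern: &Pattern, input: &Scalar) -> Result<Scalar, String> {
    if input.is_null() {
        return Ok(Scalar::Boolean(None));
    }
    match input {
        Scalar::Utf8(Some(s)) | Scalar::LargeUtf8(Some(s)) | Scalar::Utf8View(Some(s)) => {
            Ok(Scalar::Boolean(Some(pattern.is_match(s))))
        }
        other => Err(format!("RLike requires a string scalar, got {:?}", other)),
    }
}
```

Because the null check runs before the type match, a null scalar of any type yields a null boolean in this model; only non-null, non-string scalars reach the error arm.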

@augmentcode

augmentcode bot commented Mar 11, 2026

🤖 Augment PR Summary

Summary: This PR fixes support for RLIKE when the input expression is a scalar/literal.

Changes:

  • Extends RLike::evaluate to handle ColumnarValue::Scalar inputs by running the compiled regex against UTF-8 scalar values
  • Returns SQL-like NULL semantics for NULL RLIKE 'pattern' by producing a NULL boolean scalar
  • Adds a Rust unit test validating scalar/literal evaluation behavior
  • Enables the previously-ignored SQL test case for all-literal RLIKE expressions

Technical Notes: Scalar evaluation supports Utf8, LargeUtf8, and Utf8View scalar types and errors for non-string scalar inputs.

@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.

panic!("expected scalar result");
};

assert_eq!(result, ScalarValue::Boolean(Some(true)));

This test validates the happy-path match, but it doesn’t cover the new NULL/non-match behavior in the scalar branch (e.g., ScalarValue::Utf8(None) or a literal that doesn’t match). Consider extending coverage so regressions in scalar.is_null() handling and false matches get caught.

Severity: low

Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The new unit test covers only the happy path with Utf8(Some). The other string types (LargeUtf8 and Utf8View) are not covered, and neither is the NULL case (Utf8(None)). Testing with a non-string type would also cover the error-handling logic.

@claude

claude bot commented Mar 11, 2026

Code Review: Fix Support for All-Literal RLIKE Expressions

This is a clean, focused fix that correctly handles the case where an RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally.

What is Good

  1. Correct null handling: The scalar.is_null() check is placed before the match, correctly catching all null variants (Utf8(None), LargeUtf8(None), Utf8View(None)) and returning ScalarValue::Boolean(None), as expected for Spark semantics.
  2. Complete type coverage: All three Arrow string scalar variants (Utf8, LargeUtf8, Utf8View) are handled.
  3. Good error message: The internal_err! fallback for non-string scalars includes the actual data type, which aids debugging.
  4. SQL test promotion: Changing query ignore(...) to a plain query is the right move now that this code path works correctly.

Suggestions

  1. Test coverage gaps: The added unit test covers only the UTF-8 matching case (true result). Suggested additions: test_rlike_scalar_null_input (pass ScalarValue::Utf8(None), expect ScalarValue::Boolean(None)) and test_rlike_scalar_no_match (pass a non-matching string, expect ScalarValue::Boolean(Some(false))).
  2. Nit: s.as_str() is redundant in self.pattern.is_match(s.as_str()); passing &s achieves the same via deref coercion.

Summary

The fix is correct and well-scoped. The null-before-match ordering is sound, type coverage is complete, and the SQL test promotion is appropriate. Main suggestion: expand unit tests to explicitly cover null input and non-matching scalar cases.

@claude

claude bot commented Mar 11, 2026

Code Review: Fix Support for All-Literal RLIKE Expressions

This is a clean, focused fix that correctly handles the case where an RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally.

What is Good

  • Correct null handling

Summary

The fix is correct.

@gemini-code-assist

Summary of Changes

This pull request addresses a limitation in the RLike physical expression by enabling it to process scalar (literal) input values. Previously, such operations would result in an internal error. The changes introduce logic to correctly evaluate RLike against constant string patterns, enhancing the robustness and completeness of the regular expression matching functionality within the system.

Highlights

  • RLike Scalar Input Support: The RLike physical expression now correctly handles scalar (literal) input values, allowing RLike operations on constant strings.
  • Error Handling Improvement: Improved error handling for RLike when encountering non-string scalar types, providing a more informative error message.
  • SQL Test Case Enabled: An existing SQL test case for literal RLike expressions, previously ignored due to a known issue, has been re-enabled and is now expected to pass.
  • Unit Test Coverage: A new unit test was added to specifically verify the functionality of RLike with UTF8 scalar literals.

Changelog
  • native/spark-expr/src/predicate_funcs/rlike.rs
    • Imported ScalarValue for use in the evaluate method.
    • Modified the evaluate method to handle ColumnarValue::Scalar inputs, performing RLike matching directly on scalar string values.
    • Added null handling for scalar inputs, returning a null boolean result if the scalar is null.
    • Introduced a new unit test test_rlike_scalar_utf8_literal to validate scalar RLike functionality.
  • spark/src/test/resources/sql-tests/expressions/string/rlike_enabled.sql

Activity
  • The pull request description indicates it is 'To review by AI', suggesting it's ready for initial automated review or processing.

@claude

claude bot commented Mar 11, 2026

Review: Fix Support for All-Literal RLIKE Expressions

This is a clean, focused fix that correctly handles the case where an RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally.

What is Good

  1. Correct null handling: the scalar.is_null() check before the match correctly catches all null variants and returns ScalarValue::Boolean(None).
  2. Complete type coverage: all three Arrow string scalar variants (Utf8, LargeUtf8, Utf8View) are handled.
  3. Good error message: the internal_err! fallback includes the actual data type.
  4. SQL test promotion: changing query ignore(...) to a plain query is appropriate now that this code path works.

Suggestions

  1. Test coverage: the unit test only covers the UTF-8 matching case. Consider adding test_rlike_scalar_null_input (ScalarValue::Utf8(None) -> ScalarValue::Boolean(None)) and test_rlike_scalar_no_match (non-matching string -> ScalarValue::Boolean(Some(false))). These cases are covered by SQL integration tests, but explicit unit tests would catch regressions faster.
  2. Nit: s.as_str() is redundant since s is a String; passing &s works via deref coercion.

Summary

The fix is correct and well-scoped. The null-before-match ordering is sound, type coverage is complete, and the SQL test promotion is appropriate. Main suggestion is to add unit tests for null input and non-matching scalar cases.
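The nit about s.as_str() rests on Rust's deref coercion from &String to &str. A small sketch of the equivalence, assuming a matcher that takes a &str the way a regex is_match call does; `is_match` and `match_both_ways` here are hypothetical stand-ins, not the PR's code:

```rust
// Hypothetical stand-in with the same parameter shape as a regex
// `is_match(&self, haystack: &str) -> bool` method.
fn is_match(haystack: &str) -> bool {
    haystack.starts_with("ab")
}

// Calls the matcher both ways and returns (explicit, coerced).
fn match_both_ways(s: String) -> (bool, bool) {
    let explicit = is_match(s.as_str()); // explicit conversion to &str
    let coerced = is_match(&s); // &String coerces to &str through Deref
    (explicit, coerced)
}
```

Both calls see the same string slice, so the two results always agree. If s is already a reference (for example a &String bound in a match arm), passing s directly is enough.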


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully addresses the issue of supporting RLIKE expressions with all-literal scalar inputs. The changes correctly implement the evaluation logic for scalar string values and handle null inputs appropriately. The removal of the ignore directive in the SQL test confirms the fix. To further enhance the robustness of the RLike implementation, consider adding additional unit tests for edge cases, specifically for null scalar inputs and non-string scalar inputs.

};

assert_eq!(result, ScalarValue::Boolean(Some(true)));
}

Severity: medium

The added unit test test_rlike_scalar_utf8_literal is a good start for covering the new scalar RLIKE functionality. To ensure comprehensive test coverage, it would be beneficial to add tests for the following scenarios:

  • Null input: A test case where the input is a null scalar (e.g., ScalarValue::Utf8(None)) to verify that ScalarValue::Boolean(None) is returned.
  • Non-string input: A test case with a ScalarValue of a non-string type (e.g., ScalarValue::Int32) to confirm that the internal_err! is correctly triggered, or that it returns null if that's the desired behavior for non-string types.
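The missing cases raised in review could look roughly like the checks below. This is a sketch against a compact hypothetical model, not the crate's real types: `Scalar` and `rlike_scalar` are illustrative stand-ins, a substring search replaces the regex, and the non-string case is assumed to be an error rather than a null. In the real crate each check would be a #[test] function built on ScalarValue and the RLike expression.

```rust
// Compact hypothetical model of the scalar types involved.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(Option<String>),
    Int32(Option<i32>),
    Boolean(Option<bool>),
}

// Assumed behaviour: null in, null out; strings are matched; any other
// non-null type is an error. Substring search stands in for the regex.
fn rlike_scalar(needle: &str, input: &Scalar) -> Result<Scalar, String> {
    match input {
        Scalar::Utf8(None) | Scalar::Int32(None) | Scalar::Boolean(None) => {
            Ok(Scalar::Boolean(None))
        }
        Scalar::Utf8(Some(s)) => Ok(Scalar::Boolean(Some(s.contains(needle)))),
        other => Err(format!("unsupported scalar: {:?}", other)),
    }
}

// Null string scalar should produce a null boolean.
fn check_scalar_null_input() {
    let got = rlike_scalar("ab", &Scalar::Utf8(None));
    assert_eq!(got, Ok(Scalar::Boolean(None)));
}

// A non-matching literal should produce false, not null or an error.
fn check_scalar_no_match() {
    let got = rlike_scalar("ab", &Scalar::Utf8(Some("xyz".to_string())));
    assert_eq!(got, Ok(Scalar::Boolean(Some(false))));
}

// A non-null, non-string scalar should hit the error arm.
fn check_scalar_non_string_is_error() {
    let got = rlike_scalar("ab", &Scalar::Int32(Some(7)));
    assert!(got.is_err());
}
```

Keeping one check per behaviour makes a regression in the null path distinguishable from one in the error path.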

Owner Author

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! The new unit test covers only the happy path with Utf8(Some). The other string types (LargeUtf8 and Utf8View) are not covered, and neither is the NULL case (Utf8(None)). Testing with a non-string type would also cover the error-handling logic.

@martin-augment
Owner Author

(1) Test coverage gaps: The added unit test covers only the UTF-8 matching case (true result). Suggested additions: test_rlike_scalar_null_input (pass ScalarValue::Utf8(None), expect ScalarValue::Boolean(None)) and test_rlike_scalar_no_match (pass a non-matching string, expect ScalarValue::Boolean(Some(false))).

value:useful; category:bug; feedback: The Claude AI reviewer is correct! The new unit test covers only the happy path with Utf8(Some). The other string types (LargeUtf8 and Utf8View) are not covered, and neither is the NULL case (Utf8(None)). Testing with a non-string type would also cover the error-handling logic.
