3647: fix: Support on all-literal RLIKE expression#45
3647: fix: Support on all-literal RLIKE expression#45martin-augment wants to merge 4 commits intomainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThe RLike predicate function in the native Spark expression library was extended to handle scalar inputs, which were previously unsupported. Scalar null values return a null boolean, string scalars (UTF-8, LargeUtf8, or Utf8View variants) are evaluated against the regex pattern and return a boolean result, and non-string scalar types raise an error. A corresponding test for scalar UTF-8 RLIKE operations that was previously skipped has been enabled in the test suite. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🤖 Augment PR SummarySummary: This PR fixes support for Changes:
Technical Notes: Scalar evaluation supports 🤖 Was this summary useful? React with 👍 or 👎 |
| panic!("expected scalar result"); | ||
| }; | ||
|
|
||
| assert_eq!(result, ScalarValue::Boolean(Some(true))); |
There was a problem hiding this comment.
This test validates the happy-path match, but it doesn’t cover the new NULL/non-match behavior in the scalar branch (e.g., ScalarValue::Utf8(None) or a literal that doesn’t match). Consider extending coverage so regressions in scalar.is_null() handling and false matches get caught.
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The new unit test tests only the happy path with Utf8(Some). The other Utf8 types (Large and View) are not covered. The NULL case is also not covered (Utf(None)). Testing with a non-Utf8 type would also cover the error handling logic.
|
Code Review: Fix Support for All-Literal RLIKE Expressions. This is a clean, focused fix that correctly handles the case where a RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally. WHAT IS GOOD: (1) Correct null handling: The scalar.is_null() check is placed before the match, correctly catching all null variants (Utf8(None), LargeUtf8(None), Utf8View(None)) and returning ScalarValue::Boolean(None) as expected for Spark semantics. (2) Complete type coverage: All three Arrow string scalar variants (Utf8, LargeUtf8, Utf8View) are handled. (3) Good error message: The internal_err! fallback for non-string scalars includes the actual data type, which aids debugging. (4) SQL test promotion: Changing query ignore(...) to a plain query is the right move now that this code path works correctly. SUGGESTIONS: (1) Test coverage gaps: The added unit test covers only the UTF-8 matching case (true result). Suggested additions: test_rlike_scalar_null_input (pass ScalarValue::Utf8(None), expect ScalarValue::Boolean(None)) and test_rlike_scalar_no_match (pass a non-matching string, expect ScalarValue::Boolean(Some(false))). (2) Nit: s.as_str() is redundant in self.pattern.is_match(s.as_str()); passing &s achieves the same via auto-deref. SUMMARY: The fix is correct and well-scoped. The null-before-match ordering is sound, type coverage is complete, and the SQL test promotion is appropriate. Main suggestion: expand unit tests to explicitly cover null input and non-matching scalar cases. |
Code Review: Fix Support for All-Literal RLIKE Expressions This is a clean, focused fix that correctly handles the case where a RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally. ### What is Good - Correct null handling ### Summary The fix is correct. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a limitation in the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Review: Fix Support for All-Literal RLIKE Expressions. This is a clean focused fix that correctly handles the case where a RLIKE expression child evaluates to a scalar value. Previously this path returned an error unconditionally. WHAT IS GOOD: (1) Correct null handling - scalar.is_null() check before the match correctly catches all null variants and returns ScalarValue::Boolean(None). (2) Complete type coverage - all three Arrow string scalar variants (Utf8, LargeUtf8, Utf8View) are handled. (3) Good error message - internal_err! fallback includes the actual data type. (4) SQL test promotion - changing query ignore(...) to a plain query is appropriate now that this code path works. SUGGESTIONS: (1) Test coverage - the unit test only covers the UTF-8 matching case. Consider adding test_rlike_scalar_null_input (ScalarValue::Utf8(None) -> ScalarValue::Boolean(None)) and test_rlike_scalar_no_match (non-matching string -> ScalarValue::Boolean(Some(false))). These cases are covered by SQL integration tests but explicit unit tests would catch regressions faster. (2) Nit - s.as_str() is redundant since s is a String; passing &s works via auto-deref. SUMMARY: The fix is correct and well-scoped. The null-before-match ordering is sound, type coverage is complete, and the SQL test promotion is appropriate. Main suggestion is to add unit tests for null input and non-matching scalar cases. |
There was a problem hiding this comment.
Code Review
This pull request successfully addresses the issue of supporting RLIKE expressions with all-literal scalar inputs. The changes correctly implement the evaluation logic for scalar string values and handle null inputs appropriately. The removal of the ignore directive in the SQL test confirms the fix. To further enhance the robustness of the RLike implementation, consider adding additional unit tests for edge cases, specifically for null scalar inputs and non-string scalar inputs.
| }; | ||
|
|
||
| assert_eq!(result, ScalarValue::Boolean(Some(true))); | ||
| } |
There was a problem hiding this comment.
The added unit test test_rlike_scalar_utf8_literal is a good start for covering the new scalar RLIKE functionality. To ensure comprehensive test coverage, it would be beneficial to add tests for the following scenarios:
- Null input: A test case where the input
ScalarValueisNoneto verify thatScalarValue::Boolean(None)is returned. - Non-string input: A test case with a
ScalarValueof a non-string type (e.g.,ScalarValue::Int32) to confirm that theinternal_err!is correctly triggered, or that it returns null if that's the desired behavior for non-string types.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! The new unit test tests only the happy path with Utf8(Some). The other Utf8 types (Large and View) are not covered. The NULL case is also not covered (Utf(None)). Testing with a non-Utf8 type would also cover the error handling logic.
value:useful; category:bug; feedback: The Claude AI reviewer is correct! The new unit test tests only the happy path with Utf8(Some). The other Utf8 types (Large and View) are not covered. The NULL case is also not covered (Utf(None)). Testing with a non-Utf8 type would also cover the error handling logic. |
3647: To review by AI