45 changes: 42 additions & 3 deletions native/spark-expr/src/predicate_funcs/rlike.rs
@@ -21,7 +21,7 @@ use arrow::array::types::Int32Type;
use arrow::array::{Array, BooleanArray, DictionaryArray, RecordBatch, StringArray};
use arrow::compute::take;
use arrow::datatypes::{DataType, Schema};
-use datafusion::common::{internal_err, Result};
+use datafusion::common::{internal_err, Result, ScalarValue};
use datafusion::physical_expr::PhysicalExpr;
use datafusion::physical_plan::ColumnarValue;
use regex::Regex;
@@ -140,8 +140,24 @@ impl PhysicalExpr for RLike {
                 let array = self.is_match(inputs);
                 Ok(ColumnarValue::Array(Arc::new(array)))
             }
-            ColumnarValue::Scalar(_) => {
-                internal_err!("non scalar regexp patterns are not supported")
+            ColumnarValue::Scalar(scalar) => {
+                if scalar.is_null() {
+                    return Ok(ColumnarValue::Scalar(ScalarValue::Boolean(None)));
+                }
+
+                let is_match = match scalar {
+                    ScalarValue::Utf8(Some(s))
+                    | ScalarValue::LargeUtf8(Some(s))
+                    | ScalarValue::Utf8View(Some(s)) => self.pattern.is_match(s.as_str()),
+                    _ => {
+                        return internal_err!(
+                            "RLike requires string type for input, got {:?}",
+                            scalar.data_type()
+                        );
+                    }
+                };
+
+                Ok(ColumnarValue::Scalar(ScalarValue::Boolean(Some(is_match))))
             }
         }
     }
@@ -165,3 +181,26 @@ impl PhysicalExpr for RLike {
         Display::fmt(self, f)
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use datafusion::physical_expr::expressions::Literal;
+
+    #[test]
+    fn test_rlike_scalar_utf8_literal() {
+        let expr = RLike::try_new(
+            Arc::new(Literal::new(ScalarValue::Utf8(Some("Rose".to_string())))),
+            "R[a-z]+",
+        )
+        .unwrap();
+        let result = expr
+            .evaluate(&RecordBatch::new_empty(Arc::new(Schema::empty())))
+            .unwrap();
+        let ColumnarValue::Scalar(result) = result else {
+            panic!("expected scalar result");
+        };
+
+        assert_eq!(result, ScalarValue::Boolean(Some(true)));

This test validates the happy-path match, but it doesn’t cover the new NULL/non-match behavior in the scalar branch (e.g., ScalarValue::Utf8(None) or a literal that doesn’t match). Consider extending coverage so regressions in scalar.is_null() handling and false matches get caught.

Severity: low

Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct. The new unit test covers only the happy path with Utf8(Some). The other string variants (LargeUtf8 and Utf8View) are not covered, and neither is the NULL case (Utf8(None)). A test with a non-string type would also exercise the error-handling logic.

+    }

medium

The added unit test test_rlike_scalar_utf8_literal is a good start for covering the new scalar RLIKE functionality. To ensure comprehensive test coverage, it would be beneficial to add tests for the following scenarios:

  • Null input: A test case where the input is a NULL string scalar (e.g., ScalarValue::Utf8(None)) to verify that ScalarValue::Boolean(None) is returned.
  • Non-string input: A test case with a ScalarValue of a non-string type (e.g., ScalarValue::Int32) to confirm that the internal_err! is correctly triggered, or that it returns null if that's the desired behavior for non-string types.

Owner Author

value:useful; category:bug; feedback: The Gemini AI reviewer is correct. The new unit test covers only the happy path with Utf8(Some). The other string variants (LargeUtf8 and Utf8View) are not covered, and neither is the NULL case (Utf8(None)). A test with a non-string type would also exercise the error-handling logic.

+}
@@ -35,5 +35,5 @@ query
SELECT s RLIKE '' FROM test_rlike_enabled

-- literal arguments
-query ignore(https://github.com/apache/datafusion-comet/issues/3343)
+query
SELECT 'hello' RLIKE '^[a-z]+$', '12345' RLIKE '^[a-z]+$', '' RLIKE '', NULL RLIKE 'a'
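
The literal-argument query above hits the same cases the reviewers ask for at the unit-test level: a match, a non-match, an empty pattern, and a NULL input. As a rough illustration of the expected three-valued results, here is a std-only sketch that mirrors the control flow of the new scalar branch. It is not the PR's code: `Scalar`, `rlike_scalar`, and `lowercase_word` are hypothetical stand-ins (for DataFusion's `ScalarValue`, the `ColumnarValue::Scalar` arm, and the regex `^[a-z]+$` respectively), chosen so the example needs no external crates.

```rust
// Hypothetical stand-in for the handful of `ScalarValue` variants involved.
// The real enum lives in `datafusion::common` and has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Int32(Option<i32>),
    Boolean(Option<bool>),
}

impl Scalar {
    // True when the scalar carries no value, whatever its type.
    fn is_null(&self) -> bool {
        match self {
            Scalar::Utf8(v) | Scalar::LargeUtf8(v) => v.is_none(),
            Scalar::Int32(v) => v.is_none(),
            Scalar::Boolean(v) => v.is_none(),
        }
    }
}

// Mirrors the order of checks in the diff: NULL first, then string variants,
// then an error for any other type. `pattern` is a plain predicate standing
// in for `Regex::is_match`.
fn rlike_scalar(input: &Scalar, pattern: impl Fn(&str) -> bool) -> Result<Scalar, String> {
    if input.is_null() {
        // NULL in, NULL out. Because this check runs first, a NULL of a
        // non-string type also yields NULL rather than an error.
        return Ok(Scalar::Boolean(None));
    }
    match input {
        Scalar::Utf8(Some(s)) | Scalar::LargeUtf8(Some(s)) => {
            Ok(Scalar::Boolean(Some(pattern(s.as_str()))))
        }
        other => Err(format!(
            "RLike requires string type for input, got {:?}",
            other
        )),
    }
}

// Hand-written equivalent of `^[a-z]+$` (assumption: ASCII-only input).
fn lowercase_word(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_lowercase())
}
```

In this model, `'hello'` yields `Boolean(Some(true))`, `'12345'` yields `Boolean(Some(false))`, an always-true predicate (the empty pattern) matches the empty string, `Utf8(None)` yields `Boolean(None)`, and `Int32(Some(1))` yields an error. Unit tests against the real `RLike` expression could assert the same outcomes for `Utf8`, `LargeUtf8`, and `Utf8View` literals.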