Conversation
Walkthrough

The pull request adds a new cosecant (csc) trigonometric function to the DataFusion Spark UDF library. It introduces a trigonometry module in the math functions, implements the SparkCsc scalar function that computes 1.0 divided by sin(x), registers the function with type checking and arity validation for one argument, and includes test cases for scalar inputs, array inputs, and edge cases including infinity and null values.

✨ Finishing touches
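The core semantics described in the walkthrough can be sketched in plain Rust, independent of DataFusion (the `csc` helper below is illustrative, not the PR's actual code); the IEEE-754 edge cases the tests target fall out of the division naturally:

```rust
// Illustrative sketch: csc(x) = 1 / sin(x).
// IEEE-754 float division by zero yields +/-infinity, and NaN propagates,
// so no explicit edge-case handling is required in the kernel itself.
fn csc(x: f64) -> f64 {
    1.0 / x.sin()
}

fn main() {
    // csc(pi/2) = 1 / sin(pi/2) = 1
    assert!((csc(std::f64::consts::FRAC_PI_2) - 1.0).abs() < 1e-12);
    // sin(0.0) is exactly 0.0, so 1.0 / 0.0 -> +infinity
    assert!(csc(0.0).is_infinite());
    // NaN input propagates to NaN output
    assert!(csc(f64::NAN).is_nan());
}
```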
PR Review: feat: support spark csc

Summary

This PR adds support for the Spark csc (cosecant) function.

Code Quality ✅

Strengths:

Areas for Improvement:

1. Duplicate Argument Validation (Minor)
The argument-count check in `spark_csc` duplicates validation already performed during type coercion:

```rust
fn spark_csc(args: &[ColumnarValue]) -> Result<ColumnarValue> {
    // Remove this check - already validated by coerce_types
    if args.len() != 1 {
        return Err(invalid_arg_count_exec_err(
            CSC_FUNCTION_NAME,
            (1, 1),
            args.len(),
        ));
    }
    // ... rest of implementation
}
```

2. Inconsistent Pattern with Codebase

Looking at how other math functions are implemented, consider using `make_scalar_function`:

```rust
use datafusion_functions::utils::make_scalar_function;

fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
    make_scalar_function(spark_csc_impl, vec![])(&args.args)
}

pub fn spark_csc_impl(args: &[ArrayRef]) -> Result<ArrayRef> {
    // Implementation using ArrayRef directly
}
```

This would make the code more consistent with the rest of the codebase.

Potential Bugs 🐛

1. Division by Zero Not Handled (High Priority)
When `sin(x)` is exactly zero (e.g. `x = 0`), `1.0 / sin(x)` divides by zero; the PR does not explicitly handle or test this case.
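The IEEE-754 behavior behind the division can be checked in plain Rust, independent of the PR's code — float division by zero yields signed infinity rather than panicking, which is why the kernel never errors:

```rust
fn main() {
    // sin(0.0) is exactly 0.0; 1.0 / 0.0 is +infinity under IEEE-754
    let pos = 1.0_f64 / (0.0_f64).sin();
    assert!(pos.is_infinite() && pos > 0.0);

    // sin(-0.0) is -0.0; 1.0 / -0.0 is -infinity
    let neg = 1.0_f64 / (-0.0_f64).sin();
    assert!(neg.is_infinite() && neg < 0.0);

    // Caveat: sin(PI) in f64 is ~1.22e-16, NOT zero, so csc(PI) is
    // a very large but finite value, not infinity.
    assert!((1.0_f64 / std::f64::consts::PI.sin()).is_finite());
}
```

The last assertion matters when choosing edge-case tests: only exact zeros of `sin` (in f64, the inputs `0.0` and `-0.0`) produce infinities.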
2. NaN Handling Not Tested

The implementation doesn't test what happens with NaN inputs. Spark would return NaN, and your implementation will too, but it should be explicitly tested:

```rust
#[test]
fn test_csc_nan() {
    test_trig_float64_invoke!(SparkCsc::new(), Some(f64::NAN), Ok(Some(f64::NAN)));
}
```

Performance Considerations ⚡

The implementation is efficient:
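For reference, the vectorized hot path amounts to a single map over the input buffer with no per-row branching; a standalone sketch in plain Rust (`csc_array` is a hypothetical helper name, not the PR's code):

```rust
// Sketch of the array path: one pass, one division and one sin() per element.
fn csc_array(values: &[f64]) -> Vec<f64> {
    values.iter().map(|x| 1.0 / x.sin()).collect()
}

fn main() {
    let out = csc_array(&[std::f64::consts::FRAC_PI_2, 0.0]);
    assert!((out[0] - 1.0).abs() < 1e-12); // csc(pi/2) = 1
    assert!(out[1].is_infinite());         // csc(0) = +infinity
}
```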
Minor optimization opportunity: the pattern matching in

Security Concerns 🔒

No security issues identified. The function:
Test Coverage 📊

Good coverage overall but missing some edge cases:

Existing Tests ✅

Missing Test Cases ❌
Recommended Additional Tests:

```rust
#[test]
fn test_csc_edge_cases() {
    test_trig_float64_invoke!(SparkCsc::new(), Some(f64::NAN), Ok(Some(f64::NAN)));
    // Note: sin(PI) is not exactly zero in f64 (it is ~1.22e-16), so csc(PI)
    // is large but finite; the exact division-by-zero inputs are +/-0.0.
    test_trig_float64_invoke!(SparkCsc::new(), Some(0.0), Ok(Some(f64::INFINITY)));
    test_trig_float64_invoke!(SparkCsc::new(), Some(-0.0), Ok(Some(f64::NEG_INFINITY)));
}
```
```rust
#[test]
fn test_csc_null_handling() {
    test_trig_float64_invoke!(SparkCsc::new(), None, Ok(None));
}
```

SQL Logic Test Quality 📝

The csc.slt file covers the basics; consider also adding a test for NULL input there.

Documentation 📚

Missing: Link to Spark documentation in the code. The struct has it for other functions. Add:

```rust
/// Spark CSC function: https://spark.apache.org/docs/latest/api/sql/index.html#csc
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct SparkCsc {
    // ...
}
```

Overall Assessment

This is a solid implementation that follows DataFusion conventions well. The main issues are:
The code is production-ready with the additional test coverage. Great work on maintaining consistency with the existing codebase structure!

Recommendation: Approve with minor changes requested for test coverage.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- datafusion/spark/src/function/math/mod.rs (4 hunks)
- datafusion/spark/src/function/math/trigonometry.rs (1 hunks)
- datafusion/sqllogictest/test_files/spark/math/csc.slt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
datafusion/spark/src/function/math/trigonometry.rs (2)
datafusion/spark/src/function/error_utils.rs (2)
- invalid_arg_count_exec_err (23-40)
- unsupported_data_type_exec_err (42-48)

datafusion/expr-common/src/signature.rs (1)
- user_defined (1091-1097)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: claude-review
- GitHub Check: Analyze (rust)
```rust
if arg_types.len() != 1 {
    return Err(invalid_arg_count_exec_err(
        CSC_FUNCTION_NAME,
        (1, 1),
        arg_types.len(),
    ));
}
if arg_types[0].is_numeric() {
    Ok(vec![DataType::Float64])
} else {
    Err(unsupported_data_type_exec_err(
        CSC_FUNCTION_NAME,
        "Numeric Type",
        &arg_types[0],
    ))
}
```
Handle NULL inputs without throwing — With the current type coercion you reject DataType::Null, so a plain SELECT csc(NULL) surfaces Unsupported Data Type instead of propagating Spark’s expected NULL result. That’s a logic bug users will hit when optional columns bubble through this UDF. Please accept Null during coercion and short-circuit to a Float64(None) result in the evaluator.
@@
- if arg_types[0].is_numeric() {
+ if matches!(arg_types[0], DataType::Null) {
+ Ok(vec![DataType::Float64])
+ } else if arg_types[0].is_numeric() {
Ok(vec![DataType::Float64])
} else {
Err(unsupported_data_type_exec_err(
CSC_FUNCTION_NAME,
"Numeric Type",
&arg_types[0],
))
}
@@
- ColumnarValue::Scalar(ScalarValue::Float64(value)) => Ok(ColumnarValue::Scalar(
- ScalarValue::Float64(value.map(|x| 1.0 / x.sin())),
- )),
+ ColumnarValue::Scalar(ScalarValue::Float64(value)) => Ok(ColumnarValue::Scalar(
+ ScalarValue::Float64(value.map(|x| 1.0 / x.sin())),
+ )),
+ ColumnarValue::Scalar(ScalarValue::Null) => Ok(ColumnarValue::Scalar(
+ ScalarValue::Float64(None),
+ )),

Also applies to: 107-128
🤖 Prompt for AI Agents
datafusion/spark/src/function/math/trigonometry.rs lines ~80-96 and ~107-128:
the coercion currently rejects DataType::Null and returns an
unsupported-data-type error instead of propagating NULL; update the type
coercion to accept DataType::Null (treat it as a valid input type alongside
numeric types) and in the evaluator short-circuit when the input scalar is null
to return a Float64(None) result; ensure the signature still reports Float64 as
output type but that runtime checks detect null input and return NULL without
attempting numeric conversion or throwing an error.
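The NULL short-circuit requested above can be modeled in plain Rust with `Option`, independent of DataFusion (`csc_nullable` is a hypothetical helper mirroring the `ScalarValue::Float64(None)` behavior, not the PR's code):

```rust
// Model of a nullable Float64 as Option<f64>: `map` leaves None untouched,
// so a NULL input propagates to a NULL output without any numeric work,
// matching Spark's csc(NULL) = NULL semantics.
fn csc_nullable(value: Option<f64>) -> Option<f64> {
    value.map(|x| 1.0 / x.sin())
}

fn main() {
    assert_eq!(csc_nullable(None), None);                    // NULL in, NULL out
    assert_eq!(csc_nullable(Some(0.0)), Some(f64::INFINITY)); // normal path
}
```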
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct that Null handling is missing. Tests for this use case should also be added; this would make the function behave like Apache Spark.

value:useful; category:bug; feedback: The Claude AI reviewer is correct that more unit tests could be added to cover edge cases with boundary values like NAN, INFINITY, and NEG_INFINITY.
18642: To review by AI