
18642: feat: support spark csc#13

Open
martin-augment wants to merge 1 commit into main from pr-18642-2025-11-12-09-05-38

Conversation

@martin-augment
Owner

18642: To review by AI

@coderabbitai

coderabbitai bot commented Nov 12, 2025

Walkthrough

The pull request adds a new cosecant (csc) trigonometric function to the DataFusion Spark UDF library. It introduces a trigonometry module in the math functions, implements the SparkCsc scalar function that computes 1.0 divided by sin(x), registers the function with type checking and arity validation for one argument, and includes test cases for scalar inputs, array inputs, and edge cases including infinity and null values.
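The core computation described above can be sketched in plain Rust (std-only; the PR's actual kernel operates on Arrow arrays and is not reproduced here):

```rust
// Minimal sketch of the cosecant semantics: csc(x) = 1.0 / sin(x).
// Division by an exact zero yields infinity rather than a panic, and
// NaN propagates through both sin and the division.
fn csc(x: f64) -> f64 {
    1.0 / x.sin()
}

fn main() {
    assert_eq!(csc(0.0), f64::INFINITY); // sin(0.0) is exactly 0.0, so 1/0 → +inf
    assert!((csc(std::f64::consts::FRAC_PI_2) - 1.0).abs() < 1e-12); // csc(π/2) = 1
    assert!(csc(f64::NAN).is_nan()); // NaN in, NaN out
}
```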


@claude

claude bot commented Nov 12, 2025

PR Review: feat: support spark csc

Summary

This PR adds support for the Spark csc (cosecant) function. The implementation follows DataFusion patterns and includes a new trigonometry module for math functions.

Code Quality ✅

Strengths:

  • Follows existing code patterns (similar to expm1.rs)
  • Proper Apache license header
  • Appropriate use of error utilities
  • Clean struct implementation with Default trait

Areas for Improvement:

1. Duplicate Argument Validation (Minor)

datafusion/spark/src/function/math/trigonometry.rs:98-105

The spark_csc function validates arguments again, but this is already handled by coerce_types. Consider removing the duplicate check in the main function:

fn spark_csc(args: &[ColumnarValue]) -> Result<ColumnarValue> {
    // Remove this check - already validated by coerce_types
    if args.len() != 1 {
        return Err(invalid_arg_count_exec_err(
            CSC_FUNCTION_NAME,
            (1, 1),
            args.len(),
        ));
    }
    // ... rest of implementation
}

2. Inconsistent Pattern with Codebase

Looking at rint.rs, similar math functions use make_scalar_function from datafusion_functions::utils for cleaner code organization. Consider refactoring to match:

use datafusion_functions::utils::make_scalar_function;

fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
    make_scalar_function(spark_csc_impl, vec![])(&args.args)
}

pub fn spark_csc_impl(args: &[ArrayRef]) -> Result<ArrayRef> {
    // Implementation using ArrayRef directly
}

This would make the code more consistent with rint.rs:71 and other functions.
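The shape of that helper pattern, wrapping a simple array kernel with a shared adapter, can be illustrated with a std-only stand-in (toy slice/Vec types; not the real `datafusion_functions::utils::make_scalar_function` signature):

```rust
// Toy illustration of the wrapper pattern: a higher-order function turns a
// plain array kernel into a reusable closure, so the kernel stays focused on
// the math and the adapter logic lives in one place.
fn make_scalar_function<F>(kernel: F) -> impl Fn(&[f64]) -> Vec<f64>
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    move |args| kernel(args)
}

// The csc kernel itself: element-wise 1/sin(x) over the input slice.
fn csc_kernel(xs: &[f64]) -> Vec<f64> {
    xs.iter().map(|x| 1.0 / x.sin()).collect()
}

fn main() {
    let f = make_scalar_function(csc_kernel);
    let out = f(&[0.0, std::f64::consts::FRAC_PI_2]);
    assert_eq!(out[0], f64::INFINITY);
    assert!((out[1] - 1.0).abs() < 1e-12);
}
```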

Potential Bugs 🐛

1. Division by Zero Not Handled (High Priority)

datafusion/spark/src/function/math/trigonometry.rs:108

When sin(x) = 0, the function returns f64::INFINITY, which matches Spark behavior. However, this should be documented and tested more thoroughly:

  • Test case at line 156 covers csc(0) = INFINITY
  • Missing test for the negative side: csc(-0.0) should be -INFINITY; note that in f64, sin(-π) is a tiny nonzero value, so csc(-π) is large and finite rather than -INFINITY
  • Consider adding edge-case tests near multiples of π, where csc grows very large but stays finite in f64
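The distinction matters because `std::f64::consts::PI` is only the nearest f64 to π, so `sin(PI)` never reaches zero; exact infinities come from signed-zero inputs. A std-only check:

```rust
fn main() {
    // sin(PI) is tiny but nonzero, so csc(PI) is huge but finite.
    let s = std::f64::consts::PI.sin();
    assert!(s > 0.0 && s < 1e-15);
    let csc_pi = 1.0 / s;
    assert!(csc_pi.is_finite() && csc_pi > 1e15);
    // Exact infinities require signed zeros: sin(±0.0) = ±0.0.
    assert_eq!(1.0 / (0.0f64).sin(), f64::INFINITY);
    assert_eq!(1.0 / (-0.0f64).sin(), f64::NEG_INFINITY);
}
```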

2. NaN Handling Not Tested

The implementation doesn't test what happens with NaN inputs. Spark would return NaN, and your implementation will too, but it should be explicitly tested:

#[test]
fn test_csc_nan() {
    test_trig_float64_invoke!(SparkCsc::new(), Some(f64::NAN), Ok(Some(f64::NAN)));
}

Performance Considerations ⚡

The implementation is efficient:

  • Uses Arrow's unary operation for vectorization ✅
  • No unnecessary allocations ✅
  • Direct computation without intermediate buffers ✅

Minor optimization opportunity: The pattern matching in spark_csc could be simplified by relying on type coercion always converting to Float64, eliminating the need for runtime type checking.
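The simplification can be modeled std-only (the `Scalar` enum below is a simplified stand-in for Arrow's `ScalarValue`, not the actual type): once coercion guarantees Float64, the evaluator needs one arm plus an internal-error fallback.

```rust
// Stand-in for ScalarValue; only two variants are needed for the sketch.
#[derive(Debug)]
enum Scalar {
    Float64(Option<f64>),
    Utf8(Option<String>),
}

// After coercion, only Float64 should ever arrive; anything else is an
// internal error rather than a panic or a silent wrong answer.
fn spark_csc(v: Scalar) -> Result<Option<f64>, String> {
    match v {
        Scalar::Float64(x) => Ok(x.map(|x| 1.0 / x.sin())),
        other => Err(format!("csc: unexpected type after coercion: {other:?}")),
    }
}

fn main() {
    assert_eq!(spark_csc(Scalar::Float64(Some(0.0))), Ok(Some(f64::INFINITY)));
    assert_eq!(spark_csc(Scalar::Float64(None)), Ok(None)); // null in → null out
    assert!(spark_csc(Scalar::Utf8(Some("x".into()))).is_err());
}
```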

Security Concerns 🔒

No security issues identified. The function:

  • Doesn't perform any unsafe operations ✅
  • Uses bounded arithmetic (division by zero → infinity, not panic) ✅
  • Properly handles null values ✅

Test Coverage 📊

Good coverage overall but missing some edge cases:

Existing Tests ✅

  • Basic scalar value (csc(0))
  • Array processing ([1, 0, -1])
  • Scalar computation (csc(π/2) = 1)
  • SQL integration tests in .slt file

Missing Test Cases ❌

  1. Negative zero: csc(-0.0) should return -INFINITY (in f64, sin(-π) is tiny but nonzero, so csc(-π) is large and finite, not -INFINITY)
  2. NaN handling: csc(NaN) should return NaN
  3. Near-multiples of π: csc(π) evaluates to a huge but finite value because f64 π is inexact; pin down this behavior rather than expecting INFINITY
  4. Large values: test with very large inputs to ensure numerical stability
  5. Mixed null arrays: [Some(1.0), None, Some(2.0)]

Recommended Additional Tests:

#[test]
fn test_csc_edge_cases() {
    test_trig_float64_invoke!(SparkCsc::new(), Some(f64::NAN), Ok(Some(f64::NAN)));
    // std::f64::consts::PI only approximates π, so sin(PI) is a tiny nonzero
    // value and csc(PI) is large but finite; exact infinities require a
    // signed-zero input, where sin(±0.0) = ±0.0.
    test_trig_float64_invoke!(SparkCsc::new(), Some(0.0), Ok(Some(f64::INFINITY)));
    test_trig_float64_invoke!(SparkCsc::new(), Some(-0.0), Ok(Some(f64::NEG_INFINITY)));
}

#[test]
fn test_csc_null_handling() {
    test_trig_float64_invoke!(SparkCsc::new(), None, Ok(None));
}

SQL Logic Test Quality 📝

The .slt file test is good but could be enhanced:

  • Tests basic functionality ✅
  • Tests null handling ✅
  • Tests special values (π/2) ✅

Consider adding a test for csc(π) to pin down its behavior in the SQL context; note that with f64 inputs it yields a very large finite value (sin(π) is tiny but nonzero), not infinity.

Documentation 📚

Missing: Link to Spark documentation in the code. The struct has it for other functions. Add:

/// Spark CSC function: <https://spark.apache.org/docs/latest/api/sql/index.html#csc>
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct SparkCsc {

Overall Assessment

This is a solid implementation that follows DataFusion conventions well. The main issues are:

  1. Must Fix: Add edge case tests (NaN, ±π, nulls)
  2. Should Fix: Remove duplicate argument validation
  3. Consider: Refactor to use make_scalar_function for consistency with rint.rs
  4. Nice to Have: Add documentation link

The code is production-ready with the additional test coverage. Great work on maintaining consistency with the existing codebase structure!

Recommendation: Approve with minor changes requested for test coverage.


@augmentcode augmentcode bot left a comment


Review completed. No suggestions at this time.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc49fc0 and 2e2e988.

📒 Files selected for processing (3)
  • datafusion/spark/src/function/math/mod.rs (4 hunks)
  • datafusion/spark/src/function/math/trigonometry.rs (1 hunks)
  • datafusion/sqllogictest/test_files/spark/math/csc.slt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
datafusion/spark/src/function/math/trigonometry.rs (2)
datafusion/spark/src/function/error_utils.rs (2)
  • invalid_arg_count_exec_err (23-40)
  • unsupported_data_type_exec_err (42-48)
datafusion/expr-common/src/signature.rs (1)
  • user_defined (1091-1097)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: claude-review
  • GitHub Check: Analyze (rust)

Comment on lines +80 to +96
if arg_types.len() != 1 {
    return Err(invalid_arg_count_exec_err(
        CSC_FUNCTION_NAME,
        (1, 1),
        arg_types.len(),
    ));
}
if arg_types[0].is_numeric() {
    Ok(vec![DataType::Float64])
} else {
    Err(unsupported_data_type_exec_err(
        CSC_FUNCTION_NAME,
        "Numeric Type",
        &arg_types[0],
    ))
}


⚠️ Potential issue | 🟠 Major

Handle NULL inputs without throwing: with the current type coercion you reject DataType::Null, so a plain SELECT csc(NULL) surfaces an Unsupported Data Type error instead of propagating Spark's expected NULL result. That is a logic bug users will hit when optional columns bubble through this UDF. Please accept Null during coercion and short-circuit to a Float64(None) result in the evaluator.

@@
-        if arg_types[0].is_numeric() {
+        if matches!(arg_types[0], DataType::Null) {
+            Ok(vec![DataType::Float64])
+        } else if arg_types[0].is_numeric() {
             Ok(vec![DataType::Float64])
         } else {
             Err(unsupported_data_type_exec_err(
                 CSC_FUNCTION_NAME,
                 "Numeric Type",
                 &arg_types[0],
             ))
         }
@@
-        ColumnarValue::Scalar(ScalarValue::Float64(value)) => Ok(ColumnarValue::Scalar(
-            ScalarValue::Float64(value.map(|x| 1.0 / x.sin())),
-        )),
+        ColumnarValue::Scalar(ScalarValue::Float64(value)) => Ok(ColumnarValue::Scalar(
+            ScalarValue::Float64(value.map(|x| 1.0 / x.sin())),
+        )),
+        ColumnarValue::Scalar(ScalarValue::Null) => Ok(ColumnarValue::Scalar(
+            ScalarValue::Float64(None),
+        )),

Also applies to: 107-128

🤖 Prompt for AI Agents
datafusion/spark/src/function/math/trigonometry.rs lines ~80-96 and ~107-128:
the coercion currently rejects DataType::Null and returns an
unsupported-data-type error instead of propagating NULL; update the type
coercion to accept DataType::Null (treat it as a valid input type alongside
numeric types) and in the evaluator short-circuit when the input scalar is null
to return a Float64(None) result; ensure the signature still reports Float64 as
output type but that runtime checks detect null input and return NULL without
attempting numeric conversion or throwing an error.

Owner Author


value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct that Null handling is missing. Tests for this case should also be added. This would make the function behave like Apache Spark.

@martin-augment
Owner Author

Missing Test Cases ❌

  1. Negative zero: csc(-0.0) should return -INFINITY (in f64, sin(-π) is tiny but nonzero, so csc(-π) is large and finite, not -INFINITY)
  2. NaN handling: csc(NaN) should return NaN
  3. Near-multiples of π: csc(π) evaluates to a huge but finite value because f64 π is inexact; pin down this behavior rather than expecting INFINITY
  4. Large values: Test with very large inputs to ensure numerical stability
  5. Mixed null arrays: [Some(1.0), None, Some(2.0)]

value:useful; category:bug; feedback:The Claude AI reviewer is correct that more unit tests could be added to cover edge cases with boundary values like NaN, INFINITY, and NEG_INFINITY.
