18674: chore: Refactor with assert_or_internal_err!() in datafusion/spark. #25

martin-augment wants to merge 3 commits into main from
Conversation
Walkthrough

This pull request refactors error handling across multiple Spark dialect functions in the datetime, hash, and math modules. The changes systematically replace explicit error-return checks with the `assert_or_internal_err!()` and `assert_eq_or_internal_err!()` macros.
```rust
        other,
        ColumnarValue::Array(_)
            | ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(_))
    ),
```
Bug: Match Arm Logic Inversion: Incorrect Control Flow

The assertion logic is inverted in the match arm. The old code unconditionally returned an error for any value reaching the `other` arm. The new assertion checks whether `other` matches the expected types and returns an error only if it does not. However, if the assertion passes (the value matches the expected types), execution continues without producing a value for the match arm or returning from the function, which is incorrect. The assertion should likely check that `other` does NOT match the expected patterns, or this should revert to an unconditional error return.
value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The check seems wrong for two reasons: 1) `ColumnarValue::Array(arr)` is already covered by the first arm of the match; 2) `ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(_))` could match only for `ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(None))`, because the `Some(iv)` case is also handled in an earlier arm. It would be cleaner if the `None` case were handled like the other match arms, and `other` always returned `Err`, as before.
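The control-flow problem can be illustrated with a toy model (a hypothetical sketch, not the PR's actual code — `Value` stands in for `ColumnarValue`, and a `String` error for DataFusion's internal error): when the earlier arms already cover the valid shapes, the catch-all arm can only ever see invalid input and should return an error unconditionally.

```rust
// Hypothetical toy model of the match in question; `Value` stands in
// for ColumnarValue. Names here are illustrative only.
#[derive(Debug)]
enum Value {
    Array,
    Interval(Option<i64>),
    Other,
}

fn handle(v: Value) -> Result<&'static str, String> {
    match v {
        // Earlier arms already handle the valid shapes...
        Value::Array => Ok("array path"),
        Value::Interval(Some(_)) => Ok("interval path"),
        // ...so the catch-all should return an error unconditionally,
        // rather than assert on a pattern that the earlier arms make
        // unreachable for valid values.
        other => Err(format!("unsupported argument: {other:?}")),
    }
}

fn main() {
    assert!(handle(Value::Array).is_ok());
    assert!(handle(Value::Interval(Some(5))).is_ok());
    assert!(handle(Value::Interval(None)).is_err());
    assert!(handle(Value::Other).is_err());
    println!("ok");
}
```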
```rust
assert_eq_or_internal_err!(
    args.len(),
    2,
    "Spark `date_add` function requires 2 arguments"
```
Consider preserving the actual argument count in the error message (e.g., include `got {}` with `args.len()`), as the previous version provided this detail; this regression also appears in similar checks (crc32, sha1, last_day).
value:useful; category:bug; feedback: The Augment AI reviewer is correct that useful debug information is lost in this change and it should be reverted; dropping the actual argument count makes debugging harder when the error occurs.
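A minimal sketch of the suggested fix, keeping the actual count in the message — written as plain Rust rather than the specific DataFusion macro, and `check_arity` is a hypothetical helper name:

```rust
// Illustrative arity check that preserves the actual argument count in
// the error message, as the reviewer suggests.
fn check_arity(args_len: usize, expected: usize, name: &str) -> Result<(), String> {
    if args_len != expected {
        return Err(format!(
            "Spark `{name}` function requires {expected} arguments, got {args_len}"
        ));
    }
    Ok(())
}

fn main() {
    let err = check_arity(3, 2, "date_add").unwrap_err();
    // The message reports both the expectation and the actual count.
    assert_eq!(err, "Spark `date_add` function requires 2 arguments, got 3");
    assert!(check_arity(2, 2, "date_add").is_ok());
    println!("ok");
}
```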
Actionable comments posted: 0
🧹 Nitpick comments (1)
datafusion/spark/src/function/hash/sha1.rs (1)
119-126: Consider simplifying the error handling pattern.

The current pattern combines destructuring with an assertion that will always fail when the else block is entered, making it harder to understand. While functionally correct, consider one of these simpler alternatives:

Option 1 (assertion before destructuring):

```diff
-let [input] = args else {
-    assert_eq_or_internal_err!(
-        args.len(),
-        1,
-        "Spark `sha1` function requires 1 argument"
-    );
-    unreachable!()
-};
+assert_eq_or_internal_err!(
+    args.len(),
+    1,
+    "Spark `sha1` function requires 1 argument"
+);
+let input = &args[0];
```

Option 2 (if an error return is preferred over a panic):

```diff
 let [input] = args else {
-    assert_eq_or_internal_err!(
-        args.len(),
-        1,
-        "Spark `sha1` function requires 1 argument"
-    );
-    unreachable!()
+    return internal_err!(
+        "Spark `sha1` function requires 1 argument, got {}",
+        args.len()
+    );
 };
```

Note: If this pattern is intentional as part of the systematic refactoring across the codebase, feel free to disregard this suggestion.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- datafusion/spark/src/function/datetime/date_add.rs (2 hunks)
- datafusion/spark/src/function/datetime/date_sub.rs (3 hunks)
- datafusion/spark/src/function/datetime/last_day.rs (2 hunks)
- datafusion/spark/src/function/datetime/make_interval.rs (2 hunks)
- datafusion/spark/src/function/hash/crc32.rs (2 hunks)
- datafusion/spark/src/function/hash/sha1.rs (2 hunks)
- datafusion/spark/src/function/math/factorial.rs (2 hunks)
- datafusion/spark/src/function/math/hex.rs (2 hunks)
- datafusion/spark/src/function/math/modulus.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.
📚 Learning: 2025-11-06T13:37:46.028Z
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.
Applied to files:
- datafusion/spark/src/function/datetime/date_sub.rs
- datafusion/spark/src/function/math/factorial.rs
- datafusion/spark/src/function/datetime/last_day.rs
- datafusion/spark/src/function/datetime/date_add.rs
- datafusion/spark/src/function/hash/sha1.rs
- datafusion/spark/src/function/datetime/make_interval.rs
- datafusion/spark/src/function/math/hex.rs
- datafusion/spark/src/function/math/modulus.rs
- datafusion/spark/src/function/hash/crc32.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Cursor Bugbot
- GitHub Check: claude-review
- GitHub Check: Analyze (rust)
🔇 Additional comments (11)
datafusion/spark/src/function/math/hex.rs (1)
32-34: LGTM! Clean refactor to assertion-based validation.

The migration from manual error checks to `assert_eq_or_internal_err!` is consistent with the PR objective and maintains identical error semantics.

Also applies to: 188-188
datafusion/spark/src/function/math/factorial.rs (1)
25-27: LGTM! Consistent with PR-wide refactoring pattern.

The assertion-based argument validation is clean and consistent.
Also applies to: 104-104
datafusion/spark/src/function/hash/crc32.rs (1)
27-27: LGTM! Defensive pattern ensures type safety.

The `else { assert + unreachable!() }` pattern is correct: if destructuring fails, the assertion validates and returns an error, while `unreachable!()` satisfies the compiler's exhaustiveness requirement.

Also applies to: 108-113
datafusion/spark/src/function/datetime/date_add.rs (1)
28-30: LGTM! Consistent with codebase refactoring.

The argument count validation using assertions is clean and follows the established pattern.
Also applies to: 93-98
datafusion/spark/src/function/datetime/last_day.rs (1)
24-27: LGTM! Clean assertion-based validation.

The refactor maintains error semantics while standardizing the validation approach.
Also applies to: 71-76
datafusion/spark/src/function/datetime/make_interval.rs (1)
241-244: LGTM! Test assertions improve readability.

The refactored test validations using `assert_eq_or_internal_err!` and `assert_or_internal_err!` are more declarative and consistent with the broader codebase style.

Also applies to: 539-576
datafusion/spark/src/function/datetime/date_sub.rs (2)
28-30: LGTM! Standard assertion-based validation.

The argument count assertion follows the established PR pattern.
Also applies to: 87-92
133-138: LGTM! Defensive assertion in unreachable branch.

The `assert_or_internal_err!` in the default match arm validates that we should never reach this code path (since `coerce_types` should prevent invalid types). The assertion intentionally checks for valid types (which will fail in this branch), generating an internal error before `unreachable!()` satisfies the compiler. While this pattern might seem unusual, it's consistent with the PR's defensive programming approach.

datafusion/spark/src/function/math/modulus.rs (2)
21-23: LGTM! Consistent assertion-based validation.

The refactor to `assert_eq_or_internal_err!` for argument count checks is clean and maintains identical error semantics across both `mod` and `pmod` implementations.

Also applies to: 32-32, 88-92, 138-142
40-52: LGTM! Correct implementation of non-negative remainder semantics.

The enhanced `pmod` logic correctly implements positive modulo behavior:

- Computes the standard remainder: `result = left % right`
- Identifies negative remainders: `neg = (result < 0)`
- Adjusts by adding the divisor where negative: `result = result + (neg ? right : 0)`
- Normalizes with a final modulo: `result = result % right`

Example: `-7 pmod 3` → `-1` → `-1 + 3 = 2` → `2 % 3 = 2` ✓

The test coverage (lines 359-610) validates correctness for:

- Positive/negative dividends and divisors
- Edge cases (zero, exact multiples, boundary values)
- Float special values (NaN, Infinity)
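The steps above can be sketched as a standalone scalar function (a hypothetical sketch over `i64` — the PR itself operates on Arrow arrays, and the `pmod` helper here is illustrative):

```rust
// Hypothetical scalar sketch of the non-negative remainder ("pmod") steps;
// Rust's `%` takes the sign of the dividend, like Spark's `mod`.
fn pmod(left: i64, right: i64) -> i64 {
    let result = left % right;                           // standard remainder
    let neg = result < 0;                                // remainder negative?
    let adjusted = result + if neg { right } else { 0 }; // shift into range
    adjusted % right                                     // normalize exact multiples
}

fn main() {
    assert_eq!(pmod(-7, 3), 2); // -7 % 3 = -1; -1 + 3 = 2; 2 % 3 = 2
    assert_eq!(pmod(7, 3), 1);  // already non-negative, unchanged
    assert_eq!(pmod(-6, 3), 0); // exact multiple stays 0
    println!("ok");
}
```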
datafusion/spark/src/function/hash/sha1.rs (1)
27-27: LGTM!

The import additions correctly support the assertion-based error handling pattern introduced below.