
18674: chore: Refactor with assert_or_internal_err!() in datafusion/spark.#25

Open
martin-augment wants to merge 3 commits into main from pr-18674-2025-11-13-21-14-11

Conversation

@martin-augment
Owner

18674: To review by AI

@coderabbitai

coderabbitai bot commented Nov 13, 2025

Walkthrough

This pull request refactors error handling across multiple Spark dialect functions in the datetime, hash, and math modules. The changes systematically replace explicit internal_err! calls with assertion-based macros (assert_eq_or_internal_err! and assert_or_internal_err!) for argument count and type validation. Updated imports include DataFusionError and the new assertion helpers from datafusion_common. The modifications affect argument validation paths in files including date_add.rs, date_sub.rs, last_day.rs, make_interval.rs, crc32.rs, sha1.rs, factorial.rs, hex.rs, and modulus.rs. Additionally, modulus.rs extends PMOD computation logic to handle non-negative remainder semantics. No public API signatures or core logic are altered.
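The before/after shape of this refactor can be sketched with a local stand-in macro; the real `assert_eq_or_internal_err!` lives in `datafusion_common`, so the macro body and error type here are illustrative assumptions, not the crate's implementation.

```rust
// Minimal sketch of the refactor pattern, with a local stand-in error type
// and macro (assumptions; not the real datafusion_common definitions).
#[derive(Debug)]
struct InternalError(String);

// Hypothetical stand-in for `assert_eq_or_internal_err!`: instead of
// panicking, it early-returns an internal error when the values differ.
macro_rules! assert_eq_or_internal_err {
    ($left:expr, $right:expr, $msg:expr) => {
        if $left != $right {
            return Err(InternalError(format!("{}: {} != {}", $msg, $left, $right)));
        }
    };
}

// Before: explicit length check with an early error return.
fn validate_before(args: &[i32]) -> Result<(), InternalError> {
    if args.len() != 2 {
        return Err(InternalError(format!(
            "function requires 2 arguments, got {}",
            args.len()
        )));
    }
    Ok(())
}

// After: the same check expressed through the assertion macro.
fn validate_after(args: &[i32]) -> Result<(), InternalError> {
    assert_eq_or_internal_err!(args.len(), 2, "function requires 2 arguments");
    Ok(())
}

fn main() {
    assert!(validate_before(&[1]).is_err());
    assert!(validate_after(&[1]).is_err());
    assert!(validate_after(&[1, 2]).is_ok());
    println!("ok");
}
```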


other,
ColumnarValue::Array(_)
| ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(_))
),

Bug: Match Arm Logic Inversion: Incorrect Control Flow

The assertion logic is inverted in the match arm. The old code unconditionally returned an error for any value in the other arm. The new assertion checks if other matches expected types and returns an error only if it doesn't match. However, if the assertion passes (value matches expected types), execution continues without producing a value for the match arm or returning from the function, which is incorrect. The assertion should likely check that other does NOT match the expected patterns, or this should revert to an unconditional error return.
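The inversion can be seen on a simplified model; the enum below is a stand-in for DataFusion's `ColumnarValue`/`ScalarValue` shapes (an assumption for illustration), and the catch-all arm shows the correct pre-refactor behavior of returning an error unconditionally.

```rust
// Stand-in for the ColumnarValue shapes involved (assumption).
#[derive(Debug)]
enum Value {
    Array,
    Interval(Option<i64>),
    Other,
}

#[derive(Debug, PartialEq)]
struct InternalError;

// The buggy refactor asserted that `other` DOES match the expected shapes,
// so when the assertion passed the arm fell through without a value.
// Correct shape: anything reaching the catch-all arm is invalid, including
// `Interval(None)`, so the arm returns an error unconditionally.
fn classify(v: &Value) -> Result<&'static str, InternalError> {
    match v {
        Value::Array => Ok("array"),
        Value::Interval(Some(_)) => Ok("interval"),
        other => {
            let _ = other; // all remaining shapes are invalid here
            Err(InternalError)
        }
    }
}

fn main() {
    assert_eq!(classify(&Value::Array), Ok("array"));
    assert_eq!(classify(&Value::Interval(None)), Err(InternalError));
    assert_eq!(classify(&Value::Other), Err(InternalError));
    println!("ok");
}
```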


Owner Author


value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The check seems wrong for two reasons: 1) ColumnarValue::Array(arr) is already covered earlier in the first arm of the match; 2) ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(_)) could match only for ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(None)) because the case for Some(iv) is also handled in an earlier arm. It would be cleaner if the None case were handled like the other match arms, with other always returning Err, as before.


@augmentcode bot left a comment


Review completed. 1 suggestion posted.

assert_eq_or_internal_err!(
args.len(),
2,
"Spark `date_add` function requires 2 arguments"
);

Consider preserving the actual argument count in the error message (e.g., include `got {}` with `args.len()`), as the previous version provided this detail; this regression also appears in similar checks (crc32, sha1, last_day).
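The suggestion amounts to keeping the observed count in the formatted message; a minimal sketch (the function name and message text are illustrative, not the PR's code):

```rust
// Hypothetical arity check that preserves the observed argument count
// in the error message, as the pre-refactor code did.
fn check_arity(args: &[i32]) -> Result<(), String> {
    if args.len() != 2 {
        return Err(format!(
            "Spark `date_add` function requires 2 arguments, got {}",
            args.len()
        ));
    }
    Ok(())
}

fn main() {
    let err = check_arity(&[1]).unwrap_err();
    // The message now pinpoints the actual count, easing debugging.
    assert!(err.contains("got 1"));
    assert!(check_arity(&[1, 2]).is_ok());
    println!("ok");
}
```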


Owner Author


value:useful; category:bug; feedback:The Augment AI reviewer is correct that useful debug information is lost in this change and it should be reverted. Losing it makes debugging harder when the error occurs.


@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
datafusion/spark/src/function/hash/sha1.rs (1)

119-126: Consider simplifying the error handling pattern.

The current pattern combines destructuring with an assertion that will always fail when the else block is entered, making it harder to understand. While functionally correct, consider one of these simpler alternatives:

Option 1 (assertion before destructuring):

-let [input] = args else {
-    assert_eq_or_internal_err!(
-        args.len(),
-        1,
-        "Spark `sha1` function requires 1 argument"
-    );
-    unreachable!()
-};
+assert_eq_or_internal_err!(
+    args.len(),
+    1,
+    "Spark `sha1` function requires 1 argument"
+);
+let input = &args[0];

Option 2 (if error return is preferred over panic):

 let [input] = args else {
-    assert_eq_or_internal_err!(
-        args.len(),
-        1,
-        "Spark `sha1` function requires 1 argument"
-    );
-    unreachable!()
+    return internal_err!(
+        "Spark `sha1` function requires 1 argument, got {}",
+        args.len()
+    );
 };

Note: If this pattern is intentional as part of the systematic refactoring across the codebase, feel free to disregard this suggestion.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between baccbf0 and fcd703b.

📒 Files selected for processing (9)
  • datafusion/spark/src/function/datetime/date_add.rs (2 hunks)
  • datafusion/spark/src/function/datetime/date_sub.rs (3 hunks)
  • datafusion/spark/src/function/datetime/last_day.rs (2 hunks)
  • datafusion/spark/src/function/datetime/make_interval.rs (2 hunks)
  • datafusion/spark/src/function/hash/crc32.rs (2 hunks)
  • datafusion/spark/src/function/hash/sha1.rs (2 hunks)
  • datafusion/spark/src/function/math/factorial.rs (2 hunks)
  • datafusion/spark/src/function/math/hex.rs (2 hunks)
  • datafusion/spark/src/function/math/modulus.rs (5 hunks)
🧰 Additional context used
🧠 Learnings
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.

Applied to files:

  • datafusion/spark/src/function/datetime/date_sub.rs
  • datafusion/spark/src/function/math/factorial.rs
  • datafusion/spark/src/function/datetime/last_day.rs
  • datafusion/spark/src/function/datetime/date_add.rs
  • datafusion/spark/src/function/hash/sha1.rs
  • datafusion/spark/src/function/datetime/make_interval.rs
  • datafusion/spark/src/function/math/hex.rs
  • datafusion/spark/src/function/math/modulus.rs
  • datafusion/spark/src/function/hash/crc32.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
  • GitHub Check: Analyze (rust)
🔇 Additional comments (11)
datafusion/spark/src/function/math/hex.rs (1)

32-34: LGTM! Clean refactor to assertion-based validation.

The migration from manual error checks to assert_eq_or_internal_err! is consistent with the PR objective and maintains identical error semantics.

Also applies to: 188-188

datafusion/spark/src/function/math/factorial.rs (1)

25-27: LGTM! Consistent with PR-wide refactoring pattern.

The assertion-based argument validation is clean and consistent.

Also applies to: 104-104

datafusion/spark/src/function/hash/crc32.rs (1)

27-27: LGTM! Defensive pattern ensures type safety.

The else { assert + unreachable!() } pattern is correct: if destructuring fails, the assertion validates and returns an error, while unreachable!() satisfies the compiler's exhaustiveness requirement.

Also applies to: 108-113
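The let-else pattern described above can be sketched on plain slices; the function name and message are illustrative stand-ins for the DataFusion argument types.

```rust
// Sketch of the `let ... else { assert + unreachable!() }` pattern:
// if destructuring the one-element slice fails, the length check early-returns
// an error, and `unreachable!()` gives the else block its diverging type.
fn single_arg(args: &[i32]) -> Result<i32, String> {
    let [input] = args else {
        if args.len() != 1 {
            return Err(format!("requires 1 argument, got {}", args.len()));
        }
        // Destructuring a length-1 slice into `[input]` cannot fail,
        // so this point is never reached; it only satisfies the compiler.
        unreachable!()
    };
    Ok(*input)
}

fn main() {
    assert_eq!(single_arg(&[7]), Ok(7));
    assert!(single_arg(&[]).is_err());
    assert!(single_arg(&[1, 2]).is_err());
    println!("ok");
}
```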

datafusion/spark/src/function/datetime/date_add.rs (1)

28-30: LGTM! Consistent with codebase refactoring.

The argument count validation using assertions is clean and follows the established pattern.

Also applies to: 93-98

datafusion/spark/src/function/datetime/last_day.rs (1)

24-27: LGTM! Clean assertion-based validation.

The refactor maintains error semantics while standardizing the validation approach.

Also applies to: 71-76

datafusion/spark/src/function/datetime/make_interval.rs (1)

241-244: LGTM! Test assertions improve readability.

The refactored test validations using assert_eq_or_internal_err! and assert_or_internal_err! are more declarative and consistent with the broader codebase style.

Also applies to: 539-576

datafusion/spark/src/function/datetime/date_sub.rs (2)

28-30: LGTM! Standard assertion-based validation.

The argument count assertion follows the established PR pattern.

Also applies to: 87-92


133-138: LGTM! Defensive assertion in unreachable branch.

The assert_or_internal_err! in the default match arm validates that we should never reach this code path (since coerce_types should prevent invalid types). The assertion intentionally checks for valid types (which will fail in this branch), generating an internal error before unreachable!() satisfies the compiler. While this pattern might seem unusual, it's consistent with the PR's defensive programming approach.

datafusion/spark/src/function/math/modulus.rs (2)

21-23: LGTM! Consistent assertion-based validation.

The refactor to assert_eq_or_internal_err! for argument count checks is clean and maintains identical error semantics across both mod and pmod implementations.

Also applies to: 32-32, 88-92, 138-142


40-52: LGTM! Correct implementation of non-negative remainder semantics.

The enhanced pmod logic correctly implements positive modulo behavior:

  1. Computes standard remainder: result = left % right
  2. Identifies negative remainders: neg = (result < 0)
  3. Adjusts by adding divisor where negative: result = result + (neg ? right : 0)
  4. Normalizes with final modulo: result = result % right

Example: -7 pmod 3 → -7 % 3 = -1 → -1 + 3 = 2 → 2 % 3 = 2

The test coverage (lines 359-610) validates correctness for:

  • Positive/negative dividends and divisors
  • Edge cases (zero, exact multiples, boundary values)
  • Float special values (NaN, Infinity)
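The steps above can be sketched in scalar form; the production code operates element-wise on Arrow arrays, so this `i64` version is only an illustration of the described semantics.

```rust
// Scalar sketch of the non-negative remainder (pmod) steps summarized above.
fn pmod(left: i64, right: i64) -> i64 {
    let result = left % right; // 1. standard remainder, sign follows `left`
    // 2-3. add the divisor where the remainder is negative
    let adjusted = if result < 0 { result + right } else { result };
    adjusted % right // 4. final modulo normalizes the adjusted value
}

fn main() {
    assert_eq!(pmod(-7, 3), 2); // -7 % 3 = -1; -1 + 3 = 2; 2 % 3 = 2
    assert_eq!(pmod(7, 3), 1);
    assert_eq!(pmod(-9, 3), 0); // exact multiple stays zero
    println!("ok");
}
```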
datafusion/spark/src/function/hash/sha1.rs (1)

27-27: LGTM!

The import additions correctly support the assertion-based error handling pattern introduced below.
