
18674: chore: Refactor with assert_or_internal_err!() in datafusion/spark.#25

Open
martin-augment wants to merge 3 commits into main from pr-18674-2025-11-13-21-14-11

Conversation

@martin-augment
Owner

18674: To review by AI

@coderabbitai

coderabbitai bot commented Nov 13, 2025

Walkthrough

This pull request refactors error handling across multiple Spark dialect functions in the datetime, hash, and math modules. The changes systematically replace explicit internal_err! calls with assertion-based macros (assert_eq_or_internal_err! and assert_or_internal_err!) for argument count and type validation. Updated imports include DataFusionError and the new assertion helpers from datafusion_common. The modifications affect argument validation paths in files including date_add.rs, date_sub.rs, last_day.rs, make_interval.rs, crc32.rs, sha1.rs, factorial.rs, hex.rs, and modulus.rs. Additionally, modulus.rs extends PMOD computation logic to handle non-negative remainder semantics. No public API signatures or core logic are altered.
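The before/after shape of this refactor can be sketched with a local stand-in macro; the real `assert_eq_or_internal_err!` lives in `datafusion_common`, so the macro body and error type here are illustrative assumptions, not the crate's implementation.

```rust
// Minimal sketch of the refactor pattern, with a local stand-in error type
// and macro (assumptions; not the real datafusion_common definitions).
#[derive(Debug)]
struct InternalError(String);

// Hypothetical stand-in for `assert_eq_or_internal_err!`: instead of
// panicking, it early-returns an internal error when the values differ.
macro_rules! assert_eq_or_internal_err {
    ($left:expr, $right:expr, $msg:expr) => {
        if $left != $right {
            return Err(InternalError(format!("{}: {} != {}", $msg, $left, $right)));
        }
    };
}

// Before: explicit length check with an early error return.
fn validate_before(args: &[i32]) -> Result<(), InternalError> {
    if args.len() != 2 {
        return Err(InternalError(format!(
            "function requires 2 arguments, got {}",
            args.len()
        )));
    }
    Ok(())
}

// After: the same check expressed through the assertion macro.
fn validate_after(args: &[i32]) -> Result<(), InternalError> {
    assert_eq_or_internal_err!(args.len(), 2, "function requires 2 arguments");
    Ok(())
}

fn main() {
    assert!(validate_before(&[1]).is_err());
    assert!(validate_after(&[1]).is_err());
    assert!(validate_after(&[1, 2]).is_ok());
    println!("ok");
}
```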


other,
ColumnarValue::Array(_)
| ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(_))
),

Bug: Match Arm Logic Inversion: Incorrect Control Flow

The assertion logic is inverted in the match arm. The old code unconditionally returned an error for any value in the other arm. The new assertion checks if other matches expected types and returns an error only if it doesn't match. However, if the assertion passes (value matches expected types), execution continues without producing a value for the match arm or returning from the function, which is incorrect. The assertion should likely check that other does NOT match the expected patterns, or this should revert to an unconditional error return.
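The inversion can be seen on a simplified model; the enum below is a stand-in for DataFusion's `ColumnarValue`/`ScalarValue` shapes (an assumption for illustration), and the catch-all arm shows the correct pre-refactor behavior of returning an error unconditionally.

```rust
// Stand-in for the ColumnarValue shapes involved (assumption).
#[derive(Debug)]
enum Value {
    Array,
    Interval(Option<i64>),
    Other,
}

#[derive(Debug, PartialEq)]
struct InternalError;

// The buggy refactor asserted that `other` DOES match the expected shapes,
// so when the assertion passed the arm fell through without a value.
// Correct shape: anything reaching the catch-all arm is invalid, including
// `Interval(None)`, so the arm returns an error unconditionally.
fn classify(v: &Value) -> Result<&'static str, InternalError> {
    match v {
        Value::Array => Ok("array"),
        Value::Interval(Some(_)) => Ok("interval"),
        other => {
            let _ = other; // all remaining shapes are invalid here
            Err(InternalError)
        }
    }
}

fn main() {
    assert_eq!(classify(&Value::Array), Ok("array"));
    assert_eq!(classify(&Value::Interval(None)), Err(InternalError));
    assert_eq!(classify(&Value::Other), Err(InternalError));
    println!("ok");
}
```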


Owner Author


value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The check seems wrong for two reasons: 1) ColumnarValue::Array(arr) is already covered earlier in the first arm of the match; 2) ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(_)) could match only for ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(None)) because the case for Some(iv) is also handled in an earlier arm. It would be cleaner if the None case were handled like the other match arms, with other always returning Err, as before.


@augmentcode bot left a comment


Review completed. 1 suggestion posted.

assert_eq_or_internal_err!(
args.len(),
2,
"Spark `date_add` function requires 2 arguments"
);

Consider preserving the actual argument count in the error message (e.g., include `got {}` with `args.len()`), as the previous version provided this detail; this regression also appears in similar checks (crc32, sha1, last_day).
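The suggestion amounts to keeping the observed count in the formatted message; a minimal sketch (the function name and message text are illustrative, not the PR's code):

```rust
// Hypothetical arity check that preserves the observed argument count
// in the error message, as the pre-refactor code did.
fn check_arity(args: &[i32]) -> Result<(), String> {
    if args.len() != 2 {
        return Err(format!(
            "Spark `date_add` function requires 2 arguments, got {}",
            args.len()
        ));
    }
    Ok(())
}

fn main() {
    let err = check_arity(&[1]).unwrap_err();
    // The message now pinpoints the actual count, easing debugging.
    assert!(err.contains("got 1"));
    assert!(check_arity(&[1, 2]).is_ok());
    println!("ok");
}
```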


Owner Author


value:useful; category:bug; feedback:The Augment AI reviewer is correct that useful debug information is lost in this change and it should be reverted. Losing it makes debugging harder when the error occurs.


@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
datafusion/spark/src/function/hash/sha1.rs (1)

119-126: Consider simplifying the error handling pattern.

The current pattern combines destructuring with an assertion that will always fail when the else block is entered, making it harder to understand. While functionally correct, consider one of these simpler alternatives:

Option 1 (assertion before destructuring):

-let [input] = args else {
-    assert_eq_or_internal_err!(
-        args.len(),
-        1,
-        "Spark `sha1` function requires 1 argument"
-    );
-    unreachable!()
-};
+assert_eq_or_internal_err!(
+    args.len(),
+    1,
+    "Spark `sha1` function requires 1 argument"
+);
+let input = &args[0];

Option 2 (if error return is preferred over panic):

 let [input] = args else {
-    assert_eq_or_internal_err!(
-        args.len(),
-        1,
-        "Spark `sha1` function requires 1 argument"
-    );
-    unreachable!()
+    return internal_err!(
+        "Spark `sha1` function requires 1 argument, got {}",
+        args.len()
+    );
 };

Note: If this pattern is intentional as part of the systematic refactoring across the codebase, feel free to disregard this suggestion.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between baccbf0 and fcd703b.

📒 Files selected for processing (9)
  • datafusion/spark/src/function/datetime/date_add.rs (2 hunks)
  • datafusion/spark/src/function/datetime/date_sub.rs (3 hunks)
  • datafusion/spark/src/function/datetime/last_day.rs (2 hunks)
  • datafusion/spark/src/function/datetime/make_interval.rs (2 hunks)
  • datafusion/spark/src/function/hash/crc32.rs (2 hunks)
  • datafusion/spark/src/function/hash/sha1.rs (2 hunks)
  • datafusion/spark/src/function/math/factorial.rs (2 hunks)
  • datafusion/spark/src/function/math/hex.rs (2 hunks)
  • datafusion/spark/src/function/math/modulus.rs (5 hunks)
🧰 Additional context used
🧠 Learnings
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.

Applied to files:

  • datafusion/spark/src/function/datetime/date_sub.rs
  • datafusion/spark/src/function/math/factorial.rs
  • datafusion/spark/src/function/datetime/last_day.rs
  • datafusion/spark/src/function/datetime/date_add.rs
  • datafusion/spark/src/function/hash/sha1.rs
  • datafusion/spark/src/function/datetime/make_interval.rs
  • datafusion/spark/src/function/math/hex.rs
  • datafusion/spark/src/function/math/modulus.rs
  • datafusion/spark/src/function/hash/crc32.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
  • GitHub Check: Analyze (rust)
🔇 Additional comments (11)
datafusion/spark/src/function/math/hex.rs (1)

32-34: LGTM! Clean refactor to assertion-based validation.

The migration from manual error checks to assert_eq_or_internal_err! is consistent with the PR objective and maintains identical error semantics.

Also applies to: 188-188

datafusion/spark/src/function/math/factorial.rs (1)

25-27: LGTM! Consistent with PR-wide refactoring pattern.

The assertion-based argument validation is clean and consistent.

Also applies to: 104-104

datafusion/spark/src/function/hash/crc32.rs (1)

27-27: LGTM! Defensive pattern ensures type safety.

The else { assert + unreachable!() } pattern is correct: if destructuring fails, the assertion validates and returns an error, while unreachable!() satisfies the compiler's exhaustiveness requirement.

Also applies to: 108-113
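The let-else pattern described above can be sketched on plain slices; the function name and message are illustrative stand-ins for the DataFusion argument types.

```rust
// Sketch of the `let ... else { assert + unreachable!() }` pattern:
// if destructuring the one-element slice fails, the length check early-returns
// an error, and `unreachable!()` gives the else block its diverging type.
fn single_arg(args: &[i32]) -> Result<i32, String> {
    let [input] = args else {
        if args.len() != 1 {
            return Err(format!("requires 1 argument, got {}", args.len()));
        }
        // Destructuring a length-1 slice into `[input]` cannot fail,
        // so this point is never reached; it only satisfies the compiler.
        unreachable!()
    };
    Ok(*input)
}

fn main() {
    assert_eq!(single_arg(&[7]), Ok(7));
    assert!(single_arg(&[]).is_err());
    assert!(single_arg(&[1, 2]).is_err());
    println!("ok");
}
```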

datafusion/spark/src/function/datetime/date_add.rs (1)

28-30: LGTM! Consistent with codebase refactoring.

The argument count validation using assertions is clean and follows the established pattern.

Also applies to: 93-98

datafusion/spark/src/function/datetime/last_day.rs (1)

24-27: LGTM! Clean assertion-based validation.

The refactor maintains error semantics while standardizing the validation approach.

Also applies to: 71-76

datafusion/spark/src/function/datetime/make_interval.rs (1)

241-244: LGTM! Test assertions improve readability.

The refactored test validations using assert_eq_or_internal_err! and assert_or_internal_err! are more declarative and consistent with the broader codebase style.

Also applies to: 539-576

datafusion/spark/src/function/datetime/date_sub.rs (2)

28-30: LGTM! Standard assertion-based validation.

The argument count assertion follows the established PR pattern.

Also applies to: 87-92


133-138: LGTM! Defensive assertion in unreachable branch.

The assert_or_internal_err! in the default match arm validates that we should never reach this code path (since coerce_types should prevent invalid types). The assertion intentionally checks for valid types (which will fail in this branch), generating an internal error before unreachable!() satisfies the compiler. While this pattern might seem unusual, it's consistent with the PR's defensive programming approach.

datafusion/spark/src/function/math/modulus.rs (2)

21-23: LGTM! Consistent assertion-based validation.

The refactor to assert_eq_or_internal_err! for argument count checks is clean and maintains identical error semantics across both mod and pmod implementations.

Also applies to: 32-32, 88-92, 138-142


40-52: LGTM! Correct implementation of non-negative remainder semantics.

The enhanced pmod logic correctly implements positive modulo behavior:

  1. Computes standard remainder: result = left % right
  2. Identifies negative remainders: neg = (result < 0)
  3. Adjusts by adding divisor where negative: result = result + (neg ? right : 0)
  4. Normalizes with final modulo: result = result % right

Example: -7 pmod 3 → -7 % 3 = -1 → -1 + 3 = 2 → 2 % 3 = 2

The test coverage (lines 359-610) validates correctness for:

  • Positive/negative dividends and divisors
  • Edge cases (zero, exact multiples, boundary values)
  • Float special values (NaN, Infinity)
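The steps above can be sketched in scalar form; the production code operates element-wise on Arrow arrays, so this `i64` version is only an illustration of the described semantics.

```rust
// Scalar sketch of the non-negative remainder (pmod) steps summarized above.
fn pmod(left: i64, right: i64) -> i64 {
    let result = left % right; // 1. standard remainder, sign follows `left`
    // 2-3. add the divisor where the remainder is negative
    let adjusted = if result < 0 { result + right } else { result };
    adjusted % right // 4. final modulo normalizes the adjusted value
}

fn main() {
    assert_eq!(pmod(-7, 3), 2); // -7 % 3 = -1; -1 + 3 = 2; 2 % 3 = 2
    assert_eq!(pmod(7, 3), 1);
    assert_eq!(pmod(-9, 3), 0); // exact multiple stays zero
    println!("ok");
}
```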
datafusion/spark/src/function/hash/sha1.rs (1)

27-27: LGTM!

The import additions correctly support the assertion-based error handling pattern introduced below.
