
2826: feat: Support ANSI mode SUM (Decimal types) #33

Open
martin-augment wants to merge 8 commits into main from pr-2826-2025-12-04-17-40-21

Conversation

@martin-augment Owner

2826: To review by AI

@coderabbitai

coderabbitai bot commented Dec 4, 2025

Walkthrough

The changes introduce EvalMode support to the SUM aggregation function for Decimal128 types. The protobuf Sum message's boolean fail_on_error field is replaced with an EvalMode enum field. The SumDecimal UDF is updated to accept and store an eval_mode parameter throughout its API, including constructors, accumulators, and aggregator implementations. Overflow handling logic is refactored to respect the EvalMode: in ANSI mode, overflows raise errors; in non-ANSI modes, they set the sum to None. The Scala serialization layer is updated to map Spark's EvalMode to its protobuf representation. New test cases verify decimal SUM and try_sum behavior under ANSI and non-ANSI modes with null values, group-by operations, and overflow scenarios.


@claude

claude bot commented Dec 4, 2025

Code Review for PR #33: ANSI Mode Support for Decimal SUM

Summary

This PR adds ANSI mode support to the SUM aggregate function for decimal types, implementing proper overflow handling that either throws errors (ANSI/TRY modes) or returns NULL (Legacy mode).

✅ Strengths

Code Quality

  • Clean refactoring: The transition from tracking nullability with boolean buffers to using Option<i128> is elegant and more idiomatic Rust
  • Consistent implementation: Both Accumulator and GroupsAccumulator implementations follow the same logic
  • Good documentation: Inline comments explain Spark compatibility decisions (e.g., shouldTrackIsEmpty behavior)

Test Coverage

  • Comprehensive testing: Tests cover null handling, overflow scenarios, GROUP BY operations, and try_sum variants
  • Both modes tested: ANSI and Legacy modes are properly validated
  • Edge cases: Partial overflow scenarios with GROUP BY are tested (line 1623)

Architecture

  • Proper separation of concerns: EvalMode is threaded through all layers (Scala → Proto → Rust)
  • Maintains compatibility: State serialization format is consistent (always 2 arrays)

🔍 Issues Found

1. ⚠️ Critical: Missing TRY Mode Logic

Location: native/spark-expr/src/agg_funcs/sum_decimal.rs:166, 294, 344, 505

The implementation only checks for EvalMode::Ansi but doesn't handle EvalMode::Try. According to the Scala changes, TRY mode should return NULL on overflow (not throw), but your current code will fall through to the Legacy path.

// Current code (line 166)
if self.eval_mode == EvalMode::Ansi {
    return Err(DataFusionError::from(arithmetic_overflow_error("decimal")));
}

Recommendation: Update all overflow checks to:

if self.eval_mode == EvalMode::Ansi {
    return Err(DataFusionError::from(arithmetic_overflow_error("decimal")));
}
// TRY mode: continue with NULL assignment (existing behavior)

Or add an explicit comment if TRY and Legacy are intentionally identical:

if self.eval_mode == EvalMode::Ansi {
    return Err(DataFusionError::from(arithmetic_overflow_error("decimal")));
}
// TRY and Legacy modes: return NULL on overflow

2. ⚠️ Minor: Inconsistent Overflow State Management

Location: native/spark-expr/src/agg_funcs/sum_decimal.rs:189-192

The early return check in update_batch prevents processing values after overflow:

if !self.is_empty && self.sum.is_none() {
    return Ok(());
}

However, this bypasses the logic that updates is_empty based on incoming values. While functionally correct (overflow state is permanent), it's slightly inconsistent with the rest of the flow.

Impact: Low - The behavior is correct, but the logic flow could be clearer.

3. 📝 Documentation: Proto Field Comment Outdated

Location: native/proto/src/proto/expr.proto:123

The proto field changed from bool fail_on_error to EvalMode eval_mode, but there's no inline documentation explaining the new field or what the enum values mean.

Recommendation: Add a comment:

message Sum {
  Expr child = 1;
  DataType datatype = 2;
  EvalMode eval_mode = 3; // Controls overflow behavior: Legacy (NULL), Ansi (error), Try (NULL)
}

4. 🔧 Code Quality: Magic Number in Tests

Location: spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:1554-1556

val maxDec38_0 = new java.math.BigDecimal("99999999999999999999")
(1 to 50).flatMap(_ => Seq((maxDec38_0, 1)))

The decimal value and count (50) are magic numbers without explanation. Why this specific value? How does it guarantee overflow?

Recommendation: Add a comment explaining the overflow calculation or use named constants.

🚀 Performance Considerations

Positive

  • Optimal memory layout: Vec<Option<i128>> is more cache-friendly than separate vectors for values and null flags
  • Early exit on overflow: The check at the beginning of update functions prevents unnecessary computation

Consideration

  • Memory overhead: Option<i128> has no niche for the discriminant, so each value carries an explicit tag padded out to i128's alignment: 24 bytes on toolchains where i128 is 8-byte aligned, 32 bytes on recent Rust where it is 16-byte aligned, vs 17 bytes for separate value/flag tracking. For large GROUP BY cardinalities, this is roughly a 40–90% increase per group.
    • Verdict: This is acceptable given the code clarity benefits and Rust's optimization capabilities
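The size arithmetic above can be checked directly with std::mem. This is a standalone sketch; the exact Option<i128> size depends on the toolchain's i128 alignment, so the check accepts either layout:

```rust
use std::mem::{align_of, size_of};

fn main() {
    // Per-group accumulator state in the PR is Vec<Option<i128>>.
    let opt = size_of::<Option<i128>>();
    // Separate tracking would be one i128 plus one bool per group.
    let separate = size_of::<i128>() + size_of::<bool>();

    // i128 has no niche, so Option<i128> needs an explicit discriminant,
    // padded out to i128's alignment (16 on recent Rust, 8 on older toolchains).
    assert!(opt == 24 || opt == 32);
    assert_eq!(separate, 17);
    println!(
        "Option<i128>: {} bytes (align {}), separate: {} bytes",
        opt,
        align_of::<i128>(),
        separate
    );
}
```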

🔒 Security Concerns

No security issues identified

  • Overflow handling is proper and prevents undefined behavior
  • No unsafe code added beyond existing value_unchecked (which is appropriate given bounds checks)
  • No injection risks or unsafe memory access

📊 Test Coverage Assessment

Well Covered

  • ✅ Null handling (aggregates and group by)
  • ✅ Overflow with ANSI mode
  • ✅ Overflow with Legacy mode
  • ✅ try_sum variants
  • ✅ Partial overflow scenarios

Potentially Missing

  • ⚠️ Merge overflow scenarios: Tests don't explicitly verify overflow during the merge phase of distributed aggregation
  • ⚠️ State serialization: No explicit test for state() and merge_batch() round-tripping with overflow states
  • ℹ️ Edge case: Sum of empty groups is tested, but not mixed empty/overflow states in the same execution

Recommendation: Add a test that forces a multi-stage aggregation with overflow in partial phase.
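Such a merge-phase test could exercise logic along these lines. This is a self-contained sketch with a plain struct standing in for the accumulator's (sum, is_empty) state pair and non-ANSI semantics; it is not the real DataFusion accumulator API:

```rust
/// Simplified partial-aggregation state: `sum == None` with
/// `is_empty == false` encodes an overflowed (NULL) partial sum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PartialState {
    sum: Option<i128>,
    is_empty: bool,
}

/// Merge two partial states with non-ANSI (NULL-on-overflow) semantics.
fn merge(a: PartialState, b: PartialState) -> PartialState {
    if b.is_empty {
        return a;
    }
    if a.is_empty {
        return b;
    }
    // If either side already overflowed, the merged sum stays NULL;
    // checked_add also yields None if the merge itself overflows i128.
    let sum = match (a.sum, b.sum) {
        (Some(x), Some(y)) => x.checked_add(y),
        _ => None,
    };
    PartialState { sum, is_empty: false }
}

fn main() {
    let ok = PartialState { sum: Some(10), is_empty: false };
    let overflowed = PartialState { sum: None, is_empty: false };
    let empty = PartialState { sum: Some(0), is_empty: true };

    assert_eq!(merge(ok, overflowed).sum, None); // overflow propagates
    assert_eq!(merge(ok, empty), ok); // empty side is a no-op
    println!("ok");
}
```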

📋 Minor Observations

  1. Line 565-568: The benchmark still uses EvalMode::Legacy - consider adding benchmarks for ANSI mode to measure error path performance

  2. Removed imports: Good cleanup removing unused BooleanBufferBuilder and BitAnd imports

  3. Consistent naming: State field names ("sum", "is_empty") are well chosen and match Spark conventions

Final Verdict

Overall: Strong implementation with one critical fix needed

This is a well-architected change that properly extends decimal sum to support ANSI mode. The code is clean, well-tested, and maintains backward compatibility. The main issue is the missing TRY mode logic distinction, which should be addressed before merging.

Action Items Before Merge

  1. Must Fix: Clarify TRY mode vs Legacy mode handling in overflow checks
  2. 📝 Should Fix: Add proto documentation for eval_mode field
  3. 🔍 Consider: Add merge phase overflow test
  4. 📝 Nice to Have: Document magic numbers in test data generation

Great work on this feature! The implementation demonstrates strong understanding of both Spark semantics and DataFusion's architecture.

🤖 Generated with Claude Code


@augmentcode augmentcode bot left a comment


Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala (1)

27-32: New decimal SUM / try_sum ANSI & overflow tests provide strong coverage

The added tests nicely cover:

  • NULL behaviour for sum and try_sum on decimals, with and without GROUP BY, under both ANSI on/off.
  • Decimal overflow semantics for SUM in ANSI vs non‑ANSI (including grouped and repartitioned cases), asserting both Spark and Comet throw ARITHMETIC_OVERFLOW in ANSI mode.
  • Decimal overflow handling for try_sum, including a partial‑overflow GROUP BY where one group overflows and another succeeds.
  • Correct wiring via CometConf.getExprAllowIncompatConfigKey(classOf[Sum]) and use of Sum/col imports.

If you want to make the last “partial overflow” test even more self‑documenting, you could also assert the concrete result (e.g., group 1 → NULL, group 2 → 500) instead of only relying on checkSparkAnswerAndOperator, but the current tests are already effective.

Also applies to: 1475-1635

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between caacad6 and ab437f5.

📒 Files selected for processing (6)
  • native/core/src/execution/planner.rs (1 hunks)
  • native/proto/src/proto/expr.proto (1 hunks)
  • native/spark-expr/benches/aggregate.rs (2 hunks)
  • native/spark-expr/src/agg_funcs/sum_decimal.rs (12 hunks)
  • spark/src/main/scala/org/apache/comet/serde/aggregates.scala (3 hunks)
  • spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-04T14:26:48.750Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Applied to files:

  • native/spark-expr/src/agg_funcs/sum_decimal.rs
🧬 Code graph analysis (4)
native/spark-expr/benches/aggregate.rs (1)
native/spark-expr/src/agg_funcs/sum_decimal.rs (1)
  • try_new (47-63)
native/spark-expr/src/agg_funcs/sum_decimal.rs (2)
native/spark-expr/src/utils.rs (1)
  • is_valid_decimal_precision (225-229)
native/spark-expr/src/lib.rs (1)
  • arithmetic_overflow_error (110-114)
spark/src/main/scala/org/apache/comet/serde/aggregates.scala (3)
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala (8)
  • comet (79-81)
  • comet (86-123)
  • comet (125-127)
  • comet (129-138)
  • comet (152-154)
  • comet (156-158)
  • comet (160-166)
  • comet (168-174)
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala (2)
  • QueryPlanSerde (45-717)
  • evalModeToProto (439-446)
spark/src/main/scala/org/apache/comet/serde/SupportLevel.scala (1)
  • Incompatible (39-39)
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala (1)
spark/src/main/scala/org/apache/spark/sql/comet/operators.scala (2)
  • CometHashAggregateExec (1104-1130)
  • CometHashAggregateExec (1168-1225)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
🔇 Additional comments (16)
native/proto/src/proto/expr.proto (1)

120-124: Sum.eval_mode proto change: check versioning/compat expectations

Switching field 3 from bool fail_on_error to EvalMode eval_mode keeps the same field number and varint wire type, but changes the semantic meaning of existing encoded values (e.g., old true now decoding as enum value 1). This is fine if plans are never persisted across mixed Comet versions, but if there is any cross-version interchange, you may need to document that Sum is not backward‑compatible across this change or gate it by version.
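The wire-format observation can be verified by hand: field 3 with the varint wire type has tag byte (3 << 3) | 0 = 0x18, and both bool true and an enum with numeric value 1 encode their payload as the single varint byte 0x01. A minimal encoder sketch (deliberately not using the real protobuf library, and limited to single-byte tags and values):

```rust
/// Encode a protobuf varint field as tag byte followed by the value.
/// Minimal sketch: handles only field numbers < 16 and values < 128,
/// which both fit in a single byte each.
fn encode_varint_field(field_number: u32, value: u64) -> Vec<u8> {
    assert!(field_number < 16 && value < 128);
    let tag = (field_number << 3) as u8; // wire type 0 (varint)
    vec![tag, value as u8]
}

fn main() {
    // Old schema: `bool fail_on_error = 3;` with value `true`.
    let old_bool = encode_varint_field(3, 1);
    // New schema: `EvalMode eval_mode = 3;` with enum numeric value 1.
    let new_enum = encode_varint_field(3, 1);
    // Identical bytes on the wire: old payloads still decode, but `true`
    // silently becomes whichever EvalMode variant has numeric value 1.
    assert_eq!(old_bool, new_enum);
    assert_eq!(old_bool, vec![0x18, 0x01]);
    println!("ok");
}
```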

native/core/src/execution/planner.rs (1)

1867-1893: Decimal SUM now wired to EvalMode-aware UDF correctly

For decimal inputs you now: (1) decode expr.eval_mode via from_protobuf_eval_mode, and (2) construct SumDecimal::try_new(datatype, eval_mode) and wrap it in an AggregateUDF. This cleanly propagates LEGACY/TRY/ANSI into the decimal SUM implementation while leaving the non-decimal path unchanged and still backed by sum_udaf. Given CometSum.getSupportLevel only allows ANSI/TRY for decimal types, this looks consistent and safe.

native/spark-expr/benches/aggregate.rs (1)

35-36: Benchmark update to new SumDecimal API looks good

Importing EvalMode and constructing SumDecimal with EvalMode::Legacy aligns the benchmark with the new try_new(data_type, eval_mode) signature without changing the workload semantics. No issues here.

Also applies to: 98-101

spark/src/main/scala/org/apache/comet/serde/aggregates.scala (1)

32-34: Sum serde changes correctly gate ANSI/TRY and propagate EvalMode

  • CometSum.getSupportLevel now treats ANSI/TRY as incompatible only when sum.dataType is non‑decimal, which matches the native support matrix and ensures non-decimal SUM in ANSI/TRY still falls back to Spark.
  • In convert, wiring builder.setEvalMode(evalModeToProto(CometEvalModeUtil.fromSparkEvalMode(sum.evalMode))) means the Rust side sees the exact eval mode for decimal SUM, consistent with how scalar/math casts already behave.
    Overall this is a clean extension of EvalMode into the SUM aggregate path.

Also applies to: 216-225, 242-247

native/spark-expr/src/agg_funcs/sum_decimal.rs (12)

18-19: LGTM: EvalMode integration in struct definition.

The addition of the eval_mode field and necessary imports properly extends the SumDecimal struct to support ANSI mode semantics.

Also applies to: 43-43


47-63: LGTM: Constructor properly parameterized with EvalMode.

The try_new method correctly accepts and stores the eval_mode parameter, enabling mode-specific behavior downstream.


71-86: LGTM: Consistent state field semantics.

The implementation correctly propagates eval_mode to accumulators and consistently tracks both sum and is_empty state fields to match Spark's behavior. The comments clarify this design decision effectively.


133-153: LGTM: Accumulator initialization is correct.

The accumulator properly initializes with sum = Some(0) and is_empty = true, which correctly represents an empty aggregation state. The eval_mode parameter is stored for overflow handling.


155-177: LGTM: Overflow handling is correctly implemented.

The overflow detection and handling logic is sound:

  • Early return for already-overflowed state prevents further processing
  • unwrap_or(0) at line 162 is safe due to prior overflow check at line 157
  • ANSI mode correctly raises errors on overflow
  • Non-ANSI mode correctly sets sum to None while marking is_empty as false

181-211: LGTM: Batch update correctly handles nulls and overflow.

The batch update logic correctly:

  • Detects and skips already-overflowed states
  • Updates is_empty status based on null counts
  • Skips null values during iteration (lines 205-206)
  • Delegates to update_single for per-value processing and error propagation

213-230: LGTM: Evaluation correctly distinguishes empty vs. overflow states.

The evaluation logic properly handles:

  • Empty aggregations (no non-null values) → None
  • Overflowed sums (non-ANSI mode) → None
  • Valid sums with precision checks → decimal value

The defensive precision check at line 221 is good practice even though it shouldn't fail during normal operation.


236-306: LGTM: State serialization and merging are correct.

The state and merge logic correctly handles:

  • Serialization of both sum and is_empty state fields
  • Detection and propagation of overflow states (line 270-277)
  • Safe unwrapping at lines 289-290 due to prior validation
  • EvalMode-aware overflow handling during merge
  • Proper short-circuiting for empty and overflowed states

308-353: LGTM: Groups accumulator initialization and per-group updates are correct.

The groups accumulator properly:

  • Maintains per-group state vectors for sum and is_empty
  • Initializes new groups to Some(0) and true in resize_helper
  • Implements overflow detection and EvalMode-aware handling per group
  • Safe use of unwrap_or(0) at line 340 due to prior overflow check at line 336

388-449: LGTM: Group evaluation and state extraction are correct.

The evaluation logic correctly:

  • Handles both EmitTo::All and EmitTo::First(n) cases consistently
  • Checks is_empty and precision before emitting values
  • Returns None for empty groups or overflowed sums
  • Serializes per-group state as two arrays (sum, is_empty)

451-517: LGTM: Groups merge batch correctly handles per-group state merging.

The merge logic correctly:

  • Processes each group independently with proper state extraction
  • Detects and propagates overflow states (lines 480-488)
  • Safely unwraps at lines 500-501 after validation
  • Applies EvalMode-aware overflow handling during merge (lines 504-513)
  • Handles empty state short-circuiting (lines 490-497)

544-595: Tests updated correctly for new API.

The existing tests are properly updated to pass EvalMode::Legacy to the constructors. While these Rust unit tests don't cover overflow or ANSI mode behavior, the AI summary indicates that comprehensive edge case testing (overflow, ANSI/non-ANSI modes, nulls) is provided in the Scala test suite (CometAggregateSuite.scala).

@martin-augment Owner Author

martin-augment commented Dec 4, 2025

1. ⚠️ Critical: Missing TRY Mode Logic

Location: native/spark-expr/src/agg_funcs/sum_decimal.rs:166, 294, 344, 505

The implementation only checks for EvalMode::Ansi but doesn't handle EvalMode::Try. According to the Scala changes, TRY mode should return NULL on overflow (not throw), but your current code will fall through to the Legacy path.

// Current code (line 166)
if self.eval_mode == EvalMode::Ansi {
    return Err(DataFusionError::from(arithmetic_overflow_error("decimal")));
}

Recommendation: Update all overflow checks to:

if self.eval_mode == EvalMode::Ansi {
    return Err(DataFusionError::from(arithmetic_overflow_error("decimal")));
}
// TRY mode: continue with NULL assignment (existing behavior)

Or add an explicit comment if TRY and Legacy are intentionally identical:

if self.eval_mode == EvalMode::Ansi {
    return Err(DataFusionError::from(arithmetic_overflow_error("decimal")));
}
// TRY and Legacy modes: return NULL on overflow

value:incorrect-but-reasonable; category:bug; feedback:The Claude AI reviewer is not correct! The implementation is ANSI vs non-ANSI, i.e. TRY and LEGACY are treated the same since both of them should return NULL (i.e. None) when the sum overflows. Only ANSI mode should return an error.
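The author's point — ANSI raises an error while TRY and LEGACY both yield NULL — can be sketched as one standalone Rust function. The EvalMode enum and error string here are illustrative stand-ins mirroring the PR, not the actual Comet definitions:

```rust
// Hypothetical stand-in for Comet's EvalMode; names mirror the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvalMode {
    Legacy,
    Try,
    Ansi,
}

/// Spark's decimal-SUM overflow semantics:
/// ANSI raises an error; TRY and LEGACY both yield NULL (None).
fn on_overflow(eval_mode: EvalMode) -> Result<Option<i128>, String> {
    if eval_mode == EvalMode::Ansi {
        return Err("ARITHMETIC_OVERFLOW: decimal".to_string());
    }
    // TRY and LEGACY modes: return NULL on overflow
    Ok(None)
}

fn main() {
    assert!(on_overflow(EvalMode::Ansi).is_err());
    assert_eq!(on_overflow(EvalMode::Try), Ok(None));
    assert_eq!(on_overflow(EvalMode::Legacy), Ok(None));
    println!("ok");
}
```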

@martin-augment Owner Author

📝 Documentation: Proto Field Comment Outdated

Location: native/proto/src/proto/expr.proto:123

The proto field changed from bool fail_on_error to EvalMode eval_mode, but there's no inline documentation explaining the new field or what the enum values mean.

Recommendation: Add a comment:

message Sum {
  Expr child = 1;
  DataType datatype = 2;
  EvalMode eval_mode = 3; // Controls overflow behavior: Legacy (NULL), Ansi (error), Try (NULL)
}

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! Having a short comment about the possible values and their behavior would be helpful for the maintenance of the protobuf file. It would prevent consulting with the rest of the code to see the possible values.

@martin-augment Owner Author

🔧 Code Quality: Magic Number in Tests

Location: spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:1554-1556

val maxDec38_0 = new java.math.BigDecimal("99999999999999999999")
(1 to 50).flatMap(_ => Seq((maxDec38_0, 1)))

The decimal value and count (50) are magic numbers without explanation. Why this specific value? How does it guarantee overflow?

Recommendation: Add a comment explaining the overflow calculation or use named constants.

value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct! A doc comment for the method would be helpful. The method generates 50 tuples of 99999999999999999999 and 1, which are used as values for the 50 rows in the test cases. Each test case runs a SQL query like "SELECT SUM(col(1)) from table", and summing these values leads to overflow.

@github-actions

github-actions bot commented Mar 3, 2026

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 3, 2026
