2815: fix: Fall back to Spark for MakeDecimal with unsupported input type #38

Open
martin-augment wants to merge 12 commits into main from pr-2815-2025-12-10-06-52-43

Conversation

@martin-augment
Owner

2815: To review by AI

@coderabbitai

coderabbitai bot commented Dec 10, 2025

Walkthrough

This pull request adds input type validation to the MakeDecimal expression across Rust and Scala implementations. The Rust implementation now validates that Array inputs have DataType::Int64, rejecting other types. The Scala implementation adds a getSupportLevel method to CometMakeDecimal that returns Compatible for LongType inputs and Unsupported for other types. Test utilities are updated across Spark versions 3.4, 3.5, and 4.0 with a createMakeDecimalColumn helper method. New test cases verify behavior with both integer and long inputs via the DataFrame API.




@augmentcode augmentcode bot left a comment


Review completed. No suggestions at this time.


@claude

claude bot commented Dec 10, 2025

Code Review

Summary

This PR addresses issue apache#2813 by adding proper type checking for the MakeDecimal expression to ensure it only accepts LongType inputs, falling back to Spark for unsupported types like IntegerType. The fix is well-implemented with appropriate test coverage.


✅ Strengths

  1. Proper Type Safety: The addition of getSupportLevel in CometMakeDecimal correctly validates input types before attempting conversion, preventing runtime errors.

  2. Comprehensive Test Coverage: Two new tests cover both the fallback case (IntegerType) and the supported case (LongType), ensuring the feature works as expected.

  3. Cross-Version Compatibility: The createMakeDecimalColumn helper is correctly implemented across all three Spark versions (3.4, 3.5, 4.0), accounting for API differences:

    • Spark 3.4/3.5: MakeDecimal(child, precision, scale)
    • Spark 4.0: MakeDecimal(child, precision, scale, true) with ExpressionColumnNode
  4. Defensive Programming: The Rust code now explicitly checks array data types and returns meaningful error messages for unsupported types.


🔍 Observations

Rust Implementation (make_decimal.rs)

  • Lines 43-57: The pattern matching on a.data_type() provides better type safety. The error message format is consistent with the scalar value case.
  • Code Quality: Clean refactoring that maintains consistency between scalar and array value handling.

Scala Serialization (decimalExpressions.scala)

  • Lines 42-47: Good use of pattern matching for type validation. The error message clearly identifies the problematic input type.
  • Note: The getSupportLevel implementation follows the established pattern in the codebase (verified against similar implementations in datetime.scala, arrays.scala, etc.).

Test Implementation (CometExpressionSuite.scala)

  • Lines 3120-3124: Good use of SQL configurations to prevent optimization that might bypass the bug.
  • Line 3130: The test correctly verifies the fallback reason matches the expected error message.
  • Line 3149: The supported case verifies the operation runs on Comet (not falling back to Spark).

💭 Potential Concerns & Questions

  1. Spark 4.0 Constructor Difference (ShimCometTestBase.scala:52)

    • The Spark 4.0 version uses MakeDecimal(child, precision, scale, true) with a fourth parameter.
    • Question: What does the true parameter represent? Is it related to nullOnOverflow? If so, should this be configurable or tested with both values?
    • Suggestion: Consider adding a code comment explaining the fourth parameter purpose for future maintainability.
  2. Test Configuration Completeness

    • The test disables adaptive execution and constant folding to expose the bug.
    • Question: Are there other optimization paths that might still trigger this issue? Should we have additional test variants?
  3. Error Message Consistency

    • Rust error: "Expected Int64 but found {av:?}" (line 56)
    • Scala error: "Unsupported input data type: $other" (line 45)
    • These messages are slightly different in style. The Scala one is more user-friendly.
  • Suggestion: Since the Scala getSupportLevel check already rejects unsupported types before serialization, the Rust error path acts as a defensive backstop, which makes the stylistic difference less critical.
  4. Missing Type Support

    • The fix explicitly supports only LongType.
    • Question: Are there other numeric types (e.g., IntegerType, ShortType, ByteType) that Spark MakeDecimal supports? If so, should Comet consider supporting them in the future, or is LongType the only valid internal representation?

🛡️ Security Considerations

  • ✅ No security concerns identified
  • The type checking prevents potential type confusion vulnerabilities
  • Proper error handling prevents potential panics in the Rust code

⚡ Performance Considerations

  • ✅ The getSupportLevel check happens at query planning time, not during execution, so there is no runtime performance impact
  • The Rust implementation remains efficient with early type validation
  • Fallback to Spark for unsupported types is the correct behavior

📋 Recommendations

  1. Documentation: Add a comment in the Spark 4.0 shim explaining why MakeDecimal requires the fourth parameter (at line 52).

  2. Consider Future Enhancement: Document whether other integer types could be supported by adding a cast to Long in the Scala layer before passing to Comet.

  3. Test Naming: The test names are clear, but consider adding a comment referencing the original issue for historical context (you already have this - great!).


✨ Overall Assessment

Rating: Excellent ⭐⭐⭐⭐⭐

This is a well-crafted fix that:

  • Addresses the root cause properly
  • Includes comprehensive tests
  • Maintains cross-version compatibility
  • Follows established patterns in the codebase
  • Provides clear fallback behavior

The only minor improvements would be additional documentation around the Spark 4.0 API difference. This PR is ready to merge with those optional enhancements.


Reviewed by: Claude Code


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (3)

3115-3133: Verify the decimal precision and scale values.

The test creates a value of 123456 and attempts to convert it to decimal(3, 0), which can only hold values up to 999. With ANSI_ENABLED set to false, this would return null on overflow. However, since the test expects a fallback due to IntegerType, the overflow is never evaluated. Consider using precision/scale values that would accommodate the test value (e.g., decimal(10, 0)) to make the test intent clearer and avoid confusion.

Apply this diff to use appropriate precision/scale:

-        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)

3135-3152: Verify the decimal precision and scale values.

Similar to the integer test, this test uses decimal(3, 0) for a value of 123456, which will overflow. While ANSI_ENABLED is set to false (allowing null on overflow), using a more appropriate precision like decimal(10, 0) would make the test clearer and better demonstrate the normal operator flow without relying on overflow behavior.

Apply this diff to use appropriate precision/scale:

-        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)

3115-3152: Consider adding test coverage for NULL values.

The current tests don't verify NULL handling for MakeDecimal. Based on the learnings, NULL values for integers should be handled gracefully. Consider adding a test case that includes NULL input values to ensure the operator correctly returns NULL without panicking.

Add a test case:

test("make decimal using DataFrame API - with nulls") {
  withTable("t1") {
    sql("create table t1 using parquet as select case when id % 2 = 0 then null else cast(id as long) end as c1 from range(10)")
    
    withSQLConf(
      SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
      SQLConf.ANSI_ENABLED.key -> "false",
      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
      
      val df = sql("select * from t1")
      val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)
      val df1 = df.withColumn("result", makeDecimalColumn)
      
      checkSparkAnswerAndOperator(df1)
    }
  }
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9349c09 and 726ca21.

📒 Files selected for processing (6)
  • native/spark-expr/src/math_funcs/internal/make_decimal.rs (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/decimalExpressions.scala (1 hunks)
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1 hunks)
  • spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
  • spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
  • spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().
📚 Learning: 2025-11-04T14:26:48.750Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Applied to files:

  • native/spark-expr/src/math_funcs/internal/make_decimal.rs
🧬 Code graph analysis (5)
spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (2)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (51-53)
spark/src/main/scala/org/apache/comet/serde/decimalExpressions.scala (1)
spark/src/main/scala/org/apache/comet/serde/SupportLevel.scala (2)
  • Compatible (31-31)
  • Unsupported (42-42)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (51-53)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (2)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (3)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (51-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
🔇 Additional comments (5)
native/spark-expr/src/math_funcs/internal/make_decimal.rs (1)

43-57: LGTM! Array type validation correctly implemented.

The added type checking for Array inputs properly validates that only DataType::Int64 is supported, matching the existing scalar validation. The implementation correctly uses append_option which handles NULL values appropriately based on the learnings about NULL handling for integers.

spark/src/main/scala/org/apache/comet/serde/decimalExpressions.scala (1)

42-47: LGTM! Support level check aligns with Rust implementation.

The getSupportLevel method correctly validates that only LongType inputs are compatible, which corresponds to DataType::Int64 in the Rust implementation. The error message provides clear feedback for unsupported types.

spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)

50-52: LGTM! Helper method correctly implemented for Spark 3.4.

The createMakeDecimalColumn helper provides a consistent API across Spark versions for creating MakeDecimal columns in tests. The implementation correctly uses the 2-parameter MakeDecimal constructor appropriate for Spark 3.4.

spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (1)

50-52: LGTM! Helper method consistent with Spark 3.4 implementation.

The implementation correctly uses the 2-parameter MakeDecimal constructor for Spark 3.5, maintaining API consistency across test utilities.

spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)

51-53: LGTM! Helper method correctly adapted for Spark 4.0.

The implementation properly uses the 3-parameter MakeDecimal constructor with the nullOnOverflow flag (set to true) and wraps it with ExpressionColumnNode.apply, which is appropriate for Spark 4.0's API while maintaining a consistent interface across test utilities.

Comment on lines +3120 to +3124
withSQLConf(
SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
SQLConf.ANSI_ENABLED.key -> "false",
SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false",
SQLConf.ADAPTIVE_OPTIMIZER_EXCLUDED_RULES.key -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the test code around lines 3120-3124
sed -n '3100,3150p' spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

Repository: martin-augment/datafusion-comet

Length of output: 2133


🏁 Script executed:

#!/bin/bash
# Search for the test function name to understand the context better
sed -n '3000,3150p' spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala | head -100

Repository: martin-augment/datafusion-comet

Length of output: 4224


🏁 Script executed:

#!/bin/bash
# Search for MakeDecimal in the test file to see related patterns
rg -n "MakeDecimal" spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala -B 5 -A 15

Repository: martin-augment/datafusion-comet

Length of output: 1466


🏁 Script executed:

#!/bin/bash
# Search for similar test patterns using withSQLConf and these specific configs
rg -n "ADAPTIVE_EXECUTION_ENABLED.*false" spark/src/test/scala/ -B 3 -A 3

Repository: martin-augment/datafusion-comet

Length of output: 8410


🏁 Script executed:

#!/bin/bash
# Check if there are comments or documentation about why these configs are needed
rg -n "ConstantFolding\|ADAPTIVE_EXECUTION_ENABLED" spark/src/test/scala/org/apache/comet/ -B 2 -A 2

Repository: martin-augment/datafusion-comet

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Search for other MakeDecimal tests and their configs
rg -n "MakeDecimal\|makeDecimal" spark/src/test/scala/ -B 10 -A 5 | grep -A 10 -B 10 "withSQLConf"

Repository: martin-augment/datafusion-comet

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Look for any comments or documentation about ConstantFolding
rg -n "ConstantFolding" spark/src/test/scala/org/apache/comet/ -B 3 -A 3

Repository: martin-augment/datafusion-comet

Length of output: 8481


🏁 Script executed:

#!/bin/bash
# Check the createMakeDecimalColumn function to understand what it does
rg -n "def createMakeDecimalColumn" spark/src/test/scala/ -A 10

Repository: martin-augment/datafusion-comet

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Look for the issue 2813 or any comments linking to it
rg -n "2813" spark/src/test/scala/org/apache/comet/

Repository: martin-augment/datafusion-comet

Length of output: 204


Add comments explaining the SQL configuration settings for MakeDecimal tests.

These configurations are necessary for testing MakeDecimal with scalar values but lack documentation. Other similar tests in the file (e.g., lines 1437–1438) include explanatory comments:

  • ADAPTIVE_OPTIMIZER_EXCLUDED_RULES with ConstantFolding: Prevents Spark from pre-computing literal expressions, allowing actual MakeDecimal execution to be tested
  • ADAPTIVE_EXECUTION_ENABLED -> "false": Disables adaptive query execution to avoid plan restructuring that could interfere with testing

Add similar explanatory comments to lines 3120–3124 and 3139–3143 (the "long" variant test).

🤖 Prompt for AI Agents
In spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala around lines
3120–3124 and also update the analogous block at lines 3139–3143, add inline
comments explaining each SQLConf used for the MakeDecimal scalar tests: note
that ADAPTIVE_OPTIMIZER_EXCLUDED_RULES set to ConstantFolding prevents Spark
from pre-computing literal expressions so MakeDecimal is executed,
ADAPTIVE_EXECUTION_ENABLED -> "false" disables adaptive query execution to avoid
plan restructuring, ANSI_ENABLED -> "false" and USE_V1_SOURCE_LIST -> "parquet"
are set for deterministic behavior with parquet literals; place these brief
explanatory comments immediately above or beside the withSQLConf block in both
locations.

Owner Author


value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! For easier maintenance in the future it would be good to add a short comment next to each setting to explain why it is needed for this particular test. It prevents wasted time debugging such tests in the future.

@martin-augment
Owner Author

3115-3133: Verify the decimal precision and scale values.

The test creates a value of 123456 and attempts to convert it to decimal(3, 0), which can only hold values up to 999. With ANSI_ENABLED set to false, this would return null on overflow. However, since the test expects a fallback due to IntegerType, the overflow is never evaluated. Consider using precision/scale values that would accommodate the test value (e.g., decimal(10, 0)) to make the test intent clearer and avoid confusion.

Apply this diff to use appropriate precision/scale:

-        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)

value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! The purpose of the test is to verify that only the Long/Int64 type can be used to create a decimal. Using a precision that leads to an overflow adds a side effect to the test that is not needed here. It prevents confusion for developers maintaining the codebase.

@martin-augment
Owner Author

3135-3152: Verify the decimal precision and scale values.

Similar to the integer test, this test uses decimal(3, 0) for a value of 123456, which will overflow. While ANSI_ENABLED is set to false (allowing null on overflow), using a more appropriate precision like decimal(10, 0) would make the test clearer and better demonstrate the normal operator flow without relying on overflow behavior.

Apply this diff to use appropriate precision/scale:

-        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)

value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! The purpose of the test is to verify that only the Long/Int64 type can be used to create a decimal. Using a precision that leads to an overflow adds a side effect to the test that is not needed here. It prevents confusion for developers maintaining the codebase.

@github-actions

github-actions bot commented Mar 3, 2026

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 3, 2026
