2815: fix: Fall back to Spark for MakeDecimal with unsupported input type #38

Open
martin-augment wants to merge 12 commits into main from pr-2815-2025-12-10-06-52-43

Conversation

@martin-augment
Owner

2815: To review by AI

@coderabbitai

coderabbitai bot commented Dec 10, 2025

Walkthrough

This pull request adds input type validation to the MakeDecimal expression across Rust and Scala implementations. The Rust implementation now validates that Array inputs have DataType::Int64, rejecting other types. The Scala implementation adds a getSupportLevel method to CometMakeDecimal that returns Compatible for LongType inputs and Unsupported for other types. Test utilities are updated across Spark versions 3.4, 3.5, and 4.0 with a createMakeDecimalColumn helper method. New test cases verify behavior with both integer and long inputs via the DataFrame API.




@augmentcode augmentcode bot left a comment


Review completed. No suggestions at this time.


@claude

claude bot commented Dec 10, 2025

Code Review

Summary

This PR addresses issue apache#2813 by adding proper type checking for the MakeDecimal expression to ensure it only accepts LongType inputs, falling back to Spark for unsupported types like IntegerType. The fix is well-implemented with appropriate test coverage.


✅ Strengths

  1. Proper Type Safety: The addition of getSupportLevel in CometMakeDecimal correctly validates input types before attempting conversion, preventing runtime errors.

  2. Comprehensive Test Coverage: Two new tests cover both the fallback case (IntegerType) and the supported case (LongType), ensuring the feature works as expected.

  3. Cross-Version Compatibility: The createMakeDecimalColumn helper is correctly implemented across all three Spark versions (3.4, 3.5, 4.0), accounting for API differences:

    • Spark 3.4/3.5: MakeDecimal(child, precision, scale)
    • Spark 4.0: MakeDecimal(child, precision, scale, true) with ExpressionColumnNode
  4. Defensive Programming: The Rust code now explicitly checks array data types and returns meaningful error messages for unsupported types.


🔍 Observations

Rust Implementation (make_decimal.rs)

  • Lines 43-57: The pattern matching on a.data_type() provides better type safety. The error message format is consistent with the scalar value case.
  • Code Quality: Clean refactoring that maintains consistency between scalar and array value handling.

Scala Serialization (decimalExpressions.scala)

  • Lines 42-47: Good use of pattern matching for type validation. The error message clearly identifies the problematic input type.
  • Note: The getSupportLevel implementation follows the established pattern in the codebase (verified against similar implementations in datetime.scala, arrays.scala, etc.).

Test Implementation (CometExpressionSuite.scala)

  • Lines 3120-3124: Good use of SQL configurations to prevent optimization that might bypass the bug.
  • Line 3130: The test correctly verifies the fallback reason matches the expected error message.
  • Line 3149: The supported case verifies the operation runs on Comet (not falling back to Spark).

💭 Potential Concerns & Questions

  1. Spark 4.0 Constructor Difference (ShimCometTestBase.scala:52)

    • The Spark 4.0 version uses MakeDecimal(child, precision, scale, true) with a fourth parameter.
    • Question: What does the true parameter represent? Is it related to nullOnOverflow? If so, should this be configurable or tested with both values?
    • Suggestion: Consider adding a code comment explaining the fourth parameter purpose for future maintainability.
  2. Test Configuration Completeness

    • The test disables adaptive execution and constant folding to expose the bug.
    • Question: Are there other optimization paths that might still trigger this issue? Should we have additional test variants?
  3. Error Message Consistency

    • Rust error: "Expected Int64 but found {av:?}" (line 56)
    • Scala error: "Unsupported input data type: $other" (line 45)
    • These messages are slightly different in style. The Scala one is more user-friendly.
  • Suggestion: Since the Scala getSupportLevel check already rejects unsupported types before serialization, the Rust error path acts as a defensive backstop, which makes the stylistic difference less critical.
  4. Missing Type Support

    • The fix explicitly supports only LongType.
    • Question: Are there other numeric types (e.g., IntegerType, ShortType, ByteType) that Spark MakeDecimal supports? If so, should Comet consider supporting them in the future, or is LongType the only valid internal representation?

🛡️ Security Considerations

  • ✅ No security concerns identified
  • The type checking prevents potential type confusion vulnerabilities
  • Proper error handling prevents potential panics in the Rust code

⚡ Performance Considerations

  • ✅ The getSupportLevel check happens at query planning time, not during execution, so there is no runtime performance impact
  • The Rust implementation remains efficient with early type validation
  • Fallback to Spark for unsupported types is the correct behavior

📋 Recommendations

  1. Documentation: Add a comment in the Spark 4.0 shim explaining why MakeDecimal requires the fourth parameter (at line 52).

  2. Consider Future Enhancement: Document whether other integer types could be supported by adding a cast to Long in the Scala layer before passing to Comet.

  3. Test Naming: The test names are clear, but consider adding a comment referencing the original issue for historical context (you already have this - great!).


✨ Overall Assessment

Rating: Excellent ⭐⭐⭐⭐⭐

This is a well-crafted fix that:

  • Addresses the root cause properly
  • Includes comprehensive tests
  • Maintains cross-version compatibility
  • Follows established patterns in the codebase
  • Provides clear fallback behavior

The only minor improvements would be additional documentation around the Spark 4.0 API difference. This PR is ready to merge with those optional enhancements.


Reviewed by: Claude Code


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (3)

3115-3133: Verify the decimal precision and scale values.

The test creates a value of 123456 and attempts to convert it to decimal(3, 0), which can only hold values up to 999. With ANSI_ENABLED set to false, this would return null on overflow. However, since the test expects a fallback due to IntegerType, the overflow is never evaluated. Consider using precision/scale values that would accommodate the test value (e.g., decimal(10, 0)) to make the test intent clearer and avoid confusion.

Apply this diff to use appropriate precision/scale:

-        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)

3135-3152: Verify the decimal precision and scale values.

Similar to the integer test, this test uses decimal(3, 0) for a value of 123456, which will overflow. While ANSI_ENABLED is set to false (allowing null on overflow), using a more appropriate precision like decimal(10, 0) would make the test clearer and better demonstrate the normal operator flow without relying on overflow behavior.

Apply this diff to use appropriate precision/scale:

-        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)

3115-3152: Consider adding test coverage for NULL values.

The current tests don't verify NULL handling for MakeDecimal. Based on the learnings, NULL values for integers should be handled gracefully. Consider adding a test case that includes NULL input values to ensure the operator correctly returns NULL without panicking.

Add a test case:

test("make decimal using DataFrame API - with nulls") {
  withTable("t1") {
    sql("create table t1 using parquet as select case when id % 2 = 0 then null else cast(id as long) end as c1 from range(10)")
    
    withSQLConf(
      SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
      SQLConf.ANSI_ENABLED.key -> "false",
      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
      
      val df = sql("select * from t1")
      val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)
      val df1 = df.withColumn("result", makeDecimalColumn)
      
      checkSparkAnswerAndOperator(df1)
    }
  }
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9349c09 and 726ca21.

📒 Files selected for processing (6)
  • native/spark-expr/src/math_funcs/internal/make_decimal.rs (1 hunks)
  • spark/src/main/scala/org/apache/comet/serde/decimalExpressions.scala (1 hunks)
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1 hunks)
  • spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
  • spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
  • spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().
📚 Learning: 2025-11-04T14:26:48.750Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().

Applied to files:

  • native/spark-expr/src/math_funcs/internal/make_decimal.rs
🧬 Code graph analysis (5)
spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (2)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (51-53)
spark/src/main/scala/org/apache/comet/serde/decimalExpressions.scala (1)
spark/src/main/scala/org/apache/comet/serde/SupportLevel.scala (2)
  • Compatible (31-31)
  • Unsupported (42-42)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (51-53)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (2)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (3)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (50-52)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
  • createMakeDecimalColumn (51-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: claude-review
🔇 Additional comments (5)
native/spark-expr/src/math_funcs/internal/make_decimal.rs (1)

43-57: LGTM! Array type validation correctly implemented.

The added type checking for Array inputs properly validates that only DataType::Int64 is supported, matching the existing scalar validation. The implementation correctly uses append_option which handles NULL values appropriately based on the learnings about NULL handling for integers.

spark/src/main/scala/org/apache/comet/serde/decimalExpressions.scala (1)

42-47: LGTM! Support level check aligns with Rust implementation.

The getSupportLevel method correctly validates that only LongType inputs are compatible, which corresponds to DataType::Int64 in the Rust implementation. The error message provides clear feedback for unsupported types.

spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)

50-52: LGTM! Helper method correctly implemented for Spark 3.4.

The createMakeDecimalColumn helper provides a consistent API across Spark versions for creating MakeDecimal columns in tests. The implementation correctly uses the 2-parameter MakeDecimal constructor appropriate for Spark 3.4.

spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (1)

50-52: LGTM! Helper method consistent with Spark 3.4 implementation.

The implementation correctly uses the 2-parameter MakeDecimal constructor for Spark 3.5, maintaining API consistency across test utilities.

spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)

51-53: LGTM! Helper method correctly adapted for Spark 4.0.

The implementation properly uses the 3-parameter MakeDecimal constructor with the nullOnOverflow flag (set to true) and wraps it with ExpressionColumnNode.apply, which is appropriate for Spark 4.0's API while maintaining a consistent interface across test utilities.

Comment on lines +3120 to +3124
withSQLConf(
SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
SQLConf.ANSI_ENABLED.key -> "false",
SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false",
SQLConf.ADAPTIVE_OPTIMIZER_EXCLUDED_RULES.key -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the test code around lines 3120-3124
sed -n '3100,3150p' spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

Repository: martin-augment/datafusion-comet

Length of output: 2133


🏁 Script executed:

#!/bin/bash
# Search for the test function name to understand the context better
sed -n '3000,3150p' spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala | head -100

Repository: martin-augment/datafusion-comet

Length of output: 4224


🏁 Script executed:

#!/bin/bash
# Search for MakeDecimal in the test file to see related patterns
rg -n "MakeDecimal" spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala -B 5 -A 15

Repository: martin-augment/datafusion-comet

Length of output: 1466


🏁 Script executed:

#!/bin/bash
# Search for similar test patterns using withSQLConf and these specific configs
rg -n "ADAPTIVE_EXECUTION_ENABLED.*false" spark/src/test/scala/ -B 3 -A 3

Repository: martin-augment/datafusion-comet

Length of output: 8410


🏁 Script executed:

#!/bin/bash
# Check if there are comments or documentation about why these configs are needed
rg -n "ConstantFolding\|ADAPTIVE_EXECUTION_ENABLED" spark/src/test/scala/org/apache/comet/ -B 2 -A 2

Repository: martin-augment/datafusion-comet

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Search for other MakeDecimal tests and their configs
rg -n "MakeDecimal\|makeDecimal" spark/src/test/scala/ -B 10 -A 5 | grep -A 10 -B 10 "withSQLConf"

Repository: martin-augment/datafusion-comet

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Look for any comments or documentation about ConstantFolding
rg -n "ConstantFolding" spark/src/test/scala/org/apache/comet/ -B 3 -A 3

Repository: martin-augment/datafusion-comet

Length of output: 8481


🏁 Script executed:

#!/bin/bash
# Check the createMakeDecimalColumn function to understand what it does
rg -n "def createMakeDecimalColumn" spark/src/test/scala/ -A 10

Repository: martin-augment/datafusion-comet

Length of output: 57


🏁 Script executed:

#!/bin/bash
# Look for the issue 2813 or any comments linking to it
rg -n "2813" spark/src/test/scala/org/apache/comet/

Repository: martin-augment/datafusion-comet

Length of output: 204


Add comments explaining the SQL configuration settings for MakeDecimal tests.

These configurations are necessary for testing MakeDecimal with scalar values but lack documentation. Other similar tests in the file (e.g., lines 1437–1438) include explanatory comments:

  • ADAPTIVE_OPTIMIZER_EXCLUDED_RULES with ConstantFolding: Prevents Spark from pre-computing literal expressions, allowing actual MakeDecimal execution to be tested
  • ADAPTIVE_EXECUTION_ENABLED -> "false": Disables adaptive query execution to avoid plan restructuring that could interfere with testing

Add similar explanatory comments to lines 3120–3124 and 3139–3143 (the "long" variant test).

🤖 Prompt for AI Agents
In spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala around lines
3120–3124 and also update the analogous block at lines 3139–3143, add inline
comments explaining each SQLConf used for the MakeDecimal scalar tests: note
that ADAPTIVE_OPTIMIZER_EXCLUDED_RULES set to ConstantFolding prevents Spark
from pre-computing literal expressions so MakeDecimal is executed,
ADAPTIVE_EXECUTION_ENABLED -> "false" disables adaptive query execution to avoid
plan restructuring, ANSI_ENABLED -> "false" and USE_V1_SOURCE_LIST -> "parquet"
are set for deterministic behavior with parquet literals; place these brief
explanatory comments immediately above or beside the withSQLConf block in both
locations.

Owner Author


value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! For easier maintenance in the future it would be good to add a short comment next to each setting to explain why it is needed for this particular test. It prevents wasted time debugging such tests in the future.

@martin-augment
Owner Author

3115-3133: Verify the decimal precision and scale values.

The test creates a value of 123456 and attempts to convert it to decimal(3, 0), which can only hold values up to 999. With ANSI_ENABLED set to false, this would return null on overflow. However, since the test expects a fallback due to IntegerType, the overflow is never evaluated. Consider using precision/scale values that would accommodate the test value (e.g., decimal(10, 0)) to make the test intent clearer and avoid confusion.

Apply this diff to use appropriate precision/scale:

-        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)

value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! The purpose of the test is to verify that only the Long/Int64 type can be used to create a decimal. Using a precision that leads to an overflow adds a side effect to the test that is not needed here. It prevents confusion for developers maintaining the codebase.

@martin-augment
Owner Author

3135-3152: Verify the decimal precision and scale values.

Similar to the integer test, this test uses decimal(3, 0) for a value of 123456, which will overflow. While ANSI_ENABLED is set to false (allowing null on overflow), using a more appropriate precision like decimal(10, 0) would make the test clearer and better demonstrate the normal operator flow without relying on overflow behavior.

Apply this diff to use appropriate precision/scale:

-        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+        val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)

value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! The purpose of the test is to verify that only the Long/Int64 type can be used to create a decimal. Using a precision that leads to an overflow adds a side effect to the test that is not needed here. It prevents confusion for developers maintaining the codebase.

@github-actions

github-actions bot commented Mar 3, 2026

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 3, 2026
