2815: fix: Fall back to Spark for MakeDecimal with unsupported input type #38
martin-augment wants to merge 12 commits into main from
Conversation
….scala Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Walkthrough
This pull request adds input type validation to the MakeDecimal expression across the Rust and Scala implementations. The Rust implementation now validates that Array inputs have DataType::Int64, rejecting other types. The Scala implementation adds a getSupportLevel method to CometMakeDecimal that returns Compatible for LongType inputs and Unsupported for other types. Test utilities are updated across Spark versions 3.4, 3.5, and 4.0 with a createMakeDecimalColumn helper method. New test cases verify behavior with both integer and long inputs via the DataFrame API.
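The support-level check described in the walkthrough can be modeled in a few lines. The following is a hedged, self-contained Rust sketch: the real check lives in CometMakeDecimal on the Scala side and uses Comet's own SupportLevel type, so the DataType and SupportLevel enums here are simplified stand-ins, not the actual Comet or Arrow types.

```rust
// Simplified stand-ins for the real types; names are illustrative only.
#[derive(Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, PartialEq)]
enum SupportLevel {
    Compatible,
    Unsupported(String),
}

// Mirrors the idea of CometMakeDecimal.getSupportLevel: only 64-bit
// integer (Spark LongType / Arrow Int64) inputs run natively; any other
// input type reports Unsupported, which triggers a fallback to Spark.
fn make_decimal_support_level(input: &DataType) -> SupportLevel {
    match input {
        DataType::Int64 => SupportLevel::Compatible,
        other => SupportLevel::Unsupported(format!(
            "MakeDecimal only supports Int64 input, got {:?}",
            other
        )),
    }
}

fn main() {
    assert_eq!(
        make_decimal_support_level(&DataType::Int64),
        SupportLevel::Compatible
    );
    // Int32 (Spark IntegerType) is rejected, so the plan falls back to Spark.
    assert!(matches!(
        make_decimal_support_level(&DataType::Int32),
        SupportLevel::Unsupported(_)
    ));
}
```

Keeping the Scala-side check in sync with the Rust-side match on DataType::Int64 is the core of this fix: the planner refuses natively what the native operator would reject at runtime.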
Code Review

Summary
This PR addresses issue apache#2813 by adding proper type checking for the MakeDecimal expression to ensure it only accepts LongType inputs, falling back to Spark for unsupported types such as IntegerType. The fix is well implemented with appropriate test coverage.

✅ Strengths
🔍 Observations
Rust Implementation (make_decimal.rs)
Scala Serialization (decimalExpressions.scala)
Test Implementation (CometExpressionSuite.scala)
💭 Potential Concerns & Questions
🛡️ Security Considerations
⚡ Performance Considerations
📋 Recommendations
✨ Overall Assessment
Rating: Excellent ⭐⭐⭐⭐⭐

This is a well-crafted fix that:
The only minor improvement would be additional documentation around the Spark 4.0 API difference. This PR is ready to merge with those optional enhancements. Reviewed by: Claude Code
Actionable comments posted: 1
🧹 Nitpick comments (3)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (3)
3115-3133: Verify the decimal precision and scale values.

The test creates a value of 123456 and attempts to convert it to decimal(3, 0), which can only hold values up to 999. With ANSI_ENABLED set to false, this would return null on overflow. However, since the test expects a fallback due to IntegerType, the overflow is never evaluated. Consider using precision/scale values that would accommodate the test value (e.g., decimal(10, 0)) to make the test intent clearer and avoid confusion. Apply this diff to use appropriate precision/scale:
- val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+ val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)
3135-3152: Verify the decimal precision and scale values.

Similar to the integer test, this test uses decimal(3, 0) for a value of 123456, which will overflow. While ANSI_ENABLED is set to false (allowing null on overflow), using a more appropriate precision like decimal(10, 0) would make the test clearer and better demonstrate the normal operator flow without relying on overflow behavior. Apply this diff to use appropriate precision/scale:
- val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 3, 0)
+ val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)
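To see why decimal(3, 0) cannot hold 123456: MakeDecimal reinterprets a long as the unscaled value of a decimal, and with nullOnOverflow the result is null whenever the value needs more digits than the precision allows (scale does not change whether the unscaled value fits). The following is a minimal, self-contained Rust model of that rule; the real code path uses Arrow decimal builders, so this function is only an illustration, not the actual implementation.

```rust
// Hypothetical model of MakeDecimal's overflow rule: an unscaled i64 fits
// decimal(precision, _) only if it has at most `precision` digits;
// otherwise nullOnOverflow yields NULL (None here).
fn make_decimal(unscaled: Option<i64>, precision: u32) -> Option<i64> {
    let v = unscaled?; // NULL input stays NULL
    // Largest unscaled magnitude representable with `precision` digits.
    let max = 10_i64.checked_pow(precision)? - 1;
    if v.abs() <= max {
        Some(v)
    } else {
        None // overflow -> NULL when nullOnOverflow is true
    }
}

fn main() {
    // 123456 has 6 digits, so decimal(3, 0) overflows and yields NULL...
    assert_eq!(make_decimal(Some(123_456), 3), None);
    // ...while decimal(10, 0) holds it, which is what the review suggests.
    assert_eq!(make_decimal(Some(123_456), 10), Some(123_456));
    // NULL input propagates as NULL.
    assert_eq!(make_decimal(None, 10), None);
}
```

With precision 10 the test exercises the normal operator path instead of silently relying on overflow-to-null behavior, which is exactly the confusion the review comment flags.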
3115-3152: Consider adding test coverage for NULL values.

The current tests don't verify NULL handling for MakeDecimal. Based on the learnings, NULL values for integers should be handled gracefully. Consider adding a test case that includes NULL input values to ensure the operator correctly returns NULL without panicking.
Add a test case:
test("make decimal using DataFrame API - with nulls") {
  withTable("t1") {
    sql("create table t1 using parquet as select case when id % 2 = 0 then null else cast(id as long) end as c1 from range(10)")
    withSQLConf(
      SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
      SQLConf.ANSI_ENABLED.key -> "false",
      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false") {
      val df = sql("select * from t1")
      val makeDecimalColumn = createMakeDecimalColumn(df.col("c1").expr, 10, 0)
      val df1 = df.withColumn("result", makeDecimalColumn)
      checkSparkAnswerAndOperator(df1)
    }
  }
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
native/spark-expr/src/math_funcs/internal/make_decimal.rs (1 hunks)
spark/src/main/scala/org/apache/comet/serde/decimalExpressions.scala (1 hunks)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1 hunks)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().
📚 Learning: 2025-11-04T14:26:48.750Z (same learning as above, from martin-augment/datafusion-comet PR 7)
Applied to files:
native/spark-expr/src/math_funcs/internal/make_decimal.rs
🧬 Code graph analysis (5)
spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (2)
  spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
    createMakeDecimalColumn (50-52)
  spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
    createMakeDecimalColumn (51-53)
spark/src/main/scala/org/apache/comet/serde/decimalExpressions.scala (1)
  spark/src/main/scala/org/apache/comet/serde/SupportLevel.scala (2)
    Compatible (31-31)
    Unsupported (42-42)
spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
  spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
    createMakeDecimalColumn (51-53)
spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (2)
  spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
    createMakeDecimalColumn (50-52)
  spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (1)
    createMakeDecimalColumn (50-52)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (3)
  spark/src/test/spark-3.4/org/apache/spark/sql/ShimCometTestBase.scala (1)
    createMakeDecimalColumn (50-52)
  spark/src/test/spark-3.5/org/apache/spark/sql/ShimCometTestBase.scala (1)
    createMakeDecimalColumn (50-52)
  spark/src/test/spark-4.0/org/apache/spark/sql/ShimCometTestBase.scala (1)
    createMakeDecimalColumn (51-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor Bugbot
- GitHub Check: claude-review
🔇 Additional comments (5)
native/spark-expr/src/math_funcs/internal/make_decimal.rs (1)
43-57: LGTM! Array type validation correctly implemented.

The added type checking for Array inputs properly validates that only DataType::Int64 is supported, matching the existing scalar validation. The implementation correctly uses append_option, which handles NULL values appropriately based on the learnings about NULL handling for integers.
42-47: LGTM! Support level check aligns with Rust implementation.

The getSupportLevel method correctly validates that only LongType inputs are compatible, which corresponds to DataType::Int64 in the Rust implementation. The error message provides clear feedback for unsupported types.
50-52: LGTM! Helper method correctly implemented for Spark 3.4.

The createMakeDecimalColumn helper provides a consistent API across Spark versions for creating MakeDecimal columns in tests. The implementation correctly uses the 2-parameter MakeDecimal constructor appropriate for Spark 3.4.
50-52: LGTM! Helper method consistent with Spark 3.4 implementation.

The implementation correctly uses the 2-parameter MakeDecimal constructor for Spark 3.5, maintaining API consistency across test utilities.
51-53: LGTM! Helper method correctly adapted for Spark 4.0.

The implementation properly uses the 3-parameter MakeDecimal constructor with the nullOnOverflow flag (set to true) and wraps it with ExpressionColumnNode.apply, which is appropriate for Spark 4.0's API while maintaining a consistent interface across test utilities.
withSQLConf(
  SQLConf.USE_V1_SOURCE_LIST.key -> "parquet",
  SQLConf.ANSI_ENABLED.key -> "false",
  SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "false",
  SQLConf.ADAPTIVE_OPTIMIZER_EXCLUDED_RULES.key -> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read the test code around lines 3120-3124
sed -n '3100,3150p' spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
Repository: martin-augment/datafusion-comet
Length of output: 2133
🏁 Script executed:
#!/bin/bash
# Search for the test function name to understand the context better
sed -n '3000,3150p' spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala | head -100
Repository: martin-augment/datafusion-comet
Length of output: 4224
🏁 Script executed:
#!/bin/bash
# Search for MakeDecimal in the test file to see related patterns
rg -n "MakeDecimal" spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala -B 5 -A 15
Repository: martin-augment/datafusion-comet
Length of output: 1466
🏁 Script executed:
#!/bin/bash
# Search for similar test patterns using withSQLConf and these specific configs
rg -n "ADAPTIVE_EXECUTION_ENABLED.*false" spark/src/test/scala/ -B 3 -A 3
Repository: martin-augment/datafusion-comet
Length of output: 8410
🏁 Script executed:
#!/bin/bash
# Check if there are comments or documentation about why these configs are needed
rg -n "ConstantFolding\|ADAPTIVE_EXECUTION_ENABLED" spark/src/test/scala/org/apache/comet/ -B 2 -A 2
Repository: martin-augment/datafusion-comet
Length of output: 57
🏁 Script executed:
#!/bin/bash
# Search for other MakeDecimal tests and their configs
rg -n "MakeDecimal\|makeDecimal" spark/src/test/scala/ -B 10 -A 5 | grep -A 10 -B 10 "withSQLConf"
Repository: martin-augment/datafusion-comet
Length of output: 57
🏁 Script executed:
#!/bin/bash
# Look for any comments or documentation about ConstantFolding
rg -n "ConstantFolding" spark/src/test/scala/org/apache/comet/ -B 3 -A 3
Repository: martin-augment/datafusion-comet
Length of output: 8481
🏁 Script executed:
#!/bin/bash
# Check the createMakeDecimalColumn function to understand what it does
rg -n "def createMakeDecimalColumn" spark/src/test/scala/ -A 10
Repository: martin-augment/datafusion-comet
Length of output: 57
🏁 Script executed:
#!/bin/bash
# Look for the issue 2813 or any comments linking to it
rg -n "2813" spark/src/test/scala/org/apache/comet/
Repository: martin-augment/datafusion-comet
Length of output: 204
Add comments explaining the SQL configuration settings for MakeDecimal tests.
These configurations are necessary for testing MakeDecimal with scalar values but lack documentation. Other similar tests in the file (e.g., lines 1437–1438) include explanatory comments:

- ADAPTIVE_OPTIMIZER_EXCLUDED_RULES with ConstantFolding: prevents Spark from pre-computing literal expressions, allowing actual MakeDecimal execution to be tested
- ADAPTIVE_EXECUTION_ENABLED -> "false": disables adaptive query execution to avoid plan restructuring that could interfere with testing
Add similar explanatory comments to lines 3120–3124 and 3139–3143 (the "long" variant test).
🤖 Prompt for AI Agents
In spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala around lines
3120–3124 and also update the analogous block at lines 3139–3143, add inline
comments explaining each SQLConf used for the MakeDecimal scalar tests: note
that ADAPTIVE_OPTIMIZER_EXCLUDED_RULES set to ConstantFolding prevents Spark
from pre-computing literal expressions so MakeDecimal is executed,
ADAPTIVE_EXECUTION_ENABLED -> "false" disables adaptive query execution to avoid
plan restructuring, ANSI_ENABLED -> "false" and USE_V1_SOURCE_LIST -> "parquet"
are set for deterministic behavior with parquet literals; place these brief
explanatory comments immediately above or beside the withSQLConf block in both
locations.
value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! For easier maintenance it would be good to add a short comment next to each setting explaining why it is needed for this particular test. This prevents wasted time debugging such tests in the future.
value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! The purpose of the test is to verify that only the Long/Int64 type can be used to create a decimal. Using a precision that leads to an overflow adds a side effect that is not needed here and may confuse developers maintaining the codebase.
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
2815: To review by AI