Conversation
**Walkthrough:** This pull request extends the
## PR Review: Refactor Spark bitshift signature

### Summary

This PR refactors the Spark bitshift functions by consolidating three separate implementations (`shiftleft`, `shiftright`, `shiftrightunsigned`) into one.

### ✅ Code Quality & Best Practices

Excellent refactoring work! The consolidation is well-executed:
Code organization is logical:
### 🐛 Potential Issues

**1. Missing UShr implementations for smaller integer types**

The `UShr` (unsigned shift right) variant is not implemented for the smaller integer types. Location: `bit_shift.rs`.

**2. Potential type mismatch in null handling**

In `bit_shift.rs`:

```rust
if value_array.data_type().is_null() || shift_array.data_type().is_null() {
    return Ok(Arc::new(Int32Array::new_null(value_array.len())));
}
```

Recommendation: Return a null array of the same type as `value_array`.

**3. Unused import: `ArrowNativeType`**

The import `ArrowNativeType` is no longer used and can be removed.

### ⚡ Performance Considerations

Positive:
Note:
### 🔒 Security Concerns

No security issues identified. The code properly handles:
The bit manipulation operations are safe and follow established patterns.

### 🧪 Test Coverage

Excellent test coverage! The PR adds comprehensive test files:
Test scenarios covered:
Minor observation: The tests moved

### 📝 Suggestions
### 🎯 Conclusion

This is a high-quality refactoring that improves code maintainability without sacrificing functionality or performance. The consolidation of three separate UDF implementations into one is a significant improvement.

**Recommendation: Approve with minor fixes**

The only actionable item is addressing the null handling type consistency issue. The other suggestions are minor improvements that can be addressed at the maintainer's discretion.

Great work on this refactoring! The approach of using a unified struct with an enum discriminator is a clean pattern that could be applied to similar function families in the codebase.
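The unified-struct-with-enum-discriminator pattern praised above can be sketched in plain Rust. The names below (`ShiftKind`, `BitShift`, `eval_i32`) are illustrative stand-ins, not the actual DataFusion types, and the scalar evaluation is a stand-in for the real Arrow-array kernels:

```rust
// Illustrative sketch: one struct plus an enum discriminator replaces
// three near-identical UDF implementations. Names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ShiftKind {
    Left,
    Right,
    RightUnsigned,
}

struct BitShift {
    kind: ShiftKind,
}

impl BitShift {
    // One place for the per-variant name instead of three structs.
    fn name(&self) -> &'static str {
        match self.kind {
            ShiftKind::Left => "shiftleft",
            ShiftKind::Right => "shiftright",
            ShiftKind::RightUnsigned => "shiftrightunsigned",
        }
    }

    // Shared evaluation logic; only the shift operator differs per variant.
    // Java-style masking of the shift amount (& 31 for 32-bit values) is assumed.
    fn eval_i32(&self, value: i32, shift: i32) -> i32 {
        let s = (shift & 31) as u32;
        match self.kind {
            ShiftKind::Left => value.wrapping_shl(s),
            // `>>` on i32 is an arithmetic (sign-preserving) shift in Rust
            ShiftKind::Right => value.wrapping_shr(s),
            // reinterpret as u32 for a logical (zero-filling) shift
            ShiftKind::RightUnsigned => ((value as u32).wrapping_shr(s)) as i32,
        }
    }
}

fn main() {
    assert_eq!(BitShift { kind: ShiftKind::Left }.eval_i32(1, 3), 8);
    assert_eq!(BitShift { kind: ShiftKind::Right }.eval_i32(-8, 1), -4);
    let ru = BitShift { kind: ShiftKind::RightUnsigned };
    assert_eq!(ru.name(), "shiftrightunsigned");
    assert_eq!(ru.eval_i32(-8, 1), 2147483644);
    println!("ok");
}
```

The `eval_i32` match also shows why the three variants can share one code path: they differ only in which shift operator is applied.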
```rust
    &arg_types[1],
));
if value_array.data_type().is_null() || shift_array.data_type().is_null() {
    return Ok(Arc::new(Int32Array::new_null(value_array.len())));
}
```
Null handling here can return an `Int32Array` even when the first (value) argument is non-null and e.g. `Int64`/`UInt64`. Per `return_type`, the result type should follow the first argument; only when the first argument is `Null` should `Int32` be produced. (This also applies when the shift argument is `Null` but the value is not.)
value:useful; category:bug; feedback:The Augment AI reviewer is correct! It should return the type of the value_array and fallback to Int32Array only when value_array.data_type().is_null(). The finding prevents problems in eventual usage of the returned value in more kernels which expect the correct type.
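The agreed-upon rule can be sketched with a simplified stand-in for Arrow's `DataType` (the enum and helper below are illustrative, not the actual DataFusion code):

```rust
// Simplified stand-in for Arrow's DataType; illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int32,
    Int64,
}

// Sketch of the reviewers' proposed rule: when either input is Null-typed,
// the null result must use the value column's type, falling back to Int32
// only when the value column itself is Null-typed.
fn null_result_type(value_type: &DataType, shift_type: &DataType) -> Option<DataType> {
    if *value_type == DataType::Null || *shift_type == DataType::Null {
        let resolved = if *value_type == DataType::Null {
            DataType::Int32
        } else {
            value_type.clone()
        };
        Some(resolved)
    } else {
        None // both inputs are concrete; no early null return
    }
}

fn main() {
    // shift is Null, value is Int64 -> null array must be Int64, not Int32
    assert_eq!(null_result_type(&DataType::Int64, &DataType::Null), Some(DataType::Int64));
    // value itself is Null -> fall back to Int32 (legacy behavior)
    assert_eq!(null_result_type(&DataType::Null, &DataType::Int32), Some(DataType::Int32));
    // neither input is Null-typed -> proceed with normal evaluation
    assert_eq!(null_result_type(&DataType::Int64, &DataType::Int32), None);
    println!("ok");
}
```

This keeps the advertised `return_type` and the produced array type in agreement, which is the invariant later kernels rely on.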
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `datafusion/functions/src/macros.rs` (2 hunks)
- `datafusion/spark/src/function/bitwise/bit_shift.rs` (4 hunks)
- `datafusion/spark/src/function/bitwise/mod.rs` (1 hunk)
- `datafusion/sqllogictest/test_files/spark/bitwise/shiftright.slt` (1 hunk)
- `datafusion/sqllogictest/test_files/spark/bitwise/shiftrightunsigned.slt` (1 hunk)
- `datafusion/sqllogictest/test_files/spark/math/shiftleft.slt` (1 hunk)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-06T13:37:46.028Z
Learnt from: martin-augment
Repo: martin-augment/datafusion PR: 5
File: datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:997-1005
Timestamp: 2025-11-06T13:37:46.028Z
Learning: In datafusion/core/tests/user_defined/user_defined_scalar_functions.rs, when implementing ScalarFunctionWrapper::replacement, always bounds-check placeholder_position against defaults.len() before indexing into the defaults array to prevent panics when SQL UDF bodies reference placeholders like $N where N exceeds the declared parameter count.
Applied to files:
- `datafusion/functions/src/macros.rs`
- `datafusion/spark/src/function/bitwise/mod.rs`
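The bounds check described in this learning can be illustrated with a small stand-alone helper. The function below is a hypothetical simplification, not the actual `ScalarFunctionWrapper::replacement` code; it assumes `$N` placeholders are 1-indexed:

```rust
// Hypothetical helper illustrating the learning above: bounds-check the
// placeholder index against defaults.len() before indexing, so that a SQL
// UDF body referencing $N with N beyond the parameter count cannot panic.
fn lookup_placeholder(defaults: &[String], placeholder_position: usize) -> Option<&String> {
    // $N placeholders are assumed 1-indexed; reject N == 0 and
    // N > defaults.len() instead of indexing out of bounds.
    if placeholder_position == 0 || placeholder_position > defaults.len() {
        return None;
    }
    Some(&defaults[placeholder_position - 1])
}

fn main() {
    let defaults = vec!["a".to_string(), "b".to_string()];
    assert_eq!(lookup_placeholder(&defaults, 1), Some(&"a".to_string()));
    assert_eq!(lookup_placeholder(&defaults, 3), None); // $3 exceeds the parameter count
    println!("ok");
}
```

Returning `Option` lets the caller surface a proper planning error instead of an indexing panic.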
🧬 Code graph analysis (3)
datafusion/functions/src/macros.rs (1)
datafusion/expr/src/udf.rs (2)
- `new` (855-865)
- `new_from_impl` (110-115)
datafusion/spark/src/function/bitwise/bit_shift.rs (3)
datafusion/common/src/utils/mod.rs (1)
- `take_function_args` (926-940)

datafusion/functions-nested/src/utils.rs (3)

- `make_scalar_function` (53-83)
- `args` (42-42)
- `args` (88-91)

datafusion/expr-common/src/signature.rs (1)

- `new_implicit` (922-934)
datafusion/spark/src/function/bitwise/mod.rs (1)
datafusion/spark/src/function/bitwise/bit_shift.rs (3)
- `left` (257-259)
- `right` (261-263)
- `right_unsigned` (265-267)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: claude-review
- GitHub Check: Analyze (rust)
```rust
if value_array.data_type().is_null() || shift_array.data_type().is_null() {
    return Ok(Arc::new(Int32Array::new_null(value_array.len())));
}
```
**Fix null-shift output type to match planner expectations**

When the shift column is the NULL type (e.g., `SELECT shiftleft(1::bigint, CAST(NULL AS NULL))`), this branch returns an `Int32Array`. However, `return_type` reports whatever type the value column resolved to (e.g., `Int64`), so downstream consumers see an `Int64` UDF but get an `Int32` array. That mismatched dtype will trigger panics in later kernels once the planner trusts the advertised schema.

Return a null array whose datatype mirrors the resolved output instead of hard-coding `Int32`. Reusing the existing logic for `value_array.is_null()` keeps the legacy behavior for all-null inputs.
```diff
-if value_array.data_type().is_null() || shift_array.data_type().is_null() {
-    return Ok(Arc::new(Int32Array::new_null(value_array.len())));
-}
+if shift_array.data_type().is_null() {
+    let resolved_type = if value_array.data_type().is_null() {
+        DataType::Int32
+    } else {
+        value_array.data_type().clone()
+    };
+    return Ok(arrow::array::new_null_array(&resolved_type, value_array.len()));
+}
+
+if value_array.data_type().is_null() {
+    return Ok(arrow::array::new_null_array(&DataType::Int32, value_array.len()));
+}
```

Remember to add `new_null_array` to the `arrow::array` imports.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-if value_array.data_type().is_null() || shift_array.data_type().is_null() {
-    return Ok(Arc::new(Int32Array::new_null(value_array.len())));
-}
+if shift_array.data_type().is_null() {
+    let resolved_type = if value_array.data_type().is_null() {
+        DataType::Int32
+    } else {
+        value_array.data_type().clone()
+    };
+    return Ok(arrow::array::new_null_array(&resolved_type, value_array.len()));
+}
+
+if value_array.data_type().is_null() {
+    return Ok(arrow::array::new_null_array(&DataType::Int32, value_array.len()));
+}
```
🤖 Prompt for AI Agents
In datafusion/spark/src/function/bitwise/bit_shift.rs around lines 146 to 148,
the branch that handles a NULL-typed shift column returns an Int32Array which
can mismatch the planner-reported return_type (e.g., Int64) and cause downstream
panics; change this to return a null array whose data type mirrors the resolved
return_type (reuse the existing logic used for value_array.is_null()), and
ensure you add new_null_array to the arrow::array imports so you can construct a
null array with the correct DataType and length.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! It should return the type of the value_array and fallback to Int32Array only when value_array.data_type().is_null(). The finding prevents problems in eventual usage of the returned value in more kernels which expect the correct type.
value:useful; category:bug; feedback:The Claude AI reviewer is correct! It should return the type of the value_array and fall back to Int32Array only when value_array.data_type().is_null(). The finding prevents problems in eventual usage of the returned value in more kernels which expect the correct type.
value:good-to-have; category:bug; feedback:The Claude AI reviewer is correct that the import is no longer needed and can be removed! The finding prevents a lint warning from clippy.
18649: To review by AI