2835: feat: Support string non int numeric types #32
martin-augment wants to merge 2 commits into main from
Conversation
Walkthrough
This pull request extends casting support from strings to numeric types in the Comet query engine. The documentation was updated to reflect string-to-float, string-to-double, and string-to-decimal casts moving from incompatible to compatible. Native Rust code implements parsing logic for converting strings to decimal and floating-point types, including special value handling. The Scala compatibility layer marks these casts as compatible rather than incompatible. Test coverage was expanded with special value test cases for float, double, and decimal casting operations across ANSI and non-ANSI modes.
✨ Finishing touches
```rust
        cast_string_to_timestamp(&array, to_type, eval_mode, &cast_options.timezone)
    }
    (Utf8, Date32) => cast_string_to_date(&array, to_type, eval_mode),
    (Utf8, Float16 | Float32 | Float64) => cast_string_to_float(&array, to_type, eval_mode),
```
This match arm includes Float16, but cast_string_to_float only handles Float32/Float64; attempting STRING→FLOAT16 will hit the catch-all and return an internal error.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! cast_string_to_float() (defined at line 1301) currently supports only Float32 and Float64, but not Float16. Trying to cast a string to Float16 will return an internal error.
```diff
             ),
-            DataType::Utf8 if allow_incompat => matches!(
+            DataType::Utf8 if allow_incompat => {
+                matches!(to_type, DataType::Binary | DataType::Decimal128(_, _))
```
In is_datafusion_spark_compatible, the Utf8 if allow_incompat branch excludes Float32/Float64 and Decimal256, so enabling allow_incompat actually removes float compatibility and omits decimal256—likely unintended given the new support.
value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! Neither the old nor the new version of the code supports Float32/64 or Decimal256 in any of the match arms, with or without allow_incompat. Fixing this prevents error results at runtime when any of these types is used.
```scala
      testAnsi = false)
  test("cast StringType to DoubleType") {
    Seq(true, false).foreach { v =>
      castTest(specialValues.toDF("a"), DataTypes.FloatType, testAnsi = v)
```
In the "cast StringType to DoubleType" test, the cast uses DataTypes.FloatType instead of DataTypes.DoubleType, so it doesn't exercise double casting.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! It looks like this is a copy/paste error. The test should use DoubleType as the test name suggests. Prevents regressions related to the support for DoubleType because it is not tested at the moment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- docs/source/user-guide/latest/compatibility.md (1 hunks)
- native/spark-expr/src/conversion_funcs/cast.rs (7 hunks)
- spark/src/main/scala/org/apache/comet/expressions/CometCast.scala (1 hunks)
- spark/src/test/scala/org/apache/comet/CometCastSuite.scala (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-04T14:26:48.750Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 7
File: native/spark-expr/src/math_funcs/abs.rs:201-302
Timestamp: 2025-11-04T14:26:48.750Z
Learning: In the abs function in native/spark-expr/src/math_funcs/abs.rs (Rust), NULL values for signed integers (Int8, Int16, Int32, Int64) and decimals (Decimal128, Decimal256) should return the argument as-is (e.g., ColumnarValue::Scalar(ScalarValue::Int8(None))) rather than panicking on unwrap().
Applied to files:
native/spark-expr/src/conversion_funcs/cast.rs
📚 Learning: 2025-11-11T20:44:05.014Z
Learnt from: martin-augment
Repo: martin-augment/datafusion-comet PR: 19
File: native/spark-expr/src/array_funcs/array_insert.rs:221-238
Timestamp: 2025-11-11T20:44:05.014Z
Learning: In Rust code using Apache Arrow arrays, always check `is_null(index)` before calling `value(index)` on `PrimitiveArray` types (such as `Int32Array`, `Int64Array`, etc.), because `value()` does not check for nulls and returns arbitrary values for null slots. This applies to functions in `native/spark-expr/src` that process Arrow arrays.
Applied to files:
native/spark-expr/src/conversion_funcs/cast.rs
🧬 Code graph analysis (1)
spark/src/main/scala/org/apache/comet/expressions/CometCast.scala (1)
spark/src/main/scala/org/apache/comet/serde/SupportLevel.scala (1)
Compatible(31-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Cursor Bugbot
- GitHub Check: claude-review
🔇 Additional comments (8)
spark/src/main/scala/org/apache/comet/expressions/CometCast.scala (1)
187-190: LGTM - String to float/double and decimal casts are now marked as compatible.
The change from `Incompatible` to `Compatible()` aligns with the native Rust implementation that now provides Spark-compatible string parsing for these numeric types.
docs/source/user-guide/latest/compatibility.md (1)
162-164: Documentation update correctly reflects the new compatible casts.
The addition of string → float, string → double, and string → decimal to the Compatible Casts table is consistent with the implementation changes.
native/spark-expr/src/conversion_funcs/cast.rs (4)
214-216: LGTM - Extended `can_cast_from_string` to include new types.
The function now correctly reports Float32, Float64, Decimal128, and Decimal256 as valid target types for string casts.
1358-1408: LGTM - `FloatParse` trait implementations handle special values correctly.
The implementations properly handle infinity variants, NaN, and suffix stripping for Spark compatibility.
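The special-value and suffix handling described in this comment can be sketched in std-only Rust. `parse_spark_float` below is a hypothetical stand-in for the `FloatParse` logic, not Comet's actual implementation; it assumes the inf/NaN spellings and 'd'/'f' suffix rules quoted in this review.

```rust
/// Hypothetical stand-in for Comet's FloatParse logic: trims whitespace,
/// maps Spark's infinity/NaN spellings onto f64 specials, then strips a
/// single trailing 'd'/'D'/'f'/'F' type suffix before the numeric parse.
fn parse_spark_float(input: &str) -> Option<f64> {
    let trimmed = input.trim();
    // Special spellings are matched case-insensitively, before suffix stripping,
    // so that the final 'f' of "inf" is not mistaken for a suffix.
    match trimmed.to_ascii_lowercase().as_str() {
        "inf" | "+inf" | "infinity" | "+infinity" => return Some(f64::INFINITY),
        "-inf" | "-infinity" => return Some(f64::NEG_INFINITY),
        "nan" => return Some(f64::NAN),
        _ => {}
    }
    // Strip one trailing type suffix, e.g. "1.5d" or "2f" (ASCII, so byte
    // slicing is safe here).
    let stripped = match trimmed.chars().last() {
        Some('d' | 'D' | 'f' | 'F') => &trimmed[..trimmed.len() - 1],
        _ => trimmed,
    };
    stripped.parse::<f64>().ok()
}
```

A None result corresponds to the invalid-input path, where ANSI mode would raise an error and non-ANSI mode would produce NULL.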
1537-1543: Clarify the conditional logic for Utf8 compatibility with Decimal128.
The `can_cast_from_string` function (line 215) returns `true` for Decimal128 unconditionally, while `is_datafusion_spark_compatible` (lines 1537-1543) requires `allow_incompat` to be true for Utf8→Decimal128 casts. This difference is inconsistent with how other compatibility-dependent casts (like Date32/Date64) are handled, where `can_cast_from_string` itself checks `options.allow_incompat` (lines 217-221). Although Utf8→Decimal128 casts have explicit custom handling in `spark_cast` (lines 965-970) that bypasses the compatibility check, the divergent conditional logic between the two functions should be reconciled or documented to prevent confusion during maintenance.
1194-1197: Inconsistent suffix handling between decimal and float parsing is correct and intentional.
The code correctly implements Spark's behavior: float parsing (`FloatParse`) explicitly strips 'd'/'f' suffixes before parsing (matching Java's `Double.parseDouble()`), while decimal parsing does NOT strip suffixes and lets them fail naturally (matching Spark's `JavaBigDecimal(string)`, which rejects suffixes per Java's `BigDecimal` spec). This inconsistency mirrors the underlying Java APIs and is the correct behavior. No changes needed.
spark/src/test/scala/org/apache/comet/CometCastSuite.scala (2)
655-679: Good test coverage for special string values.
The `specialValues` sequence provides comprehensive coverage for edge cases including suffixed values, infinity/NaN variants, scientific notation, whitespace, and invalid inputs.
700-733: LGTM - Decimal casting tests cover basic and high precision scenarios.
The tests appropriately cover:
- Basic decimal values with positive/negative numbers
- Precision edge cases like "0.001" and high precision values
- Invalid inputs (inf, empty string, non-numeric)
- The `testAnsi = false` parameter is appropriate since ANSI mode has different error handling
```rust
let string_array = array
    .as_any()
    .downcast_ref::<StringArray>()
    .ok_or_else(|| SparkError::Internal("Expected string array".to_string()))?;
```
LargeUtf8 input will cause a runtime error for decimal casts.
The match pattern at lines 965-970 accepts both Utf8 and LargeUtf8, but downcast_ref::<StringArray>() only works for Utf8. When the input is LargeUtf8, this will return None and cause an error.
```diff
- let string_array = array
-     .as_any()
-     .downcast_ref::<StringArray>()
-     .ok_or_else(|| SparkError::Internal("Expected string array".to_string()))?;
+ // Handle both Utf8 (StringArray) and LargeUtf8 (LargeStringArray)
+ fn get_string_value<O: OffsetSizeTrait>(array: &ArrayRef, i: usize) -> Option<&str> {
+     array.as_any()
+         .downcast_ref::<GenericStringArray<O>>()
+         .map(|arr| if arr.is_null(i) { None } else { Some(arr.value(i)) })
+         .flatten()
+ }
```

Consider using `GenericStringArray<O>` with a generic offset type, similar to the pattern used in `cast_string_to_int`.
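The dispatch-on-both-variants idea can be illustrated without Arrow at all, using `std::any` downcasts. `Utf8Array` and `LargeUtf8Array` here are hypothetical stand-ins for Arrow's `StringArray` (i32 offsets) and `LargeStringArray` (i64 offsets):

```rust
use std::any::Any;

// Hypothetical stand-ins for Arrow's StringArray and LargeStringArray;
// both expose nullable string slots.
struct Utf8Array(Vec<Option<String>>);
struct LargeUtf8Array(Vec<Option<String>>);

// Mirrors the review's suggestion: try both concrete variants instead of
// assuming the small-offset one, so LargeUtf8 input no longer errors out.
fn string_value_at(array: &dyn Any, i: usize) -> Option<&str> {
    if let Some(arr) = array.downcast_ref::<Utf8Array>() {
        return arr.0.get(i)?.as_deref();
    }
    if let Some(arr) = array.downcast_ref::<LargeUtf8Array>() {
        return arr.0.get(i)?.as_deref();
    }
    None // neither variant: the caller would surface an internal error
}
```

In the real code the same effect is cleaner with a function generic over `OffsetSizeTrait`, since one body then covers both offset widths.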
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
native/spark-expr/src/conversion_funcs/cast.rs around lines 1080-1083: the code
unconditionally downcasts to StringArray which fails for LargeUtf8 inputs;
update the logic to handle both Utf8 and LargeUtf8 by using GenericStringArray
with a generic offset type (or by matching the DataType and downcasting to the
appropriate StringArray/LargeStringArray variant as done in cast_string_to_int).
Replace the single downcast_ref::<StringArray>() with a branch that constructs a
GenericStringArray<O> (or performs the correct downcast for each offset type),
then use that generic array for the decimal cast logic so LargeUtf8 inputs are
handled without runtime errors.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! A LargeUtf8 array cannot be downcast to StringArray, so the new code will need to use the generic GenericStringArray<O> instead. Prevents an internal error at runtime when LargeUtf8 is passed.
```rust
match parse_string_to_decimal(str_value, precision, scale) {
    Ok(Some(decimal_value)) => {
        // Convert i128 to i256
        let i256_value = i256::from_i128(decimal_value);
        decimal_builder.append_value(i256_value);
```
Decimal256 precision is limited to i128 range.
The implementation parses the string to i128 first, then converts to i256. This limits Decimal256 values to the i128 range (~38 decimal digits), which may not cover the full Decimal256 precision (up to 76 decimal digits).
For full Decimal256 support, consider parsing directly to i256 or using a big integer library. If this limitation is intentional, add a comment documenting it.
value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! This is caused by a limitation of parse_string_to_decimal() function. Currently it supports parsing up to the biggest integer type in the standard library - i128. i256 is a custom type provided by the Arrow-rs library and support for it would be nice to have.
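The i128 ceiling is easy to demonstrate with the standard library alone: `i128::MAX` has 39 digits (about 1.7 × 10^38), while a Decimal256 mantissa may need up to 76. The helper below is a hypothetical illustration, not part of the PR:

```rust
/// Returns whether a run of `n_nines` consecutive '9' digits still fits in
/// i128. 38 nines (~9.99e37) fit; 39 nines exceed i128::MAX
/// (170141183460469231731687303715884105727), so parsing a full-precision
/// Decimal256 mantissa through i128 must fail long before 76 digits.
fn digits_fit_in_i128(n_nines: usize) -> bool {
    "9".repeat(n_nines).parse::<i128>().is_ok()
}
```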
```rust
/// Parse a decimal string into (mantissa, scale)
/// e.g., "123.45" -> (12345, 2), "-0.001" -> (-1, 3)
fn parse_decimal_str(s: &str) -> Result<(i128, i32), String> {
    let s = s.trim();
    if s.is_empty() {
        return Err("Empty string".to_string());
    }

    let negative = s.starts_with('-');
    let s = if negative || s.starts_with('+') {
        &s[1..]
    } else {
        s
    };

    // Split by decimal point
    let parts: Vec<&str> = s.split('.').collect();

    if parts.len() > 2 {
        return Err("Multiple decimal points".to_string());
    }

    let integral_part = parts[0];
    let fractional_part = if parts.len() == 2 { parts[1] } else { "" };

    // Parse integral part
    let integral_value: i128 = if integral_part.is_empty() {
        0
    } else {
        integral_part
            .parse()
            .map_err(|_| "Invalid integral part".to_string())?
    };

    // Parse fractional part
    let scale = fractional_part.len() as i32;
    let fractional_value: i128 = if fractional_part.is_empty() {
        0
    } else {
        fractional_part
            .parse()
            .map_err(|_| "Invalid fractional part".to_string())?
    };

    // Combine: value = integral * 10^scale + fractional
    let mantissa = integral_value
        .checked_mul(10_i128.pow(scale as u32))
        .and_then(|v| v.checked_add(fractional_value))
        .ok_or("Overflow in mantissa calculation")?;

    let final_mantissa = if negative { -mantissa } else { mantissa };

    Ok((final_mantissa, scale))
}
```
Missing scientific notation support in decimal string parsing.
The parse_decimal_str function does not handle scientific notation (e.g., "1.23e4", "1E-5"). The float parsing (FloatParse) handles such values, but they will fail when casting strings like "1.23e4" to decimal.
Spark's BigDecimal constructor supports scientific notation. Consider adding support or documenting this as a known limitation:
```diff
+    // Check for scientific notation - not yet supported
+    if s.contains('e') || s.contains('E') {
+        return Err("Scientific notation not supported".to_string());
+    }
+
     // Split by decimal point
     let parts: Vec<&str> = s.split('.').collect();
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```rust
/// Parse a decimal string into (mantissa, scale)
/// e.g., "123.45" -> (12345, 2), "-0.001" -> (-1, 3)
fn parse_decimal_str(s: &str) -> Result<(i128, i32), String> {
    let s = s.trim();
    if s.is_empty() {
        return Err("Empty string".to_string());
    }
    let negative = s.starts_with('-');
    let s = if negative || s.starts_with('+') {
        &s[1..]
    } else {
        s
    };
    // Check for scientific notation - not yet supported
    if s.contains('e') || s.contains('E') {
        return Err("Scientific notation not supported".to_string());
    }
    // Split by decimal point
    let parts: Vec<&str> = s.split('.').collect();
    if parts.len() > 2 {
        return Err("Multiple decimal points".to_string());
    }
    let integral_part = parts[0];
    let fractional_part = if parts.len() == 2 { parts[1] } else { "" };
    // Parse integral part
    let integral_value: i128 = if integral_part.is_empty() {
        0
    } else {
        integral_part
            .parse()
            .map_err(|_| "Invalid integral part".to_string())?
    };
    // Parse fractional part
    let scale = fractional_part.len() as i32;
    let fractional_value: i128 = if fractional_part.is_empty() {
        0
    } else {
        fractional_part
            .parse()
            .map_err(|_| "Invalid fractional part".to_string())?
    };
    // Combine: value = integral * 10^scale + fractional
    let mantissa = integral_value
        .checked_mul(10_i128.pow(scale as u32))
        .and_then(|v| v.checked_add(fractional_value))
        .ok_or("Overflow in mantissa calculation")?;
    let final_mantissa = if negative { -mantissa } else { mantissa };
    Ok((final_mantissa, scale))
}
```
🤖 Prompt for AI Agents
In native/spark-expr/src/conversion_funcs/cast.rs around lines 1246 to 1299,
parse_decimal_str currently only handles fixed-point strings and rejects
scientific notation (e.g. "1.23e4", "1E-5"); update the parser to detect an
optional exponent part (split on 'e' or 'E'), validate and parse the exponent as
a signed integer, then parse the mantissa portion with the existing logic to
produce (mantissa, scale) and adjust the scale by subtracting the exponent
(scale = scale - exponent) so the numeric value remains mantissa * 10^-scale;
ensure you apply checked arithmetic when shifting the mantissa (multiplying or
dividing by powers of ten) and surface clear errors for invalid exponent syntax
or overflow; keep handling of leading sign as before and preserve tests for
corner cases (multiple dots, empty parts).
value:good-to-have; category:bug; feedback:The CodeRabbit AI reviewer is correct! parse_decimal_str() does not support scientific notation. This should either be improved or documented as not supported.
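Following the AI-agent prompt above, here is a std-only sketch of the alternative (supporting the exponent rather than rejecting it). `parse_decimal_sci` is hypothetical, not Comet's code: it splits off the exponent, parses the fixed-point part, and folds the exponent into the scale with checked arithmetic.

```rust
/// Hypothetical sketch: parse "mantissa[eE]exponent" into (mantissa, scale),
/// where the value is mantissa * 10^(-scale). Negative resulting scales are
/// normalized by shifting the mantissa with checked multiplication.
fn parse_decimal_sci(s: &str) -> Result<(i128, i32), String> {
    let s = s.trim();
    // Split off an optional exponent part ("1.23e4" -> "1.23" and 4)
    let (num, exp) = match s.find(|c: char| c == 'e' || c == 'E') {
        Some(pos) => {
            let exp: i32 = s[pos + 1..]
                .parse()
                .map_err(|_| "Invalid exponent".to_string())?;
            (&s[..pos], exp)
        }
        None => (s, 0i32),
    };
    // Optional leading sign
    let (negative, num) = match num.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, num.strip_prefix('+').unwrap_or(num)),
    };
    // Fixed-point part, split on '.'
    let (int_part, frac_part) = match num.split_once('.') {
        Some((i, f)) => (i, f),
        None => (num, ""),
    };
    let digits = [int_part, frac_part].concat();
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err("Invalid number".to_string());
    }
    let mut mantissa: i128 = digits.parse().map_err(|_| "Overflow".to_string())?;
    // Fractional digits raise the scale; the exponent lowers it
    let mut scale = frac_part.len() as i32 - exp;
    // Normalize a negative scale by shifting the mantissa left
    while scale < 0 {
        mantissa = mantissa.checked_mul(10).ok_or("Overflow")?;
        scale += 1;
    }
    if negative {
        mantissa = -mantissa;
    }
    Ok((mantissa, scale))
}
```

For example, "1.23e4" yields (12300, 0) and "1E-5" yields (1, 5), matching the mantissa * 10^(-scale) convention used by the PR's `parse_decimal_str`.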
```scala
  test("cast StringType to DoubleType") {
    Seq(true, false).foreach { v =>
      castTest(specialValues.toDF("a"), DataTypes.FloatType, testAnsi = v)
    }
  }
```
Bug: Test casts to FloatType instead of DoubleType.
The test is named "cast StringType to DoubleType" but line 690 casts to FloatType instead of DoubleType. This appears to be a copy-paste error.
```diff
 test("cast StringType to DoubleType") {
   Seq(true, false).foreach { v =>
-    castTest(specialValues.toDF("a"), DataTypes.FloatType, testAnsi = v)
+    castTest(specialValues.toDF("a"), DataTypes.DoubleType, testAnsi = v)
   }
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```scala
  test("cast StringType to DoubleType") {
    Seq(true, false).foreach { v =>
      castTest(specialValues.toDF("a"), DataTypes.DoubleType, testAnsi = v)
    }
  }
```
🤖 Prompt for AI Agents
In spark/src/test/scala/org/apache/comet/CometCastSuite.scala around lines 688
to 692, the test named "cast StringType to DoubleType" mistakenly calls castTest
with DataTypes.FloatType; change the type argument to DataTypes.DoubleType so
the test actually casts to DoubleType, leaving the rest of the test logic
unchanged.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! It looks like this is a copy/paste error. The test should use DoubleType as the test name suggests. Prevents regressions related to the support for DoubleType because it is not tested at the moment.
PR Review: Support string to non-int numeric types casting

Summary
This PR implements string-to-float, string-to-double, and string-to-decimal casting operations that were previously marked as incompatible. The implementation properly handles special values (inf, NaN), suffixes ('d', 'f'), and ANSI mode. Overall, this is a solid improvement that removes significant compatibility gaps.

Code Quality & Best Practices ✅
Strengths:

Minor Issues:

Potential Bugs 🐛

Performance Considerations ⚡
Good:

Minor optimization opportunity:

Security Concerns 🔒
No major security issues identified. The implementation properly handles:

Test Coverage ✅
Strong test coverage:

Missing test cases:

Documentation ✅
Documentation changes look good:

Action Items
Must Fix (Blocking):

Should Fix (Non-blocking but important):

Nice to Have:

Recommendation
REQUEST CHANGES - The missing function will cause compilation failure. Please fix the blocking issues and I'd be happy to approve! The overall approach is solid and this is a valuable feature addition. Great work on tackling these compatibility gaps! 🚀
```scala
      testAnsi = false)
  test("cast StringType to DoubleType") {
    Seq(true, false).foreach { v =>
      castTest(specialValues.toDF("a"), DataTypes.FloatType, testAnsi = v)
```
Bug: Test for DoubleType incorrectly uses FloatType
The test named "cast StringType to DoubleType" actually tests DataTypes.FloatType instead of DataTypes.DoubleType. This appears to be a copy-paste error from the FloatType test immediately above it. As a result, DoubleType casting is not being tested at all, and the FloatType test is effectively run twice.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! It looks like this is a copy/paste error. The test should use DoubleType as the test name suggests. Prevents regressions related to the support for DoubleType because it is not tested at the moment.
```rust
    }
    (Utf8 | LargeUtf8, Decimal256(precision, scale)) => {
        cast_string_to_decimal(&array, to_type, precision, scale, eval_mode)
    }
```
Bug: LargeUtf8 decimal conversion will fail at runtime
The match patterns include LargeUtf8 for decimal conversions, but the implementation functions cast_string_to_decimal128_impl and cast_string_to_decimal256_impl only downcast to StringArray. When a LargeUtf8 array (which is LargeStringArray) is passed, the downcast returns None and the function returns an internal error "Expected string array". This differs from the pattern used elsewhere (e.g., boolean conversion) which has separate arms for Utf8 and LargeUtf8.
Additional Locations (2)
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! A LargeUtf8 array cannot be downcast to StringArray, so the new code will need to use the generic GenericStringArray<O> instead. Prevents an internal error at runtime when LargeUtf8 is passed.
value:useful; category:bug; feedback:The Claude AI reviewer is correct! It looks like this is a copy/paste error. The test should use DoubleType as the test name suggests. Prevents regressions related to the support for DoubleType because it is not tested at the moment.
value:annoying; category:bug; feedback:The Claude AI reviewer is correct! is_validate_decimal_precision() is imported at line 35 (use arrow::datatypes::is_validate_decimal_precision).

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
2835: To review by AI