feat: spark compatible float to timestamp cast with ANSI support #297
martin-augment wants to merge 2 commits into main from
Conversation
Walkthrough: This change extends the Spark CAST function to support casting float types (Float32 and Float64) to timestamps.
PR Review: Spark-compatible float to timestamp cast with ANSI support

Overall this is a well-structured addition that correctly implements float→timestamp semantics with ANSI mode support. The test coverage is broad. A few issues need attention before merging.

🔴 Will break CI:
Code Review
This pull request implements casting from float types (Float32 and Float64) to timestamps in the Spark-compatible cast function, including handling for fractional seconds, NaN, infinity, and overflow with support for ANSI mode. It also adds corresponding unit and logic tests. Review feedback suggests improving error message clarity by distinguishing between positive and negative infinity and refactoring array processing to use idiomatic Rust iterators.
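The semantics described above can be sketched as a standalone helper. This is a simplified sketch, not the PR's exact code: the real `float_secs_to_micros` lives inside DataFusion and uses `exec_err!`; the plain `String` error type here is an assumption for self-containment.

```rust
// Simplified sketch of the float-seconds -> microsecond-timestamp conversion.
// Mirrors the PR's `float_secs_to_micros` semantics; the String error type
// is an assumption (DataFusion uses its own error type via exec_err!).
const MICROS_PER_SECOND: i64 = 1_000_000;

fn float_secs_to_micros(val: f64, enable_ansi_mode: bool) -> Result<Option<i64>, String> {
    if val.is_nan() || val.is_infinite() {
        if enable_ansi_mode {
            return Err(format!("Cannot cast {val} to TIMESTAMP"));
        }
        return Ok(None); // non-ANSI: NaN/Infinity become NULL
    }
    let micros = val * MICROS_PER_SECOND as f64;
    if micros >= i64::MIN as f64 && micros <= i64::MAX as f64 {
        Ok(Some(micros as i64)) // fractional seconds preserved as microseconds
    } else if enable_ansi_mode {
        Err(format!("Overflow casting {val} to TIMESTAMP"))
    } else {
        Ok(None) // non-ANSI: overflow becomes NULL
    }
}

fn main() {
    assert_eq!(float_secs_to_micros(1.5, false), Ok(Some(1_500_000)));
    assert_eq!(float_secs_to_micros(f64::NAN, false), Ok(None));
    assert!(float_secs_to_micros(f64::INFINITY, true).is_err());
    assert_eq!(float_secs_to_micros(1e30, false), Ok(None));
}
```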
```rust
if enable_ansi_mode {
    return exec_err!(
        "Cannot cast {} to TIMESTAMP",
        if val.is_nan() { "NaN" } else { "Infinity" }
```
The error message for infinity does not distinguish between positive and negative infinity. This could be confusing for users. Consider providing a more specific message to improve clarity.
It would also be beneficial to add a test case for casting f64::NEG_INFINITY in ANSI mode to ensure it's handled correctly and provides the improved error message.
```diff
- if val.is_nan() { "NaN" } else { "Infinity" }
+ if val.is_nan() { "NaN" } else if val.is_sign_positive() { "Infinity" } else { "-Infinity" }
```
```rust
for i in 0..arr.len() {
    if arr.is_null(i) {
        builder.append_null();
    } else {
        let val: f64 = arr.value(i).into();
        match float_secs_to_micros(val, enable_ansi_mode)? {
            Some(micros) => builder.append_value(micros),
            None => builder.append_null(),
        }
    }
}
```
The loop can be written more idiomatically using an iterator over the array. This can make the code more concise and easier to read by leveraging standard Rust patterns.
```rust
for val in arr.iter() {
    let micros = if let Some(val) = val {
        float_secs_to_micros(val.into(), enable_ansi_mode)?
    } else {
        None
    };
    builder.append_option(micros);
}
```
🤖 Augment PR Summary

Summary: Adds Spark-compatible casting from floating-point seconds to microsecond timestamps, with behavior gated by ANSI mode.

Changes:

Technical Notes: Float inputs are interpreted as seconds since Unix epoch (fractional seconds supported) and converted to microsecond timestamps; special values (NaN/Infinity) are NULL in non-ANSI mode and errors in ANSI mode.

🤖 Was this summary useful? React with 👍 or 👎
```rust
    return Ok(None);
}
let micros = val * MICROS_PER_SECOND as f64;
if micros >= i64::MIN as f64 && micros <= i64::MAX as f64 {
```
datafusion/spark/src/function/conversion/cast.rs:58: The bounds check uses i64::MAX as f64 (rounded to 2^63), so a value that rounds to exactly 2^63 microseconds can pass this check and then micros as i64 will saturate to i64::MAX, bypassing the intended overflow error/NULL behavior (including in ANSI mode).
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
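The boundary issue flagged above can be reproduced in isolation. This is a standalone sketch, independent of the PR's code:

```rust
fn main() {
    // i64::MAX = 9_223_372_036_854_775_807 is not exactly representable in
    // f64, so the cast rounds up to 2^63 = 9_223_372_036_854_775_808.0.
    let bound = i64::MAX as f64;
    assert_eq!(bound, 9_223_372_036_854_775_808.0);

    // A value equal to 2^63 microseconds therefore passes a
    // `micros <= i64::MAX as f64` check...
    let micros = 9_223_372_036_854_775_808.0_f64;
    assert!(micros <= bound);

    // ...and the subsequent `as i64` cast saturates to i64::MAX
    // (float-to-int `as` casts saturate as of Rust 1.45), silently
    // bypassing the intended overflow error/NULL path.
    assert_eq!(micros as i64, i64::MAX);
}
```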
```rust
if enable_ansi_mode {
    return exec_err!("Overflow casting {} to TIMESTAMP", val);
}
Ok(None)
```
datafusion/spark/src/function/conversion/cast.rs:64: Spark’s non-ANSI float/double→timestamp cast appears to saturate on overflow (via (d * MICROS_PER_SECOND).toLong), so returning NULL on overflow here may diverge from Spark compatibility for very large inputs.
Severity: medium
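If matching Spark's saturating behavior is the goal, the non-ANSI path could saturate on finite overflow instead of returning NULL. A sketch under the assumption (stated in the comment above) that Spark computes `(d * MICROS_PER_SECOND).toLong`; the helper name is hypothetical:

```rust
const MICROS_PER_SECOND: i64 = 1_000_000;

// Hypothetical saturating variant for non-ANSI mode: finite overflow
// saturates to i64::MIN/i64::MAX via Rust's float-to-int `as` cast
// (defined as saturating since Rust 1.45); NaN/Infinity remain NULL,
// as in the PR's current behavior.
fn float_secs_to_micros_saturating(val: f64) -> Option<i64> {
    if val.is_nan() || val.is_infinite() {
        return None;
    }
    Some((val * MICROS_PER_SECOND as f64) as i64)
}

fn main() {
    assert_eq!(float_secs_to_micros_saturating(1e30), Some(i64::MAX));
    assert_eq!(float_secs_to_micros_saturating(-1e30), Some(i64::MIN));
    assert_eq!(float_secs_to_micros_saturating(f64::NAN), None);
}
```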
```rust
// Spark only supports signed integers, so we explicitly list them
let signed_int_signatures = [
// Supported input types: signed integers and floats
```
datafusion/spark/src/function/conversion/cast.rs:112: The spark_cast doc comment still lists only integer→timestamp and describes saturating overflow, but this PR adds float→timestamp with distinct NaN/Infinity/overflow behavior, so the documentation is now inconsistent with the implementation.
Severity: low
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```rust
}

fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
    let enable_ansi_mode = args.config_options.execution.enable_ansi_mode;
```
Float-to-timestamp can produce NULLs despite non-nullable return field
Medium Severity
return_field_from_args computes nullable solely from whether input fields are nullable. With the new float-to-timestamp conversion, float_secs_to_micros returns None (producing NULLs) for NaN, Infinity, and overflow values in non-ANSI mode — even when the input float column is declared non-nullable. This creates a mismatch: the output field claims to be non-nullable, but the actual data may contain NULLs. Downstream optimizations that skip null checks based on this metadata could produce incorrect results.
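One way to close this gap, sketched here as a hypothetical pure function rather than DataFusion's actual `return_field_from_args` API, is to make the computed nullability account for the cast itself introducing NULLs in non-ANSI mode:

```rust
// Hypothetical, simplified nullability computation. In non-ANSI mode a
// float -> timestamp cast can itself produce NULLs (NaN/Infinity/overflow),
// so the result must be nullable even for a non-nullable float input.
enum Ty {
    Float64,
    Int64,
}

fn result_nullable(input_ty: &Ty, input_nullable: bool, enable_ansi_mode: bool) -> bool {
    let cast_may_introduce_null = !enable_ansi_mode && matches!(input_ty, Ty::Float64);
    input_nullable || cast_may_introduce_null
}

fn main() {
    // Non-nullable float input still yields a nullable timestamp in non-ANSI mode.
    assert!(result_nullable(&Ty::Float64, false, false));
    // In ANSI mode special values raise errors instead of producing NULLs.
    assert!(!result_nullable(&Ty::Float64, false, true));
    // Integer inputs keep the input's nullability.
    assert!(!result_nullable(&Ty::Int64, false, false));
}
```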
```rust
return exec_err!(
    "Cannot cast {} to TIMESTAMP",
    if val.is_nan() { "NaN" } else { "Infinity" }
);
```
Error message misidentifies negative infinity as positive
Low Severity
When val is f64::NEG_INFINITY, val.is_nan() is false, so the error message falls through to the else branch producing "Cannot cast Infinity to TIMESTAMP". The actual value is -Infinity, so the error message provides incorrect diagnostic information. The if/else expression doesn't distinguish positive from negative infinity.
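The misclassification and the suggested fix can be checked directly in a standalone snippet:

```rust
fn main() {
    let val = f64::NEG_INFINITY;

    // Original logic: is_nan() is false for -Infinity, so it falls through
    // to the "Infinity" branch, mislabeling the value.
    let label = if val.is_nan() { "NaN" } else { "Infinity" };
    assert_eq!(label, "Infinity"); // misleading for -Infinity

    // Suggested fix: is_sign_positive() distinguishes the two infinities.
    let label = if val.is_nan() {
        "NaN"
    } else if val.is_sign_positive() {
        "Infinity"
    } else {
        "-Infinity"
    };
    assert_eq!(label, "-Infinity");
}
```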
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
datafusion/spark/src/function/conversion/cast.rs (1)
345-356: ⚠️ Potential issue | 🟡 Minor

Test helper panics if `timezone` is `None`.

The function signature accepts `Option<&str>` but line 354 calls `timezone.unwrap()` unconditionally. This will panic if `None` is passed. Consider either:

- Changing the signature to `timezone: &str`, since `None` is not supported, or
- Handling `None` gracefully with `.map(Arc::from)`

This is test-only code and all current callers pass `Some(...)`, so impact is low.

Proposed fix (option 1, change signature):
```diff
 fn make_args_with_timezone(
     input: ColumnarValue,
     target_type: &str,
-    timezone: Option<&str>,
+    timezone: &str,
 ) -> ScalarFunctionArgs {
     let return_field = Arc::new(Field::new(
         "result",
         DataType::Timestamp(
             TimeUnit::Microsecond,
-            Some(Arc::from(timezone.unwrap())),
+            Some(Arc::from(timezone)),
         ),
         true,
     ));
     let mut config = ConfigOptions::default();
-    if let Some(tz) = timezone {
-        config.execution.time_zone = Some(tz.to_string());
-    }
+    config.execution.time_zone = Some(timezone.to_string());
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datafusion/spark/src/function/conversion/cast.rs` around lines 345 - 356, The helper make_args_with_timezone currently unwraps timezone unconditionally causing a panic when timezone is None; either change the signature to require timezone: &str and update all callers to pass a str, or keep timezone: Option<&str> and construct the Timestamp field with timezone.map(Arc::from) (i.e. pass Some(Arc::from(...)) when present and None otherwise) so the Arc::from(timezone.unwrap()) call in the creation of return_field (and any places using timezone) is replaced with a safe Option<Arc<str>> conversion.
🧹 Nitpick comments (1)
datafusion/spark/src/function/conversion/cast.rs (1)
57-59: Minor precision edge case in overflow boundary check.

The comparison `micros <= i64::MAX as f64` has a subtle precision issue: `i64::MAX as f64` rounds up to `9223372036854775808.0` (one more than `i64::MAX`) due to f64's limited precision. Values equal to this threshold could pass the check but would saturate when cast to i64.

This is benign in practice: Rust 1.45+ defines saturating behavior for such casts, and this edge case represents timestamps billions of years in the future. The current behavior is also consistent with the saturating approach used for integer overflow in `secs_to_micros`.

No action required unless stricter boundary enforcement is desired.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datafusion/spark/src/function/conversion/cast.rs` around lines 57 - 59, The check against the i64 upper bound can mispass due to f64 rounding of i64::MAX; update the condition in the cast logic that uses micros (and mirror the same change in secs_to_micros if present) to use a strict less-than or subtract one unit before comparison — e.g. replace `micros <= i64::MAX as f64` with `micros < i64::MAX as f64` or `micros <= (i64::MAX as f64) - 1.0` so values that round up to 9223372036854775808.0 don't erroneously pass and then saturate when cast to i64. Ensure you reference the `micros` variable and `MICROS_PER_SECOND` in the same block when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@datafusion/spark/src/function/conversion/cast.rs`:
- Around line 345-356: The helper make_args_with_timezone currently unwraps
timezone unconditionally causing a panic when timezone is None; either change
the signature to require timezone: &str and update all callers to pass a str, or
keep timezone: Option<&str> and construct the Timestamp field with
timezone.map(Arc::from) (i.e. pass Some(Arc::from(...)) when present and None
otherwise) so the Arc::from(timezone.unwrap()) call in the creation of
return_field (and any places using timezone) is replaced with a safe
Option<Arc<str>> conversion.
---
Nitpick comments:
In `@datafusion/spark/src/function/conversion/cast.rs`:
- Around line 57-59: The check against the i64 upper bound can mispass due to
f64 rounding of i64::MAX; update the condition in the cast logic that uses
micros (and mirror the same change in secs_to_micros if present) to use a strict
less-than or subtract one unit before comparison — e.g. replace `micros <=
i64::MAX as f64` with `micros < i64::MAX as f64` or `micros <= (i64::MAX as f64)
- 1.0` so values that round up to 9223372036854775808.0 don't erroneously pass
and then saturate when cast to i64. Ensure you reference the `micros` variable
and `MICROS_PER_SECOND` in the same block when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 20bef460-ebba-43fb-81ed-bfdd7983a901
📒 Files selected for processing (2)
- datafusion/spark/src/function/conversion/cast.rs
- datafusion/sqllogictest/test_files/spark/conversion/cast_float_to_timestamp.slt