Skip to content

21322: feat: add cast_to_type UDF for type-based casting#300

Open
martin-augment wants to merge 4 commits intomainfrom
pr-21322-2026-04-05-20-09-11
Open

21322: feat: add cast_to_type UDF for type-based casting#300
martin-augment wants to merge 4 commits intomainfrom
pr-21322-2026-04-05-20-09-11

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

21322: To review by AI

adriangb and others added 4 commits April 2, 2026 10:13
Add a `cast_to_type(expression, reference)` function that casts
the first argument to the data type of the second argument, similar
to DuckDB's cast_to_type. The second argument's type (not value)
determines the target cast type, which is useful in macros and
generic SQL where types need to be preserved dynamically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `try_cast_to_type(expression, reference)` which works like
`cast_to_type` but returns NULL on cast failure instead of erroring,
similar to the relationship between arrow_cast and arrow_try_cast.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The second argument (reference) is used solely for its data type, so its
nullability should not propagate to the result.  Previously
`cast_to_type(42, NULL::INTEGER)` was incorrectly marked nullable in the
schema even though the input literal is non-null.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

Walkthrough

Two new scalar user-defined functions are introduced: cast_to_type and try_cast_to_type. Both accept an expression and a reference argument indicating the target data type. CastToTypeFunc and TryCastToTypeFunc implementations perform expression simplification to replace the function call with an explicit Expr::Cast or Expr::TryCast to the target type. Output field nullability is derived from the first argument's nullability for cast_to_type and is always nullable for try_cast_to_type. Comprehensive test coverage validates behavior across type combinations, edge cases, and expression contexts. Documentation and module registration are provided in the core functions system.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pr-21322-2026-04-05-20-09-11

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Apr 5, 2026

PR Review: cast_to_type and try_cast_to_type UDFs

Overall this is a well-designed feature. The simplify-to-cast approach (mirroring arrow_cast/arrow_try_cast) is the right pattern for this kind of type-level function, and the documentation and test coverage are solid. A few issues worth addressing before merging.


Bug: Nullability inconsistency in try_cast_to_type same-type no-op

return_field_from_args unconditionally declares the result as nullable = true (because TryCast can return NULL on failure). However, simplify skips the TryCast wrapper when source_type == target_type, returning arg as-is:

let new_expr = if source_type == target_type {
    arg  // returned as-is; may be non-nullable
} else {
    Expr::TryCast(...)  // always nullable
};

For a non-nullable input like try_cast_to_type(42, 0::INTEGER), the declared return type says nullable = true but the simplified expression is the literal 42, which is non-nullable. This type mismatch can cause subtle downstream issues in plan analysis.

The fix for cast_to_type (latest commit) shows the right approach — try_cast_to_type needs the inverse treatment: either wrap in TryCast even for the same-type path (to preserve the nullable guarantee), or document that the simplified expression's actual nullability takes precedence after re-analysis.


Code duplication

cast_to_type.rs and try_cast_to_type.rs share ~85% identical code. The only meaningful differences are Expr::Cast vs Expr::TryCast, nullability in return_field_from_args, and the user_doc attributes. A shared helper (e.g., simplify_inner(arg, target_type, is_try) -> Expr) would reduce this significantly. Could be a follow-up.


Runtime fallback has no explanatory comment

Both invoke_with_args implementations always internal_err!. This is the right design (matching arrow_cast) but worth a comment explaining the invariant — if simplification is ever skipped, the error gives no clue why:

// This function is always simplified to Expr::Cast/TryCast during optimization.
// Reaching invoke_with_args means simplification was skipped, which should not happen.
internal_err!("cast_to_type should have been simplified to cast")

Minor: Missing error message assertion in test

statement error
SELECT cast_to_type('not_a_number', NULL::INTEGER);

The .slt framework supports statement error <regex>. Even statement error Cast error would catch message regressions.


Minor: Duplicate test case

The cast_to_type section has two identical queries — SELECT cast_to_type('42', NULL::INTEGER) appears under both # Basic string to integer cast and # NULL reference (type still applies). The second was presumably meant to show that a NULL-valued reference still extracts the type, but since both use NULL::INTEGER they are identical. Use a non-null reference like 0::INTEGER to make the distinction visible.


Summary

Category Finding
Bug try_cast_to_type same-type path returns non-nullable expr despite always-nullable declared return type
Code quality ~85% code duplication between the two implementation files
Robustness invoke_with_args error path lacks explanatory comment
Tests Error test does not assert message; one duplicate test case

The nullability bug is the main blocker. Everything else is secondary.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two new scalar functions, cast_to_type and try_cast_to_type, which allow casting an expression to the data type of a reference expression. The implementation includes the UDF definitions, registration in the core module, comprehensive SQL logic tests, and updated user documentation. A review comment identifies a logic inconsistency in the simplify method of cast_to_type, where nullability is not correctly inherited from the source argument, potentially leading to incorrect schema inference.

// Use an actual cast to get the correct type
Expr::Cast(datafusion_expr::Cast {
expr: Box::new(arg),
field: target_type.into_nullable_field_ref(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The simplify implementation uses into_nullable_field_ref(), which always creates a nullable field for the Cast expression. This is inconsistent with the logic in return_field_from_args (lines 111-116), where nullability is inherited from the source argument (unless the target type is Null). This discrepancy can lead to incorrect schema inference or suboptimal query plans after simplification. The simplify method should use the same nullability logic as return_field_from_args by leveraging info.nullable(&arg).

Suggested change
field: target_type.into_nullable_field_ref(),
field: Field::new("value", target_type.clone(), info.nullable(&arg)? || target_type == DataType::Null).into(),

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! There was an issue in the return_field_from_args() method that the PR author improved in the last commit in the PR. But it seems the author didn't improve the same in the simplify() function. Prevents a wrong behavior and wrong result in the information schema for a view that uses this user defined function

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@datafusion/functions/src/core/cast_to_type.rs`:
- Around line 140-145: simplify() currently builds the simplified Expr::Cast
using into_nullable_field_ref(), forcing the cast to always be nullable and
causing a mismatch with return_field_from_args which computes nullability from
source_field.is_nullable() and whether target_type == DataType::Null; update
simplify() to compute nullable the same way (nullable =
source_field.is_nullable() || target_type == DataType::Null) and construct the
target field via Field::new(..., target_type, nullable).into() so the simplified
Expr::Cast and return_field_from_args (and CastExpr::nullable()) produce
consistent nullability.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fe31b126-c6b6-4d36-84dc-c813925e11d7

📥 Commits

Reviewing files that changed from the base of the PR and between c990ba0 and 7891a1f.

📒 Files selected for processing (5)
  • datafusion/functions/src/core/cast_to_type.rs
  • datafusion/functions/src/core/mod.rs
  • datafusion/functions/src/core/try_cast_to_type.rs
  • datafusion/sqllogictest/test_files/cast_to_type.slt
  • docs/source/user-guide/sql/scalar_functions.md

Comment on lines +140 to +145
// Use an actual cast to get the correct type
Expr::Cast(datafusion_expr::Cast {
expr: Box::new(arg),
field: target_type.into_nullable_field_ref(),
})
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how DataFusion handles Cast expression nullability and whether
# into_nullable_field_ref vs computed nullability causes issues

# Search for how Cast's field nullability affects schema derivation
rg -n -A5 'impl.*Cast' --type rust
echo "---"
# Check if there are any tests that verify runtime nullability matches declared schema
rg -n 'into_nullable_field_ref' --type rust -C3

Repository: martin-augment/datafusion

Length of output: 36845


🏁 Script executed:

cat -n datafusion/functions/src/core/cast_to_type.rs

Repository: martin-augment/datafusion

Length of output: 6289


🏁 Script executed:

rg -n "impl.*Expr.*Cast" datafusion/expr/src/expr.rs -A 20 | head -100

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "pub struct Cast" datafusion/expr/src/expr.rs -A 10

Repository: martin-augment/datafusion

Length of output: 393


🏁 Script executed:

rg -n "fn.*data_type.*Cast\|Cast.*data_type" datafusion/expr/src/expr.rs -B 2 -A 8 | head -80

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "Cast =>" datafusion/expr/src/expr.rs -A 5 | head -60

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "expr.data_type\|field.data_type\|field.is_nullable" datafusion/expr/src/expr.rs -A 2 | head -100

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

ast-grep --pattern 'impl Expr {
  $$$
  pub fn data_type($_) -> $_ {
    $$$
  }
  $$$
}'

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "fn data_type" datafusion/expr/src/expr.rs | head -20

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "pub fn data_type\|pub fn field" datafusion/expr/src/ -A 3 | grep -A 3 "data_type\|field" | head -80

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "Expr::Cast" datafusion/expr/src/expr.rs -B 2 -A 5 | head -100

Repository: martin-augment/datafusion

Length of output: 3983


🏁 Script executed:

rg -n "get_data_type\|get_field\|output_ordering" datafusion/expr/src/ -A 3 | grep -B 2 -A 5 "Cast" | head -80

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "DynScalarUDF\|get_data_type" datafusion/expr/src/expr.rs -B 2 -A 8 | head -150

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "Cast.*field\|field.*Cast" datafusion/core/src/ --type rust -A 3 | head -100

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

rg -n "fn data_type\|fn field" datafusion/physical-expr/src/expressions/cast.rs -B 2 -A 10 | head -150

Repository: martin-augment/datafusion

Length of output: 51


🏁 Script executed:

cat -n datafusion/physical-expr/src/expressions/cast.rs | head -250

Repository: martin-augment/datafusion

Length of output: 10920


🏁 Script executed:

cat -n datafusion/physical-expr/src/expressions/cast.rs | sed -n '241,260p'

Repository: martin-augment/datafusion

Length of output: 1133


Fix nullability inconsistency between type inference and simplification.

return_field_from_args (lines 107-117) correctly computes nullability based on the source field (source_field.is_nullable()) and whether the target type is Null. However, simplify() always uses into_nullable_field_ref() which unconditionally creates a nullable field. This causes the simplified Expr::Cast to always be nullable, even when a non-nullable input and non-null target type should produce non-nullable output.

The CastExpr::nullable() method (in physical-expr) computes nullability as child_nullable || target_nullable, so the target field's nullability directly affects the result. To maintain consistency, simplify() should compute the field's nullability the same way as return_field_from_args:

let nullable = source_field.is_nullable() || target_type == DataType::Null;
let field = Field::new("", target_type, nullable).into();

This ensures the declared schema and the simplified Cast expression produce the same nullability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/functions/src/core/cast_to_type.rs` around lines 140 - 145,
simplify() currently builds the simplified Expr::Cast using
into_nullable_field_ref(), forcing the cast to always be nullable and causing a
mismatch with return_field_from_args which computes nullability from
source_field.is_nullable() and whether target_type == DataType::Null; update
simplify() to compute nullable the same way (nullable =
source_field.is_nullable() || target_type == DataType::Null) and construct the
target field via Field::new(..., target_type, nullable).into() so the simplified
Expr::Cast and return_field_from_args (and CastExpr::nullable()) produce
consistent nullability.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! There was an issue in the return_field_from_args() method that the PR author improved in the last commit in the PR. But it seems the author didn't improve the same in the simplify() function. Prevents a wrong behavior and wrong result in the information schema for a view that uses this user defined function

@martin-augment
Copy link
Copy Markdown
Owner Author

augment review

@martin-augment
Copy link
Copy Markdown
Owner Author

Bug: Nullability inconsistency in try_cast_to_type same-type no-op

return_field_from_args unconditionally declares the result as nullable = true (because TryCast can return NULL on failure). However, simplify skips the TryCast wrapper when source_type == target_type, returning arg as-is:

let new_expr = if source_type == target_type {
    arg  // returned as-is; may be non-nullable
} else {
    Expr::TryCast(...)  // always nullable
};

For a non-nullable input like try_cast_to_type(42, 0::INTEGER), the declared return type says nullable = true but the simplified expression is the literal 42, which is non-nullable. This type mismatch can cause subtle downstream issues in plan analysis.

The fix for cast_to_type (latest commit) shows the right approach — try_cast_to_type needs the inverse treatment: either wrap in TryCast even for the same-type path (to preserve the nullable guarantee), or document that the simplified expression's actual nullability takes precedence after re-analysis.

value:incorrect-but-reasonable; category:bug; feedback: The Claude AI reviewer is not correct! The try_cast_to_type() function is documented as returning a nullable result. It won't be NULL if the cast operation is successful - the skipped cast is a success. It will return NULL only if the cast fails.

@martin-augment
Copy link
Copy Markdown
Owner Author

Code duplication

cast_to_type.rs and try_cast_to_type.rs share ~85% identical code. The only meaningful differences are Expr::Cast vs Expr::TryCast, nullability in return_field_from_args, and the user_doc attributes. A shared helper (e.g., simplify_inner(arg, target_type, is_try) -> Expr) would reduce this significantly. Could be a follow-up.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! There is a lot of duplicated code that could be extracted to a helper struct or a trait to prevent regressions by fixing an issue only at one place and missing it at another.

@martin-augment
Copy link
Copy Markdown
Owner Author

Runtime fallback has no explanatory comment

Both invoke_with_args implementations always internal_err!. This is the right design (matching arrow_cast) but worth a comment explaining the invariant — if simplification is ever skipped, the error gives no clue why:

// This function is always simplified to Expr::Cast/TryCast during optimization.
// Reaching invoke_with_args means simplification was skipped, which should not happen.
internal_err!("cast_to_type should have been simplified to cast")

value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! The invoke_with_args() function is not called due to the simplification of the expression. It would be good to add a comment mentioning this, so the next developer reading the code knows why.

@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Apr 6, 2026

🤖 Augment PR Summary

Summary: This PR adds two new scalar UDFs, cast_to_type and try_cast_to_type, to support casting an expression based on the data type of a “reference” expression.

Changes:

  • Introduces CastToTypeFunc and TryCastToTypeFunc implementations that rewrite calls during optimization into explicit Expr::Cast / Expr::TryCast.
  • Registers both UDFs in datafusion/functions/src/core/mod.rs and exposes them via expr_fn helpers.
  • Adds a dedicated sqllogictest file covering basic casting behavior, nested usage, column/subquery reference arguments, error vs NULL-on-failure semantics, and information_schema nullability expectations.
  • Documents both functions in the SQL scalar functions user guide, including syntax, arguments, and examples.

Technical Notes: The “reference” argument’s value is ignored; only its inferred type is used to determine the cast target type. The functions are marked immutable and rely on planner/optimizer simplification rather than runtime evaluation.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

arg
} else {
// Use an actual cast to get the correct type
Expr::Cast(datafusion_expr::Cast {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion/functions/src/core/cast_to_type.rs:141: return_field_from_args special-cases DataType::Null to force nullable=true, but simplify rewrites to Expr::Cast, whose field nullability is derived from the input expression. If the reference arg is a bare NULL (type Null) and the source is non-nullable, this rewrite can yield a non-nullable Null output schema (and diverge from the UDF’s declared schema).

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

let arg = args.pop().unwrap();

let source_type = info.get_data_type(&arg)?;
let new_expr = if source_type == target_type {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion/functions/src/core/try_cast_to_type.rs:121: When source_type == target_type, simplify returns arg directly, but return_field_from_args always marks the result as nullable. This can make the simplified expression’s nullability differ from the original UDF’s declared nullability, which DataFusion’s ScalarUDFImpl::simplify docs warn can cause schema verification issues.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

let arg = args.pop().unwrap();

let source_type = info.get_data_type(&arg)?;
let new_expr = if source_type == target_type {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In datafusion/functions/src/core/try_cast_to_type.rs:121, simplify returns the original arg when source_type == target_type, which means the optimized expression is no longer an Expr::TryCast and can lose the “always nullable / can return NULL on failure” behavior implied by return_field_from_args (and Expr::TryCast nullability).

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants