20426: fix: Prefer numeric in type coercion for comparisons by martin-augment · Pull Request #294 · martin-augment/datafusion

martin-augment · 2026-03-25T21:13:59Z

20426: To review by AI

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

coderabbitai · 2026-03-25T21:14:19Z

Walkthrough

This pull request refactors type coercion logic in DataFusion to distinguish between comparison and type-union contexts. A new type_union_coercion function handles type unification for operations like UNION, CASE expressions, and NVL2, preferring string types. The existing comparison_coercion function is updated to prefer numeric types when comparing mixed numeric and string operands. Related coercion helpers like dictionary_coercion and ree_coercion are generalized to accept a coercion strategy parameter. Callers throughout the codebase are updated to invoke the appropriate coercion function for their context. Test expectations are modified to reflect numeric-preferred comparison semantics, and new test coverage is added for the updated coercion behavior.


gemini-code-assist · 2026-03-25T21:14:32Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines DataFusion's type coercion system, particularly for operations involving mixed numeric and string data types. The primary goal is to ensure that comparison operations (like filtering or joining) correctly interpret string representations of numbers as numeric values, preventing unexpected lexicographical comparisons. Concurrently, it clarifies and separates the coercion logic for type unification scenarios (like UNION or CASE result branches), where string preference is generally more appropriate. This change leads to more intuitive and SQL-compliant behavior, while also introducing stricter type checking that will now surface errors for invalid string-to-numeric casts.

Highlights

- Type Coercion Logic Refinement: The type coercion rules for comparison operations (e.g., =, >, IN, BETWEEN, CASE WHEN) have been updated to prefer numeric types over string types when comparing mixed numeric and string values. This ensures comparisons are performed numerically rather than lexicographically.
- Introduction of type_union_coercion: A new coercion function, type_union_coercion, has been introduced for contexts where type unification is needed (e.g., UNION, CASE THEN/ELSE, NVL2). This function prefers string types when unifying mixed numeric and string values, aligning with standard SQL behavior for these operations.
- Refactored Coercion Functions: The comparison_coercion_numeric function has been removed, and its logic is now integrated into the main comparison_coercion function. Generic coercion functions for struct, map, dictionary, and RunEndEncoded types were refactored to accept a coercion strategy, allowing them to be used with both comparison-based and union-based coercion.
- Enhanced Test Coverage and Documentation: Extensive new SQL logic tests (string_numeric_coercion.slt) have been added to validate the new coercion behavior across various scenarios, including error cases for invalid casts. The upgrade guide (53.0.0.md) has been updated with a detailed explanation of the changes, affected queries, and migration advice.


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

claude · 2026-03-25T21:15:57Z

PR Review: `fix: Prefer numeric in type coercion for comparisons`

This is a well-executed fix for a genuine semantic bug. The core issue — string-based lexicographic comparison causing 5 > '100' to evaluate as true — is a real correctness problem worth fixing. Overall the PR is in good shape. Below are my observations.
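To make the bug concrete, here is a minimal sketch in plain Rust (standard library only, not DataFusion code). The helper names `lexicographic_gt` and `numeric_gt` are hypothetical and only model the two behaviors: rendering the integer as text and comparing strings, versus parsing the string as an integer first.

```rust
/// Old behavior (modeled): render the integer as text and compare the two
/// strings lexicographically, character by character.
fn lexicographic_gt(lhs: i64, rhs: &str) -> bool {
    lhs.to_string().as_str() > rhs
}

/// New behavior (modeled): parse the string as an integer, then compare
/// numerically. A string that is not a valid integer produces an error
/// instead of a silent, surprising result.
fn numeric_gt(lhs: i64, rhs: &str) -> Result<bool, std::num::ParseIntError> {
    let rhs: i64 = rhs.parse()?;
    Ok(lhs > rhs)
}

fn main() {
    // "5" sorts after "100" because '5' > '1', so the string comparison is true.
    println!("lexicographic 5 > '100': {}", lexicographic_gt(5, "100"));
    // Numerically, 5 is not greater than 100.
    println!("numeric 5 > '100': {:?}", numeric_gt(5, "100"));
    // A non-numeric string now surfaces as an error.
    println!("numeric 5 > 'hello' fails: {}", numeric_gt(5, "hello").is_err());
}
```

The error case corresponds to the breaking change discussed below: comparisons such as int_col = 'hello' stop succeeding silently.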

What the PR Does

The PR introduces a distinction between two coercion contexts:

- Comparison contexts (=, <, >, IN, BETWEEN, CASE WHEN): now prefer numeric types
- Type unification contexts (UNION, CASE THEN/ELSE, NVL2): continue to prefer string types

This fixes incorrect lexicographic comparison behavior while preserving sensible behavior for type unification.

Correctness / Potential Issues

1. Intentional breaking changes are significant but clearly documented.

Queries like str_col IN ('a', 1) and int_col = 'hello' that previously succeeded (with silent coercion to string) will now error. This is the correct behavior, but users should be explicitly warned in the PR description and migration notes. ✅ The migration guide in upgrading/53.0.0.md covers this well.

2. Dictionary and RunEndEncoded type coercion flip.

Dictionary(Int32, Utf8) compared to Int16 now coerces to Int16 rather than Utf8. This is semantically correct but is a silent behavior change — not an error, just different results. Consider whether a warning or stricter check is warranted here, since dictionary-encoded string columns compared to integers could indicate a schema design issue.

3. Removed tests in push_down_filter_parquet.slt.

Several tests for non-castable string literals were removed rather than updated to assert the new error behavior. It would be better to convert them to error-assertion tests to maintain coverage of those code paths.

4. in_list_types test rename and behavior change.

The old test in_list_types tested a successful plan; the new in_list_types_mixed_string_int_error tests that it errors. This is correct, but consider adding a new passing test that confirms int_col IN (1, 2, 3) (pure numeric IN list) still works as expected after the refactor.

Code Quality

- The refactoring is clean. The old pair (comparison_coercion / comparison_coercion_numeric) has been reorganized into a clear intent-driven pair (comparison_coercion for numeric-preferring, type_union_coercion for string-preferring).
- Using function pointers in dictionary_coercion, struct_coercion, map_coercion, ree_coercion to avoid duplicating logic is a good design choice.
- Comments explaining why different coercion strategies are used in different contexts are clear and helpful.
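The function-pointer design can be sketched roughly as follows. This is a toy model, not the code in binary.rs: the `Ty` enum, the `CoerceFn` alias, and all signatures are invented for illustration, and only the function names mirror the ones mentioned in the PR.

```rust
/// Toy stand-in for Arrow's DataType (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    Utf8,
    /// Dictionary-encoded values; the key type is omitted for brevity.
    Dict(Box<Ty>),
}

/// A coercion strategy: given two types, return the common type, if any.
type CoerceFn = fn(&Ty, &Ty) -> Option<Ty>;

/// Comparison contexts: a mixed numeric/string pair resolves to the numeric type.
fn comparison_coercion(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a.clone()),
        (Ty::Int64, Ty::Utf8) | (Ty::Utf8, Ty::Int64) => Some(Ty::Int64),
        _ => dictionary_coercion(lhs, rhs, comparison_coercion),
    }
}

/// Unification contexts: a mixed numeric/string pair resolves to the string type.
fn type_union_coercion(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a.clone()),
        (Ty::Int64, Ty::Utf8) | (Ty::Utf8, Ty::Int64) => Some(Ty::Utf8),
        _ => dictionary_coercion(lhs, rhs, type_union_coercion),
    }
}

/// Shared helper: unwrap the dictionary value type, then defer to whichever
/// strategy the caller passed in, so the logic is written only once.
fn dictionary_coercion(lhs: &Ty, rhs: &Ty, coerce_fn: CoerceFn) -> Option<Ty> {
    match (lhs, rhs) {
        (Ty::Dict(l), Ty::Dict(r)) => coerce_fn(&**l, &**r),
        (Ty::Dict(v), other) | (other, Ty::Dict(v)) => coerce_fn(&**v, other),
        _ => None,
    }
}

fn main() {
    let dict_utf8 = Ty::Dict(Box::new(Ty::Utf8));
    println!("{:?}", comparison_coercion(&dict_utf8, &Ty::Int64));
    println!("{:?}", type_union_coercion(&dict_utf8, &Ty::Int64));
}
```

In this model the same dictionary helper yields the numeric type under the comparison strategy and the string type under the union strategy, which is the property the parameterization is meant to provide.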

One minor nit: the naming type_union_coercion could be confused with SQL UNION type coercion specifically, but it's used for any context that prefers string widening. A name like widening_coercion or string_preferring_coercion might be clearer — though this is a minor style point.

Test Coverage

The new string_numeric_coercion.slt file (496 lines) is comprehensive:

- Covers <, >, <=, >=, =, !=
- Includes error cases (non-numeric strings, overflow, decimal vs. int)
- Tests UNION, BETWEEN, IN, CASE WHEN contexts
- Tests complex types: structs, lists, maps, dictionaries, run-end-encoded

The unit tests in comparison.rs (test_comparison_coercion_prefers_numeric, test_type_union_coercion_prefers_string, test_binary_comparison_string_numeric_coercion) round this out well.

Gap: There are no tests for NULL-handling edge cases (e.g., NULL IN (1, 'a') or NULL > '5') in the new coercion contexts. These are worth verifying.

Performance

No concerns. Type coercion happens at plan time, and the changes don't affect execution kernels. The refactored code is, if anything, slightly simpler.

Documentation

The upgrading/53.0.0.md migration guide is clear and actionable. Good work including a before/after behavior table.

Summary

| Area | Assessment |
| --- | --- |
| Correctness | ✅ Fixes a real semantic bug |
| Breaking changes | ⚠️ Intentional and documented, but significant |
| Code quality | ✅ Clean refactor with good separation of concerns |
| Test coverage | ✅ Comprehensive, minor gap in NULL edge cases |
| Documentation | ✅ Migration guide is clear |
| Performance | ✅ No concerns |

Suggested action items before merge:

- Convert removed push_down_filter_parquet.slt tests to error-assertion form instead of deleting them.
- Add a NULL + mixed-type IN/comparison test case to string_numeric_coercion.slt.
- Consider whether Dictionary(Utf8) vs Int silent behavior change needs a deprecation note or log warning.

Otherwise the change is solid.

augmentcode · 2026-03-25T21:19:48Z

🤖 Augment PR Summary

Summary: This PR changes type coercion so comparisons between numeric and string values prefer numeric semantics, fixing incorrect lexicographic ordering/equality.

Changes:

- Updates comparison coercion to cast the string side to the numeric type for =, <, >, <=, >=, <>, IN, BETWEEN, and CASE expr WHEN contexts
- Introduces type_union_coercion for type-unification contexts (e.g., UNION, CASE THEN/ELSE results, NVL2) where strings remain preferred
- Refactors struct/map/dictionary/REE coercion to be parameterized over the chosen coercion function
- Updates optimizer/analyzer coercion paths for CASE and UNION schema coercion to use the appropriate rule
- Adjusts/expands tests, including new sqllogictest coverage for string-numeric comparisons and nested types
- Updates Clickbench unparser test setup to register raw parquet then create a view with corrected EventDate typing
- Updates docs (53.0.0 upgrade guide) describing the behavioral change and the removal of comparison_coercion_numeric

Technical Notes: Mixed string/numeric comparisons may now raise cast errors at runtime when strings are not parseable as the target numeric type; UNION/branch unification continues to prefer strings via type_union_coercion.


augmentcode

Review completed. 3 suggestions posted.


augmentcode · 2026-03-25T21:19:48Z

datafusion/expr-common/src/type_coercion/binary.rs

+        .or_else(|| ree_coercion(lhs_type, rhs_type, true, type_union_coercion))
        .or_else(|| temporal_coercion_nonstrict_timezone(lhs_type, rhs_type))
        .or_else(|| string_coercion(lhs_type, rhs_type))
        .or_else(|| list_coercion(lhs_type, rhs_type))


In type_union_coercion, list types are handled via list_coercion, whose child-type resolution uses type_union_resolution (numeric-preferring for string/numeric). This may cause UNION/CASE-THEN coercion for List(Int64) vs List(Utf8) to pick List(Int64) and error on non-numeric strings, which seems to contradict the string-preferring intent described above.

Severity: medium


value:useful; category:bug; feedback: The Augment AI reviewer is correct! type_union_resolution() prefers numbers over strings, and this affects how list_coercion() behaves too. Addressing this prevents wrong behavior for list-related operations.

augmentcode · 2026-03-25T21:19:49Z

datafusion/sqllogictest/test_files/string_numeric_coercion.slt

+----
+[20, 30]
+
+# List in UNION uses type union coercion (string preferred).


This UNION test for list types doesn't assert the resulting element type, and [5, 10] formatting could pass even if the union coerces to List(Int64) by casting strings. Consider asserting via arrow_typeof on the UNION output (or adding a non-numeric string case) to ensure list UNION truly uses string-preferred coercion.

Severity: low


value:useful; category:bug; feedback: The Augment AI reviewer is correct! The SQL Logic test claims that the result is List(Utf8) but does not assert it, so there is no way to confirm the claim or to catch regressions.

augmentcode · 2026-03-25T21:19:49Z

datafusion/core/tests/expr_api/mod.rs

+    // int column vs string literal: the string literal is cast to Int64
+    create_expr_test(col("i").eq(lit("202410")), "i@1 = CAST(202410 AS Int64)");
+    create_expr_test(lit("202410").eq(col("i")), "CAST(202410 AS Int64) = i@1");
+    // when simplified, the literal cast is constant-folded


The comment says the literal cast is constant-folded in the simplified expression, but the expected simplified string still includes CAST(202410 AS Int64). If folding is intended, this test may not currently validate it (or the comment may be outdated).

Severity: low


value:useful; category:bug; feedback: The Augment AI reviewer is correct! The comment claims that the result is folded into a constant, but the expected output in the unit test still contains CAST(...), so the comment is incorrect and needs to be fixed.

gemini-code-assist

Code Review

This pull request introduces a significant change to DataFusion's type coercion rules, distinguishing between 'comparison' contexts (e.g., =, >, IN, BETWEEN, CASE WHEN) where numeric types are now preferred, and 'type unification' contexts (e.g., UNION, CASE THEN/ELSE, NVL2) where string types are preferred. This refactoring ensures more intuitive and correct numeric semantics for mixed-type comparisons, raising errors when string literals cannot be cast to the numeric type. The changes involve updating core coercion functions, adapting their usage across the expression and physical planning layers, and updating extensive tests and documentation to reflect the new behavior.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-03-25T21:20:48Z

datafusion/expr-common/src/type_coercion/binary.rs


-/// Coercion rules for list types.
+/// Coerce two list types by coercing their element types via
+/// [`type_union_resolution`].


List coercion not parameterized unlike struct and map

Medium Severity

struct_coercion and map_coercion were correctly parameterized with a coerce_fn to distinguish comparison vs. union contexts, but list_coercion was not. It always delegates to type_union_resolution, which internally calls string_numeric_coercion (numeric-preferring). This means list element coercion in type_union_coercion contexts (UNION, CASE THEN/ELSE) incorrectly uses numeric preference instead of string preference — opposite to how struct and map behave in the same context.

Additional Locations (1)

datafusion/expr-common/src/type_coercion/binary.rs#L1626-L1637
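The difference between a hard-wired and a parameterized list helper can be illustrated with a small, self-contained sketch. Everything here is hypothetical (the `Ty` enum, `numeric_preferring`, `string_preferring`, `list_coercion_fixed`, `list_coercion_with`); it models the shape of the problem described above, not the actual DataFusion functions.

```rust
/// Toy stand-in for Arrow's DataType (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int64,
    Utf8,
    List(Box<Ty>),
}

type CoerceFn = fn(&Ty, &Ty) -> Option<Ty>;

/// Element strategy that picks the numeric side of a numeric/string pair.
fn numeric_preferring(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a.clone()),
        (Ty::Int64, Ty::Utf8) | (Ty::Utf8, Ty::Int64) => Some(Ty::Int64),
        _ => None,
    }
}

/// Element strategy that picks the string side of a numeric/string pair.
fn string_preferring(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a.clone()),
        (Ty::Int64, Ty::Utf8) | (Ty::Utf8, Ty::Int64) => Some(Ty::Utf8),
        _ => None,
    }
}

/// Shape of the reported issue: the element strategy is fixed, so every
/// caller gets numeric preference regardless of context.
fn list_coercion_fixed(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (Ty::List(l), Ty::List(r)) => {
            numeric_preferring(&**l, &**r).map(|t| Ty::List(Box::new(t)))
        }
        _ => None,
    }
}

/// Parameterized variant: the caller chooses the element strategy.
fn list_coercion_with(lhs: &Ty, rhs: &Ty, coerce_fn: CoerceFn) -> Option<Ty> {
    match (lhs, rhs) {
        (Ty::List(l), Ty::List(r)) => {
            coerce_fn(&**l, &**r).map(|t| Ty::List(Box::new(t)))
        }
        _ => None,
    }
}

fn main() {
    let ints = Ty::List(Box::new(Ty::Int64));
    let strs = Ty::List(Box::new(Ty::Utf8));
    println!("{:?}", list_coercion_fixed(&ints, &strs));
    println!("{:?}", list_coercion_with(&ints, &strs, string_preferring));
}
```

With the fixed variant, a union-style caller has no way to request string preference for list elements; the parameterized variant gives it the same choice that the struct and map helpers already offer.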

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

datafusion/sqllogictest/test_files/string/string_query.slt.part (1)

50-54: Consider loosening error-text matching to reduce test brittleness.

Matching the exact bad value ('Andrew') can make this test fragile if evaluation order changes. A regex that asserts cast-failure class/message is usually more stable.

♻️ Suggested test expectation hardening

-statement error Arrow error: Cast error: Cannot cast string 'Andrew' to value of Int64 type
+statement error .*Cannot cast string '.*' to value of Int64 type
 select ascii_1 = 1 as col1, 1 = ascii_1 as col2 from test_basic_operator;
 
-statement error Arrow error: Cast error: Cannot cast string 'Andrew' to value of Int64 type
+statement error .*Cannot cast string '.*' to value of Int64 type
 select ascii_1 <> 1 as col1, 1 <> ascii_1 as col2 from test_basic_operator;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@datafusion/sqllogictest/test_files/string/string_query.slt.part` around lines
50 - 54, The test currently matches the full error text including the specific
bad value "'Andrew'", which is brittle; update the expectations in
string_query.slt.part for the two failing statements (the ones starting with
"select ascii_1 = 1 as col1, 1 = ascii_1 as col2 from test_basic_operator;" and
"select ascii_1 <> 1 as col1, 1 <> ascii_1 as col2 from test_basic_operator;")
to use a looser regex or substring that asserts the cast-failure class/message
(e.g. match "Arrow error: Cast error" and/or "Cannot cast .* to value of Int64"
without the concrete literal) so the test no longer depends on the exact value
text.

datafusion/sqllogictest/test_files/string_numeric_coercion.slt (1)

491-496: Make these two negative cases assert the coercion failure, not just any failure.

Bare query error will pass on unrelated parser or planner errors too. Since this file is pinning coercion behavior, please match a stable substring here like the array case above.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/sqllogictest/test_files/string_numeric_coercion.slt` around lines
491 - 496, Replace the bare "query error" lines before the two MAP tests with a
targeted failure assertion that matches the coercion failure (not any
parser/planner error); for the queries SELECT MAP {[1,2,3]:1, ['a', 'b']:2}; and
SELECT MAP {'a':[1,2,3], 'b':['a', 'b']}; change the preceding marker to
something like "query error: cannot coerce" (or another stable substring used
elsewhere for array/type coercion failures) so the test asserts a
coercion-specific error.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@datafusion/expr-common/src/type_coercion/binary.rs`:
- Line 750: The call to string_numeric_coercion in
type_union_resolution_coercion should be replaced with
string_numeric_union_coercion so multi-type union resolution prefers string
types for numeric/string pairs; update the expression chain that currently calls
string_numeric_coercion (the .or_else(...) branch) to call
string_numeric_union_coercion instead, ensuring behavior matches
type_union_coercion's documented string-preference policy.

In `@datafusion/optimizer/src/analyzer/type_coercion.rs`:
- Around line 44-50: The SetComparison branch (Expr::SetComparison) still
applies the old numeric/string early-return guard and must instead use the new
comparison-vs-union semantics: remove the numeric/string early-return in the
Expr::SetComparison handling and invoke the same comparison_coercion (or
type_union_coercion where appropriate) logic used by plain
comparisons/IN/BETWEEN so = ANY / < ALL follow the new coercion paths; update
the branch to call the comparison coercion helpers (e.g., comparison_coercion or
type_union_coercion) and handle UDF/field coercions consistently
(UDFCoercionExt, fields_with_udf) and add a regression test that constructs a
mixed numeric/string subquery comparison (e.g., value = ANY(subquery with mixed
numeric and string types) ) verifying it now coerces instead of erroring.



📥 Commits

Reviewing files that changed from the base of the PR and between cebfbf1 and b73e41a.

📒 Files selected for processing (22)

datafusion/core/src/physical_planner.rs
datafusion/core/tests/expr_api/mod.rs
datafusion/core/tests/sql/unparser.rs
datafusion/expr-common/src/interval_arithmetic.rs
datafusion/expr-common/src/signature.rs
datafusion/expr-common/src/type_coercion/binary.rs
datafusion/expr-common/src/type_coercion/binary/tests/comparison.rs
datafusion/expr-common/src/type_coercion/binary/tests/dictionary.rs
datafusion/expr-common/src/type_coercion/binary/tests/run_end_encoded.rs
datafusion/expr/src/type_coercion/functions.rs
datafusion/expr/src/type_coercion/other.rs
datafusion/functions/src/core/nvl2.rs
datafusion/optimizer/src/analyzer/type_coercion.rs
datafusion/physical-expr/src/expressions/case.rs
datafusion/sqllogictest/test_files/delete.slt
datafusion/sqllogictest/test_files/dictionary.slt
datafusion/sqllogictest/test_files/expr.slt
datafusion/sqllogictest/test_files/push_down_filter_parquet.slt
datafusion/sqllogictest/test_files/string/string_query.slt.part
datafusion/sqllogictest/test_files/string_numeric_coercion.slt
datafusion/substrait/tests/cases/roundtrip_logical_plan.rs
docs/source/library-user-guide/upgrading/53.0.0.md

💤 Files with no reviewable changes (1)

datafusion/sqllogictest/test_files/push_down_filter_parquet.slt

coderabbitai · 2026-03-25T21:24:14Z

datafusion/expr-common/src/type_coercion/binary.rs

                .or_else(|| temporal_coercion_nonstrict_timezone(lhs_type, rhs_type))
                .or_else(|| string_coercion(lhs_type, rhs_type))
-                .or_else(|| numeric_string_coercion(lhs_type, rhs_type))
+                .or_else(|| string_numeric_coercion(lhs_type, rhs_type))


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the actual usage of type_union_resolution vs type_union_coercion
# to determine if the inconsistency affects real scenarios
echo "=== Usages of type_union_resolution ==="
rg -n "type_union_resolution\b" --type rust -C 2
echo ""
echo "=== Usages of type_union_resolution_coercion ==="
rg -n "type_union_resolution_coercion\b" --type rust -C 2

Repository: martin-augment/datafusion

Length of output: 13558

🏁 Script executed:

cat -n datafusion/expr-common/src/type_coercion/binary.rs | sed -n '650,750p'

Repository: martin-augment/datafusion

Length of output: 4653

🏁 Script executed:

cat -n datafusion/expr-common/src/type_coercion/binary.rs | sed -n '860,900p'

Repository: martin-augment/datafusion

Length of output: 2171

🏁 Script executed:

rg -n "fn string_numeric_coercion|fn string_numeric_union_coercion" --type rust -A 10 datafusion/expr-common/src/type_coercion/binary.rs

Repository: martin-augment/datafusion

Length of output: 985

🏁 Script executed:

cat -n datafusion/expr-common/src/type_coercion/binary.rs | sed -n '590,610p'

Repository: martin-augment/datafusion

Length of output: 1075

🏁 Script executed:

cat -n datafusion/expr-common/src/type_coercion/binary.rs | sed -n '850,875p'

Repository: martin-augment/datafusion

Length of output: 1811

Replace string_numeric_coercion with string_numeric_union_coercion at line 750.

type_union_resolution_coercion is used by type_union_resolution, which handles multi-type unification in UNION, CASE, COALESCE, MAKE_ARRAY, CONCAT, and similar operations. However, it uses string_numeric_coercion (which prefers numeric types) instead of string_numeric_union_coercion (which prefers string types).

This creates an inconsistency: the two-type unification function type_union_coercion (line 868) explicitly documents and implements string preference for numeric/string pairs, but the multi-type version does the opposite. This causes operations like UNION to produce different results depending on the number of branches, and violates the stated design principle that type unification should prefer strings because every number can be represented as text.
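The inconsistency can be modeled with a short sketch. This is an assumption-laden toy, not the real code: `Ty`, `prefer_numeric`, `prefer_string`, and `resolve_all` are hypothetical names, and `resolve_all` simply folds a pairwise strategy over a list of types, which is one plausible way a multi-type resolver could be built.

```rust
/// Toy stand-in for Arrow's DataType (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int64,
    Utf8,
}

/// Pairwise rule that resolves a numeric/string pair to the numeric type.
fn prefer_numeric(lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        _ => Some(Ty::Int64),
    }
}

/// Pairwise rule that resolves a numeric/string pair to the string type.
fn prefer_string(lhs: Ty, rhs: Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        _ => Some(Ty::Utf8),
    }
}

/// Resolve a common type for many inputs by folding a pairwise rule over them.
/// Returns None for an empty input or if any pair fails to coerce.
fn resolve_all(types: &[Ty], coerce_fn: fn(Ty, Ty) -> Option<Ty>) -> Option<Ty> {
    let (first, rest) = types.split_first()?;
    rest.iter().try_fold(*first, |acc, ty| coerce_fn(acc, *ty))
}

fn main() {
    let branches = [Ty::Int64, Ty::Utf8, Ty::Int64];
    // If the multi-type path folds with the numeric rule while the two-type
    // path uses the string rule, the same inputs resolve differently.
    println!("{:?}", resolve_all(&branches, prefer_numeric));
    println!("{:?}", resolve_all(&branches, prefer_string));
}
```

Using one pairwise rule for both the two-type and multi-type paths removes the dependence on which path a given operation happens to take.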


coderabbitai · 2026-03-25T21:24:14Z

datafusion/optimizer/src/analyzer/type_coercion.rs

+use datafusion_expr::type_coercion::binary::{
+    comparison_coercion, like_coercion, type_union_coercion,
+};
 use datafusion_expr::type_coercion::functions::{UDFCoercionExt, fields_with_udf};
 use datafusion_expr::type_coercion::other::{
-    get_coerce_type_for_case_expression, get_coerce_type_for_list,
+    get_coerce_type_for_case_expression, get_coerce_type_for_case_when,
+    get_coerce_type_for_list,


⚠️ Potential issue | 🟠 Major

Carry the new comparison semantics through SetComparison.

This file now adopts the comparison-vs-union split, but the Expr::SetComparison branch still has the old numeric/string early return at Lines 521-527. = ANY / < ALL will therefore keep erroring on pairs that plain comparisons, IN, and BETWEEN now coerce successfully. Please drop that guard and add a regression test for a mixed numeric/string subquery comparison.


neilconway and others added 12 commits March 5, 2026 16:17

Revised approach to fixing numeric/string type coercion

ba74cd2

Revert map key coercion regression, fix other tests

4bf9fe4

cargo fmt

1cd2602

More rationale for type union behavior

e2272f0

Fix comment typos

e52f506

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

Apply suggestion from @martin-g

8084010

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>

Per review, explicitly test for IN list type mismatch error

7667bc1

Add upgrade notes for change

43657dd

Fix docs prettier lint

36cd3df

Fix think in docs

e12bad2

Doc tweaks

6bdfc37

Merge origin/main into neilc/type-coercion-str-numeric

b73e41a
