[pull] main from apache:main by pull[bot] · Pull Request #81 · buraksenn/datafusion

pull · 2026-04-04T00:33:24Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


## Which issue does this PR close?

- Closes #.

## Rationale for this change

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

## Which issue does this PR close?

- Closes #15161.

## Rationale for this change

In a comparison between a numeric column and a string literal (e.g., `WHERE int_col < '10'`), we previously coerced the numeric column to a string type. This produced a lexicographic comparison and therefore incorrect query results.

Instead, we split type coercion into two situations: type coercion for comparisons (including `IN` lists, `BETWEEN`, and `CASE WHEN`), where we want string->numeric coercion, and type coercion for places like `UNION` or `CASE ... THEN/ELSE`, where DataFusion's traditional behavior has been to tolerate type mismatches by coercing values to strings.

Here is a (not necessarily exhaustive) summary of the behavioral changes (old -> new):

```
Comparisons (=, <, >, etc.):
  float_col = '5'          : string (wrong: '5'!='5.0')  -> numeric
  int_col > '100'          : string (wrong: '325'<'100') -> numeric
  int_col = 'hello'        : string, always false        -> cast error
  int_col = ''             : string, always false        -> cast error
  int_col = '99.99'        : string, always false        -> cast error
  Dict(Int) = '5'          : string                      -> numeric
  REE(Int) = '5'           : string                      -> numeric
  struct(int)=struct(str)  : int field to Utf8           -> str field to int

IN lists:
  float_col IN ('1.0')     : string (wrong: '1.0'!='1')  -> numeric
  str_col IN ('a', 1)      : coerce to Utf8              -> coerce to Int64

CASE:
  CASE str WHEN float      : coerce to Utf8              -> coerce to Float

LIKE / regex:
  Dict(Int) LIKE '%5%'     : coerce to Utf8              -> error (matches int)
  REE(Int) LIKE '%5%'      : coerce to Utf8              -> error (matches int)
  Dict(Int) ~ '5'          : coerce to Utf8              -> error (matches int)
  REE(Int) ~ '5'           : error (no REE)              -> error (REE added)
  REE(Utf8) ~ '5'          : error (no REE)              -> works (REE added)
```

## What changes are included in this PR?

* Update `comparison_coercion` to coerce strings to numerics
* Remove the previous `comparison_coercion_numeric` function
* Add a new function, `type_union_coercion`, and use it where appropriate
* Add support for REE types with regexp operators (this was unsupported for no good reason I can see)
* Add unit and SLT tests for the new coercion behavior
* Update existing SLT tests for changes in coercion behavior
* Fix the ClickBench unparser tests to avoid comparing int fields with non-numeric string literals

## Are these changes tested?

Yes. New tests added, existing tests pass.

## Are there any user-facing changes?

Yes, see the table above. In most cases the new behavior should be more sensible and less error-prone, but it will likely break some user code.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
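To make the two coercion policies concrete, here is a minimal sketch in Rust. It is not DataFusion's code: `Ty`, `comparison_coercion_sketch`, `union_coercion_sketch` and `eval_int_gt_literal` are hypothetical names, and `Ty` is a three-variant stand-in for Arrow's `DataType`. It shows comparisons coercing the string side to the numeric type, union-like contexts falling back to strings, why the old lexicographic comparison goes wrong (`"9" > "100"` as strings), and how a non-numeric literal now surfaces as a cast error.

```rust
/// Hypothetical, simplified stand-in for Arrow data types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int64,
    Float64,
    Utf8,
}

/// Comparison context (=, <, IN, BETWEEN, CASE WHEN): the string side is
/// coerced to the numeric type of the other side.
fn comparison_coercion_sketch(lhs: Ty, rhs: Ty) -> Option<Ty> {
    use Ty::*;
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        // Exactly one side is a string here, so `n` is numeric.
        (Utf8, n) | (n, Utf8) => Some(n),
        (Int64, Float64) | (Float64, Int64) => Some(Float64),
        _ => None,
    }
}

/// Union-like context (UNION, CASE ... THEN/ELSE): mismatched string and
/// numeric inputs are tolerated by coercing everything to strings.
fn union_coercion_sketch(lhs: Ty, rhs: Ty) -> Option<Ty> {
    use Ty::*;
    match (lhs, rhs) {
        (a, b) if a == b => Some(a),
        (Utf8, _) | (_, Utf8) => Some(Utf8),
        (Int64, Float64) | (Float64, Int64) => Some(Float64),
        _ => None,
    }
}

/// After comparison coercion, `int_col > 'literal'` parses the literal as an
/// integer; a literal that is not an integer becomes a cast error.
fn eval_int_gt_literal(col: i64, literal: &str) -> Result<bool, String> {
    literal
        .parse::<i64>()
        .map(|v| col > v)
        .map_err(|e| format!("cast error for {literal:?}: {e}"))
}
```

Under the old string coercion, `9 > '100'` compared `"9"` with `"100"` character by character and returned true; with numeric coercion it is correctly false, and literals such as `'hello'`, `''` or `'99.99'` fail to cast to an integer instead of silently matching nothing.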

## Which issue does this PR close?

- Closes #.

## Rationale for this change

## What changes are included in this PR?

## Are these changes tested?

## Are there any user-facing changes?

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

## What changes are included in this PR?

Adds support for the Spark `round` function in DataFusion.

## Are these changes tested?

Yes, with unit tests.

## Are there any user-facing changes?

Yes, this adds a new function.

---------

Co-authored-by: Subham Singhal <subhamsinghal@Subhams-MacBook-Air.local>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
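Spark documents `round(expr, d)` as rounding to `d` decimal places with HALF_UP rounding, where ties go away from zero (unlike `bround`, which uses HALF_EVEN). The sketch below illustrates only those semantics for `f64` input; `spark_round_f64` is a hypothetical name, not the function added by this PR. A real implementation must also handle integer and decimal inputs and binary floating-point representation issues (for example, `2.675` is not exactly representable), which this sketch does not address.

```rust
/// Sketch of Spark-style `round(x, d)` for `f64` using HALF_UP rounding
/// (ties away from zero). Hypothetical helper, not DataFusion's API.
fn spark_round_f64(x: f64, scale: i32) -> f64 {
    if !x.is_finite() {
        // NaN and +/-infinity pass through unchanged.
        return x;
    }
    let factor = 10f64.powi(scale.abs());
    if scale >= 0 {
        // `f64::round` rounds half-way cases away from zero, i.e. HALF_UP.
        (x * factor).round() / factor
    } else {
        // A negative scale rounds to the left of the decimal point;
        // dividing first avoids multiplying by an inexact power like 0.01.
        (x / factor).round() * factor
    }
}
```

For example, `spark_round_f64(2.5, 0)` yields `3.0` and `spark_round_f64(-2.5, 0)` yields `-3.0`, whereas HALF_EVEN would give `2.0` and `-2.0`.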

## Which issue does this PR close?

Follow-up of #20768.

## Rationale for this change

`Precision::min/max` allocates many new `ScalarValue`s, even though the computation can be done in place. When running the `sql_planner` benchmark, `Statistics::try_merge_iter` appears to be a significant part of the runtime for ClickBench, and this PR improves that part by about 20-25% locally.

## What changes are included in this PR?

Introduces a couple of new internal functions to calculate the min/max of a `Precision` in place.

## Are these changes tested?

Existing general tests, and a few new unit tests.

## Are there any user-facing changes?

None

---------

Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
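The idea of computing the min of a `Precision` in place can be sketched as follows. This is an illustration, not the PR's code: `Precision<T>` here is a simplified generic mirror of DataFusion's `Exact` / `Inexact` / `Absent` statistic, `T` stands in for `ScalarValue`, and `min_in_place` is a hypothetical name. The sketch assumes the usual merge rule (exact only if both inputs are exact, absent if either is absent). The point it demonstrates is that `self` keeps its existing value and only clones `other`'s value when that value is strictly smaller; `max` would be symmetric.

```rust
/// Simplified, generic stand-in for DataFusion's `Precision` statistic.
#[derive(Debug, Clone, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

impl<T: PartialOrd + Clone> Precision<T> {
    /// Hypothetical in-place min: updates `self` instead of building a new
    /// `Precision` (and cloning both values) on every merge.
    fn min_in_place(&mut self, other: &Precision<T>) {
        let self_exact = matches!(self, Precision::Exact(_));
        let (other_val, other_exact) = match other {
            Precision::Exact(v) => (v, true),
            Precision::Inexact(v) => (v, false),
            Precision::Absent => {
                // Unknown on either side makes the merged statistic unknown.
                *self = Precision::Absent;
                return;
            }
        };
        // Move the current value out without cloning it.
        let current = match std::mem::replace(self, Precision::Absent) {
            Precision::Exact(v) | Precision::Inexact(v) => v,
            Precision::Absent => return, // `self` stays Absent
        };
        // Only pay for a clone when `other` actually wins.
        let min = if other_val < &current { other_val.clone() } else { current };
        *self = if self_exact && other_exact {
            Precision::Exact(min)
        } else {
            Precision::Inexact(min)
        };
    }
}
```

Merging many file-level statistics then becomes a loop of `acc.min_in_place(&stat)` over a single accumulator.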

…spill files (#21293)

## Which issue does this PR close?

- Closes #21292.

## Rationale for this change

When `InProgressSpillFile` has no writer yet, it uses the schema of the first batch, which can have different nullability from subsequent batches (e.g. due to union-ing a literal projection and a table column projection). This can then lead to a panic in `sort_batch`. `InProgressSpillFile` already has access to the canonical schema that the spill file should have (`self.spill_writer.schema()`).

## What changes are included in this PR?

Single-line fix: use the `spill_writer`'s schema for the spill file instead of the first batch's schema. The rest of the changes are two new tests.

## Are these changes tested?

Yes, two new tests are added.

## Are there any user-facing changes?
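The nullability problem and the fix can be modelled without Arrow. In this sketch, `Field`, `Schema`, `fits` and `SpillFileSketch` are hypothetical simplifications (the real code uses Arrow's `Schema` and `Field`), and the error returned by `append` stands in for the downstream failure rather than reproducing the actual `sort_batch` panic. It shows that a file schema taken from a first batch with a non-nullable literal column rejects a later batch whose column is nullable, while taking the schema from the canonical writer schema accepts both.

```rust
/// Simplified field/schema model; only nullability matters here.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: &'static str,
    nullable: bool,
}

type Schema = Vec<Field>;

/// A batch fits a file schema if the column names match and it never puts
/// a nullable column into a non-nullable slot.
fn fits(file: &Schema, batch: &Schema) -> bool {
    file.len() == batch.len()
        && file
            .iter()
            .zip(batch)
            .all(|(f, b)| f.name == b.name && (f.nullable || !b.nullable))
}

/// Hypothetical stand-in for `InProgressSpillFile`.
struct SpillFileSketch {
    /// Canonical schema, analogous to `self.spill_writer.schema()`.
    canonical: Schema,
    /// Schema of the file actually opened, set lazily on the first batch.
    file_schema: Option<Schema>,
}

impl SpillFileSketch {
    fn append(&mut self, batch: &Schema) -> Result<(), String> {
        // The fix: when the file is first created, take its schema from the
        // canonical schema rather than from `batch`.
        let canonical = &self.canonical;
        let file = self.file_schema.get_or_insert_with(|| canonical.clone());
        if fits(file, batch) {
            Ok(())
        } else {
            Err(format!("batch {batch:?} does not fit file schema {file:?}"))
        }
    }
}
```

With the old behavior, the file schema would have been `[c: non-nullable]` from the literal projection, and the subsequent `[c: nullable]` batch from the table column would not fit it.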

@Dandandan

## Which issue does this PR close?

- Part of #20529
- Broken out of #20820

## Rationale for this change

@Dandandan, @adriangb and I are in the process of significantly reworking how FileStream works internally (morsels!). As the FileStream grows more complicated, it needs a better structure than one giant module to stay workable. I am doing this as incremental PRs to keep the review burden low.

## What changes are included in this PR?

1. Move FileStreamMetrics to its own module

## Are these changes tested?

Yes, by CI.

## Are there any user-facing changes?

No, just internal code motion.

comphead and others added 7 commits April 3, 2026 19:42

pull bot locked and limited conversation to collaborators Apr 4, 2026

pull bot added the ⤵️ pull label Apr 4, 2026

pull bot merged commit e1c5453 into buraksenn:main Apr 4, 2026


pull[bot] merged 7 commits into buraksenn:main from apache:main

pull bot commented Apr 4, 2026


7 participants
