[pull] main from apache:main#91

Merged
pull[bot] merged 4 commits into buraksenn:main from apache:main
Apr 9, 2026
@pull pull bot commented Apr 9, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

…ield-aware CastExpr (#20836)

## Which issue does this PR close?

* Part of #20164

---

## Rationale for this change

The current physical planning path for `Expr::Cast` discards logical
field information (name, nullability, and metadata) by lowering casts
using only the target `DataType`. This results in a loss of semantic
fidelity between logical and physical plans, particularly for
metadata-bearing fields and same-type casts with explicit field intent.

Additionally, the planner previously rejected casts with metadata due to
limitations of the type-only casting API, creating inconsistencies with
other parts of the system (e.g. adapter-generated expressions).

This change introduces a field-aware casting path that preserves logical
intent throughout physical lowering, ensuring consistent semantics
across planner and adapter outputs.

---

## What changes are included in this PR?

* Introduced `cast_with_target_field` to construct `CastExpr` using full
`FieldRef` semantics (name, nullability, metadata).
* Refactored existing `cast_with_options` to delegate to the new
field-aware helper.
* Moved `is_default_target_field` to a shared helper function for reuse.
* Updated planner (`planner.rs`) to use `cast_with_target_field` instead
of type-only casting.
* Removed metadata rejection logic during cast lowering.
* Ensured same-type casts preserve explicit field semantics unless the
target field is default.
* Adjusted cast construction to validate compatibility before building
expressions.
* Exported `cast_with_target_field` for internal planner use.
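
The same-type rule in the list above can be illustrated with a dependency-free toy model. Everything here is a stand-in: `ToyField`, `Lowered`, and `lower_cast` are hypothetical names, not DataFusion or Arrow types, and the criteria inside this `is_default_target_field` (empty name, nullable, no metadata) are an assumption about what "default" means, not a copy of the PR's helper.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for an Arrow field (hypothetical, not `arrow::datatypes::Field`).
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    data_type: String, // e.g. "Int32"; a string keeps the sketch dependency-free
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

/// Assumed definition of a "default" target field: it carries no intent
/// beyond its data type (no name, nullable, no metadata).
fn is_default_target_field(field: &ToyField) -> bool {
    field.name.is_empty() && field.nullable && field.metadata.is_empty()
}

/// Result of lowering a logical cast in this toy model.
#[derive(Debug, PartialEq)]
enum Lowered {
    /// The input expression is reused unchanged (cast elided).
    Passthrough,
    /// A cast node that keeps the full target field.
    Cast(ToyField),
}

/// Elide the cast only when the type is unchanged AND the target field is
/// default; otherwise keep a cast that carries the whole field.
fn lower_cast(input_type: &str, target: &ToyField) -> Lowered {
    if input_type == target.data_type && is_default_target_field(target) {
        Lowered::Passthrough
    } else {
        Lowered::Cast(target.clone())
    }
}
```

In this model an `Int32` to `Int32` cast with a plain target is elided, while the same cast with a metadata-bearing target still produces a cast node, which mirrors the behavior change described under user-facing changes.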

---

## Are these changes tested?

Yes.

Added planner-focused unit tests to validate:

* Preservation of target field metadata during cast lowering
* Correct propagation of nullability semantics
* Proper handling of same-type casts with explicit field overrides
* No regression for standard type-only casts
* Rejection behavior for unsupported extension type casts via `TryCast`

These tests ensure both backward compatibility and correctness of the
new semantics.

---

## Are there any user-facing changes?

Yes, behaviorally (but not API-breaking):

* Cast expressions now preserve logical field metadata and nullability
in physical plans.
* Previously rejected metadata-bearing casts are now supported.
* Same-type casts may now produce a `CastExpr` when explicit field
semantics are provided.

There are no breaking changes to public APIs, but downstream consumers
that relied on previous planner behavior (e.g. metadata stripping or
cast elision) may observe differences.

---

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated
content has been manually reviewed and tested.
@pull pull bot locked and limited conversation to collaborators Apr 9, 2026
@pull pull bot added the ⤵️ pull label Apr 9, 2026
…erns. 2.4x improvement (ClickBench Q28) (#21379)

## Which issue does this PR close?

- Closes: #21382

## Rationale for this change


`regexp_replace` with anchored patterns like
`^https?://(?:www\.)?([^/]+)/.*$` spends time scanning the trailing
`.*$` and calls `captures()` + `expand()`, allocating a `String` on
every row.

The query `SELECT regexp_replace(url,
'^https?://(?:www\.)?([^/]+)/.*$', '\1')` has exactly this shape and
runs 2.4x faster with this optimization.

## What changes are included in this PR?


- Strip trailing `.*$` from the pattern string for anchored patterns
where the replacement is `\1`
- Use `captures_read` with pre-allocated `CaptureLocations` for direct
byte-slice extraction
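
A minimal sketch of the pattern-stripping step, using only the standard library. `strip_trailing_dot_star` is a hypothetical name, and the exact eligibility checks (including the escaped-dot guard) are assumptions rather than the PR's actual conditions. Once the suffix is dropped, the caller would presumably emit only capture group 1 for matching rows, since the removed `.*$` would otherwise have consumed the rest of the string.

```rust
/// Returns the pattern with a trailing `.*$` removed when the shortcut
/// applies, or `None` when the pattern must be used unchanged.
/// Hypothetical helper; the conditions are an assumption for illustration.
fn strip_trailing_dot_star<'a>(pattern: &'a str, replacement: &str) -> Option<&'a str> {
    // Only the "keep capture group 1" replacement qualifies.
    if replacement != "\\1" {
        return None;
    }
    // The pattern must be anchored at the start of the input.
    if !pattern.starts_with('^') {
        return None;
    }
    // The pattern must end with the literal text `.*$`.
    let prefix = pattern.strip_suffix(".*$")?;
    // Conservative guard: an odd number of backslashes right before the `.`
    // means the dot was escaped (`\.*$`), so it is not a wildcard.
    let trailing_backslashes = prefix.bytes().rev().take_while(|&b| b == b'\\').count();
    if trailing_backslashes % 2 == 1 {
        return None;
    }
    Some(prefix)
}
```

For the URL pattern above, the helper yields `^https?://(?:www\.)?([^/]+)/`, so the regex engine can stop as soon as the first `/` after the host is found.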

## Are these changes tested?


Yes, covered by existing `regexp_replace` unit tests, ClickBench
sqllogictests, and the new URL domain extraction sqllogictest.

## Are there any user-facing changes?

No.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…21052)

## Which issue does this PR close?

- Closes #20818.
- Alternative to #20819.

## Rationale for this change

DataFusion requires all projected expressions to have unique names
during planning, so it rejects a query such as `SELECT 0, 0`.

However this shouldn't be an issue when this is just a sub-SELECT in a
larger query which does abide by this rule. For example a set expression
(UNION, EXCEPT, INTERSECT) query should only require the first SELECT to
provide a unique schema, and that should be sufficient.

Furthermore, the requirement is redundant there, since all field
names/aliases other than those in the first SELECT are discarded anyway.

## What changes are included in this PR?

- When processing a set expression (UNION, EXCEPT, INTERSECT), save the
left-side schema to the planner context.
- Inside `SqlToRel::select_to_plan`, pop the schema and pass it down to
a new `project_with_validation_and_schema` function in
`LogicalPlanBuilder`, which aliases the projected expressions to match it.
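
The aliasing idea can be sketched without any DataFusion types. `alias_with_left_schema` is a hypothetical helper operating on plain strings; it is not the signature or behavior of `project_with_validation_and_schema`, only an illustration of why borrowing the left-side column names makes a projection like `SELECT 0, 0` acceptable on the right side of a set expression.

```rust
/// Aliases each right-side expression to the left-side column name at the
/// same position. Hypothetical toy helper: expressions and column names are
/// plain strings, and a column-count mismatch is reported as an error.
fn alias_with_left_schema(
    exprs: &[&str],
    left_columns: &[&str],
) -> Result<Vec<String>, String> {
    if exprs.len() != left_columns.len() {
        return Err(format!(
            "expected {} columns, found {}",
            left_columns.len(),
            exprs.len()
        ));
    }
    Ok(exprs
        .iter()
        .zip(left_columns)
        .map(|(expr, column)| format!("{} AS {}", expr, column))
        .collect())
}
```

Given left columns `a` and `b`, the right-side expressions `0, 0` become `0 AS a` and `0 AS b`, so the duplicate-name check no longer trips as long as the left schema itself is unique.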

The benefit of this approach compared to #20819 is that wildcards are
unwrapped and we can properly handle them as well.

The downside is that we need to thread the left schema via the planner
context now.

## Are these changes tested?

Yes, there are unit tests and SLTs.

## Are there any user-facing changes?

Yes: `LogicalPlanBuilder` gains a new method,
`project_with_validation_and_schema`, which aliases the projection
with the provided schema.
## Which issue does this PR close?

- Part of #7013

## Rationale for this change

Keep the news fresh and up to date

## What changes are included in this PR?

Update links to interesting DataFusion content.

## Are these changes tested?

## Are there any user-facing changes?

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 9, 2026
@pull pull bot merged commit 02e4411 into buraksenn:main Apr 9, 2026
1 check passed

4 participants