Merged
Conversation
…ield-aware CastExpr (#20836) ## Which issue does this PR close? * Part of #20164 --- ## Rationale for this change The current physical planning path for `Expr::Cast` discards logical field information (name, nullability, and metadata) by lowering casts using only the target `DataType`. This results in a loss of semantic fidelity between logical and physical plans, particularly for metadata-bearing fields and same-type casts with explicit field intent. Additionally, the planner previously rejected casts with metadata due to limitations of the type-only casting API, creating inconsistencies with other parts of the system (e.g. adapter-generated expressions). This change introduces a field-aware casting path that preserves logical intent throughout physical lowering, ensuring consistent semantics across planner and adapter outputs. --- ## What changes are included in this PR? * Introduced `cast_with_target_field` to construct `CastExpr` using full `FieldRef` semantics (name, nullability, metadata). * Refactored existing `cast_with_options` to delegate to the new field-aware helper. * Moved `is_default_target_field` to a shared helper function for reuse. * Updated planner (`planner.rs`) to use `cast_with_target_field` instead of type-only casting. * Removed metadata rejection logic during cast lowering. * Ensured same-type casts preserve explicit field semantics unless the target field is default. * Adjusted cast construction to validate compatibility before building expressions. * Exported `cast_with_target_field` for internal planner use. --- ## Are these changes tested? Yes. Added planner-focused unit tests to validate: * Preservation of target field metadata during cast lowering * Correct propagation of nullability semantics * Proper handling of same-type casts with explicit field overrides * No regression for standard type-only casts * Rejection behavior for unsupported extension type casts via `TryCast` These tests ensure both backward compatibility and correctness of the new semantics. --- ## Are there any user-facing changes? Yes, behaviorally (but not API-breaking): * Cast expressions now preserve logical field metadata and nullability in physical plans. * Previously rejected metadata-bearing casts are now supported. * Same-type casts may now produce a `CastExpr` when explicit field semantics are provided. There are no breaking changes to public APIs, but downstream consumers that relied on previous planner behavior (e.g. metadata stripping or cast elision) may observe differences. --- ## LLM-generated code disclosure This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.
…erns. 2.4x improvement (ClickBench Q28) (#21379) ## Which issue does this PR close? - Closes: #21382 ## Rationale for this change `regexp_replace` with anchored patterns like `^https?://(?:www\.)?([^/]+)/.*$` spends time scanning the trailing `.*$` and using `captures()` + `expand()` with `String` allocation on every row. It just happens this `SELECT regexp_replace(url, '^https?://(?:www\.)?([^/]+)/.*$', '\1')` query benefits from this optimization (2.4x faster) ## What changes are included in this PR? - Strip trailing `.*$` from the pattern string for anchored patterns where the replacement is `\1` - Use `captures_read` with pre-allocated `CaptureLocations` for direct byte-slice extraction ## Are these changes tested? Yes, covered by existing `regexp_replace` unit tests, ClickBench sqllogictests, and the new URL domain extraction sqllogictest. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> No. --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…21052) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #20818. - Alternative to #20819. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> DataFusion requires all projected expressions to have unique names during planning, so it doesn't support select 0, 0 for instance. However this shouldn't be an issue when this is just a sub-SELECT in a larger query which does abide by this rule. For example a set expression (UNION, EXCEPT, INTERSECT) query should only require the first SELECT to provide a unique schema, and that should be sufficient. Furthermore, this requirement is even more redundant, since all field name/aliases other than those in the first SELECT are discarded anyway. ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> - when we're processing a set expression (UNION, EXCEPT, INTERSECT), save the left side schema to planner context - when we're inside `SqlToRel::select_to_plan` pop the schema and pass it down to - a new `project_with_validation_and_schema` function in `LogicalPlanBuilder` to properly alias them The benefit of this approach compared to #20819 is that wildcards are unwrapped and we can properly handle them as well. The downside is that we need to thread the left schema via the planner context now. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, there are unit tests and SLTs. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> New method in `LogicalPlanBuilder` called `project_with_validation_and_schema` which will alias the projection with the provided schema.
## Which issue does this PR close? - part of #7013 ## Rationale for this change Keep the news fresh and up to date ## What changes are included in this PR? Update links to interesting DataFusion content ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )