Merged
Conversation
## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
## Which issue does this PR close? - Closes #15161. ## Rationale for this change In a comparison between a numeric column and a string literal (e.g., `WHERE int_col < '10'`), we previously coerced the numeric column to be a string type. This resulted in doing a lexicographic comparison, which results in incorrect query results. Instead, we split type coercion into two situations: type coercion for comparisons (including `IN` lists, `BETWEEN`, and `CASE WHEN`), where we want string->numeric coercion, and type coercion for places like `UNION` or `CASE ... THEN/ELSE`, where DataFusion's traditional behavior has been to tolerate type mismatching by coercing values to strings. Here is a (not necessarily exhaustive) summary of the behavioral changes (old -> new): ``` Comparisons (=, <, >, etc.): float_col = '5' : string (wrong: '5'!='5.0') -> numeric int_col > '100' : string (wrong: '325'<'100') -> numeric int_col = 'hello' : string, always false -> cast error int_col = '' : string, always false -> cast error int_col = '99.99' : string, always false -> cast error Dict(Int) = '5' : string -> numeric REE(Int) = '5' : string -> numeric struct(int)=struct(str): int field to Utf8 -> str field to int IN lists: float_col IN ('1.0') : string (wrong: '1.0'!='1') -> numeric str_col IN ('a', 1) : coerce to Utf8 -> coerce to Int64 CASE: CASE str WHEN float : coerce to Utf8 -> coerce to Float LIKE / regex: Dict(Int) LIKE '%5%' : coerce to Utf8 -> error (matches int) REE(Int) LIKE '%5%' : coerce to Utf8 -> error (matches int) Dict(Int) ~ '5' : coerce to Utf8 -> error (matches int) REE(Int) ~ '5' : error (no REE) -> error (REE added) REE(Utf8) ~ '5' : error (no REE) -> works (REE added) ``` ## What changes are included in this PR? * Update `comparison_coercion` to coerce strings to numerics * Remove previous `comparison_coercion_numeric` function * Add a new function, `type_union_coercion`, and use it when appropriate * Add support for REE types with regexp operators (this was unsupported for no good reason I can see) * Add unit and SLT tests for new coercion behavior * Update existing SLT tests for changes in coercion behavior * Fix the ClickBench unparser tests to avoid comparing int fields with non-numeric string literals ## Are these changes tested? Yes. New tests added, existing tests pass. ## Are there any user-facing changes? Yes, see table above. In most cases the new behavior should be more sensible and less error-prone, but it will likely break some user code. --------- Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. --> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## What changes are included in this PR? Adds support for round spark function in data fusion. ## Are these changes tested? yes, using UTs ## Are there any user-facing changes? yes, adds new function. --------- Co-authored-by: Subham Singhal <subhamsinghal@Subhams-MacBook-Air.local> Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close? Follow up of #20768. ## Rationale for this change `Precision::min/max` allocates a lot of new `ScalarValues`, and it can be done in place. While running the `sql_planner` benchmark, it seems like for clickbench `Statistics::try_merge_iter` is a significant part of the runtime, and this PR improves that part by about 20-25% locally. ## What changes are included in this PR? Introduces a couple of of new internal functions to calculate the min/max of a `Precision` in-place. ## Are these changes tested? Existing general tests, and a few new unit tests. ## Are there any user-facing changes? None --------- Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
…spill files (#21293) ## Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123. --> - Closes #21292. ## Rationale for this change <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> `InProgressSpillFile` will use the first batch schema when it has no writer, which could have different nullability from the subsequent batches (e.g. due to union-ing a literal projection and a table column projection). This can then lead to a panic in `sort_batch`. `InProgressSpillFile` already has access to the canonical schema that the spill file should have (`self.spill_writer.schema()`). ## What changes are included in this PR? <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> Single line fix: instead of using the first batch schema for the spill file schema, use the spill_writer's schema instead. The rest of the changes are two new tests. ## Are these changes tested? <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> Yes, two new tests are added. ## Are there any user-facing changes? <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. --> <!-- If there are any breaking changes to public APIs, please add the `api change` label. -->
## Which issue does this PR close? - part of #20529 - Broken out of #20820 ## Rationale for this change @Dandandan, @adriangb and I are in the process of significantly reworking how FileStream works internally (morsels!) As the FileStream will get more complicated, it needs a better structure than one giant module to ensure we can work with it. I am trying to do this as incremental PRs to keep review burden low ## What changes are included in this PR? 1. Move FileStreamMetrics to its own module ## Are these changes tested? Yes by CI ## Are there any user-facing changes? No, just internal code motion
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )