[GLUTEN-7600][VL] Add monotonically_increasing_id function mapping #11674
Open
n0r0shi wants to merge 1 commit into apache:main
Adds `Sig[MonotonicallyIncreasingID]` to `ExpressionMappings.SCALAR_SIGS` so the function is offloaded to Velox instead of falling back to vanilla Spark.

Also sets Velox's `expression.dedup_non_deterministic` to `false`. By default Velox deduplicates structurally identical non-deterministic expression trees, merging them into a single instance with shared state. This is incorrect for Spark semantics, where each non-deterministic call has independent state. For example, `SELECT monotonically_increasing_id(), monotonically_increasing_id()` must return [0,0],[1,1] (two independent counters), not [0,2],[1,3] (one shared counter).

For seeded functions like `rand(42)`, disabling dedup is safe: each independent instance produces the same sequence from the same seed, matching Spark's behavior either way.

Un-ignores and fixes the corresponding test in `ScalarFunctionsValidateSuite`.

Closes apache#7628
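The point about seeded functions can be illustrated with a small toy model. This is only a sketch: Python's `random.Random` stands in for the engine's generator (an assumption, not the generator Spark or Velox actually uses), and `seeded_column` is a hypothetical helper name.

```python
import random


def seeded_column(seed, num_rows):
    # Each call builds its own generator, i.e. an independent expression instance.
    gen = random.Random(seed)
    return [gen.random() for _ in range(num_rows)]


# Two independent instances with the same seed yield the same column values,
# so giving every rand(42) call its own state does not change the result.
first = seeded_column(42, 3)
second = seeded_column(42, 3)
print(first == second)  # True
```

The same reasoning does not carry over to counters, which have no seed to re-synchronize them; that is why the `monotonically_increasing_id` case is the one that exposes the shared-state problem.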
Run Gluten Clickhouse CI on x86
Summary
- Adds `Sig[MonotonicallyIncreasingID]` to `ExpressionMappings.SCALAR_SIGS` so the function is offloaded to Velox instead of falling back to vanilla Spark.
- Sets `expression.dedup_non_deterministic` to `false` to match Spark semantics: Spark never deduplicates non-deterministic expressions, and each call has independent state.
- Un-ignores and fixes the corresponding test in `ScalarFunctionsValidateSuite`.

Context
PR #10097 previously attempted this but was closed because of a result mismatch (#7628): `SELECT monotonically_increasing_id(), monotonically_increasing_id()` returned [0,2],[1,3] instead of Spark's expected [0,0],[1,1].

The root cause was Velox's expression compiler deduplicating the two structurally identical calls into one shared counter instance.
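A toy model of that root cause is sketched below. It is not Velox code: `CounterExpr`, `compile_projection`, and `run` are hypothetical names, and the model assumes columns are evaluated one after another over a two-row batch, which is one way a shared counter could produce the mismatched rows.

```python
from itertools import count


class CounterExpr:
    """Stand-in for a stateful call such as monotonically_increasing_id()."""

    def __init__(self):
        self._ids = count()  # per-instance counter state, starting at 0

    def eval_batch(self, num_rows):
        return [next(self._ids) for _ in range(num_rows)]


def compile_projection(call_names, dedup_non_deterministic):
    """Hypothetical compiler step: map each call in the SELECT list to an instance."""
    cache = {}
    compiled = []
    for name in call_names:
        if dedup_non_deterministic:
            # Structurally identical calls collapse into one shared instance.
            if name not in cache:
                cache[name] = CounterExpr()
            compiled.append(cache[name])
        else:
            # Every call gets its own instance, and therefore its own state.
            compiled.append(CounterExpr())
    return compiled


def run(compiled, num_rows):
    # Assumption: each column is evaluated over the whole batch, in order.
    columns = [expr.eval_batch(num_rows) for expr in compiled]
    return [list(row) for row in zip(*columns)]


calls = ["monotonically_increasing_id()", "monotonically_increasing_id()"]
print(run(compile_projection(calls, True), 2))   # [[0, 2], [1, 3]]
print(run(compile_projection(calls, False), 2))  # [[0, 0], [1, 1]]
```

With deduplication on, both columns draw from the same counter, so the second column continues where the first stopped. With it off, each column starts from zero, which matches the result Spark expects.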
Velox has since added the `expression.dedup_non_deterministic` config (facebookincubator/velox#15008) to control this behavior. This PR sets it to `false` for Gluten. This only affects non-deterministic expressions; deterministic expression deduplication is unchanged.

Question for reviewers: Is setting `expression.dedup_non_deterministic = false` globally the right approach? An alternative would be conditionally disabling it only when stateful expressions like `monotonically_increasing_id` are detected in the plan, but we believe the global approach is correct since Spark semantics never deduplicate non-deterministic expressions.

Closes #7628
Related issue: #7600