Upgrade to Scala 2.13.18 and modernize unused warnings configuration #5

Merged

res-life merged 16 commits into res-life:spark-41-shim (Jan 28, 2026)
Conversation
### Description

Update authorized users

### Checklists

- [ ] This PR has added documentation for new or modified features or behaviors.
- [ ] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Sameer Raheja <sraheja@.nvidia.com>
Co-authored-by: Sameer Raheja <sraheja@.nvidia.com>
…e to safe (NVIDIA#14166)

Contributes to NVIDIA#14135

### Description

Starting with Spark 4.1.x, the default mode changes from unsafe to safe, so some cases failed. This PR disables `spark.sql.execution.pandas.convertToArrowArraySafely` to pass the ITs.

### Checklists

- [ ] This PR has added documentation for new or modified features or behaviors.
- [ ] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>
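As a minimal sketch of the workaround described above (assuming an existing `SparkSession` named `spark`; the exact place where the tests set this conf is not shown here):

```scala
// Hedged sketch: disable the Arrow safe-conversion check that Spark 4.1.x
// turns on by default, restoring the pre-4.1 behavior for these ITs.
spark.conf.set("spark.sql.execution.pandas.convertToArrowArraySafely", "false")

// Equivalent spark-submit form:
//   --conf spark.sql.execution.pandas.convertToArrowArraySafely=false
```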
Contributes to NVIDIA#13672

### Description

This PR adds retry support to GpuBatchedBoundedWindowIterator to handle OOM:

- Protect the following 3 operations with OOM retry support:
  - Window computation (by `computeWindowWithRetry`)
  - Input batch concatenation with the cache (by `getNextInputBatchWithRetry`)
  - Batch trim (by `trimWithRetry`)
- Add unit tests for retry and split-and-retry OOM scenarios.

NDS numbers (in seconds) with the 10k data size show no perf regressions. `rapids-4-spark_2.12-26.02.0-20260102.073925-32-cuda12.jar` was used for the nightly runs; not sure why it is a little slower. Will try to run this more.

| ID | With PR | Nightly |
|--|--|--|
| 1 | 1302 | 1369 |
| 2 | 1315 | 1389 |
| avg | 1308.5 | 1379 |

### Checklists

- [x] This PR has added documentation for new or modified features or behaviors.
- [x] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
- [x] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

---

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Co-authored-by: Firestarman <firestarmanllc@gmail.com>
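The general shape of the retry wrapping described above can be sketched as follows. This is NOT the actual spark-rapids implementation; `RetryOOM` and the spill comment are hypothetical placeholders for the plugin's real retry machinery:

```scala
// Hedged sketch of an OOM-retry wrapper, assuming a hypothetical RetryOOM
// exception that signals the failed allocation may succeed after memory
// is freed (e.g. by spilling cached batches).
class RetryOOM extends RuntimeException

def withRetry[T](maxAttempts: Int = 3)(op: => T): T = {
  var attempt = 0
  while (true) {
    try {
      return op // e.g. window computation, concat with cache, or batch trim
    } catch {
      case _: RetryOOM if attempt < maxAttempts - 1 =>
        attempt += 1 // free/spill memory here, then retry the operation
    }
  }
  throw new IllegalStateException("unreachable")
}
```

The real plugin additionally supports split-and-retry, where the input is split into smaller batches before retrying, which this sketch omits.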
Fixes NVIDIA#14179

## Problem

When a Spark task is killed, the merger thread can remain stuck in `Object.wait()` indefinitely, becoming a "zombie" that blocks subsequent tasks assigned to the same merger slot.

**Root cause**: The wait loops in `mergerTask` only check the `hasNewWork` flag but don't properly respond to thread interruption. When `cancel(true)` is called, there's a race condition where the interrupt may not be handled.

**Impact**: Observed a 26+ minute executor hang in a production NDS benchmark.

## Fix

Add interrupt flag checking in the wait loops and catch `InterruptedException`:

```scala
while (!hasNewWork.get() && !Thread.currentThread().isInterrupted) {
  try {
    mergerCondition.wait()
  } catch {
    case _: InterruptedException =>
      Thread.currentThread().interrupt()
      return
  }
}
if (Thread.currentThread().isInterrupted) {
  return
}
```

This ensures the merger thread exits gracefully when cancelled, even in edge cases.

---

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
…DIA#14182)

Follow-up on the migration to remove anonymous access from Artifactory: migrate the pre-merge scripts to access the new Maven Artifactory using a user and token. The test script spark-premerge-build.sh → [hybrid-execution.sh](https://github.com/NVIDIA/spark-rapids/blob/main/jenkins/hybrid_execution.sh#L41-L43) downloads JARs from the Artifactory repository via `wget`, so provide the `.netrc` credentials for those downloads.

Note: the pre-merge CI already covers tests for this change.

---

Signed-off-by: Tim Liu <timl@nvidia.com>
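For reference, `wget` picks up credentials from `~/.netrc` automatically. A generic example of the entry format (the machine name and credentials below are placeholders, not the actual Artifactory endpoint or token):

```
# Hypothetical ~/.netrc entry; wget matches the "machine" field against
# the download host and sends the login/password for authentication.
machine artifactory.example.com
login ci-user
password my-api-token
```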
) Fixes NVIDIA#7520

### Description

Calls the JNI utility to do the overflow check for the round/bround operators. Added cases covering the overflow check for byte/short/int/long types.

Note: only Spark 340+ supports ANSI mode for round/bround.

### Depends on

* NVIDIA/spark-rapids-jni#4174

### Checklists

- [ ] This PR has added documentation for new or modified features or behaviors.
- [x] This PR has added new tests or modified existing tests to cover new code paths.
- [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

---

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
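To illustrate why rounding needs an overflow check for narrow integral types at all (this is plain Scala arithmetic, not the plugin's JNI code):

```scala
// round(127.5) for a BYTE column produces 128, which is outside Byte's
// range [-128, 127]. Without a check, a narrowing cast silently wraps:
val rounded = math.rint(127.5) // 128.0
val wrapped = rounded.toByte   // wraps to -128 instead of failing

// Under ANSI mode, Spark is expected to raise an overflow error here
// rather than return the wrapped value.
```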
Fixes NVIDIA#13389. This commit adds support for [Iceberg's "identity" partition transform](https://iceberg.apache.org/spec/#partition-transforms). This allows an Iceberg table to be partitioned on a column's values, with no modification to the column's row values. The implementation is trivial. In the interest of not increasing the test runtime too much, a sampling of column types has been included in the coverage tests.

---

Signed-off-by: MithunR <mithunr@nvidia.com>
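For context, an identity-partitioned Iceberg table looks like the following (the catalog and table names are hypothetical; listing a bare column under `PARTITIONED BY` is Iceberg's identity transform):

```scala
// Hedged example: partitioning on the raw column value -- rows keep the
// `region` value unmodified, and it also serves as the partition key.
spark.sql("""
  CREATE TABLE demo_catalog.db.events (id BIGINT, region STRING)
  USING iceberg
  PARTITIONED BY (region)
""")
```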
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Another test case uses `RelationalGroupedDataset.toString` and expects the correct type string, but on JDK 11+ it returns an empty string. Created testRapids cases that check the JDK version to choose the expected string.

Closes NVIDIA#14188

---

Signed-off-by: Gary Shen <gashen@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…4189) Fixes NVIDIA#14037.

### Description

Cherry-pick code from Spark: a `BroadcastExchangeExec`, once materialized, won't be materialized again, so we should not reset its metrics. Cherry-picked into `GpuBroadcastExchangeExecBase`.

apache/spark@a823f95c522 targets Spark 4.1, but because it's a common fix we do not add it only to the 411 shim; the change applies to all Spark versions.

### Checklists

- [ ] This PR has added documentation for new or modified features or behaviors.
- [ ] This PR has added new tests or modified existing tests to cover new code paths. It's only related to metrics, so it does not impact any feature.
- [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>
) close NVIDIA#14099

According to the debugging logs below, the root cause is that `GpuProjectExec` was trying to produce a string column ("payload#3") with too large a size (~12.8 GB). Even though the GPU splits it into 5 parts via the pre-split in `GpuProjectExec`, each part still has ~2.56 GB (= 12.8 / 5), larger than the size limit (2 GB) of a cudf column that requires an offset buffer. The pre-split computation only takes care of the total output size, ignoring the individual column size limit.

```
===> got 1073741760 bytes for out column of expr: input[0, bigint, true](join_key#2)
===> got 13743894532 bytes for out column of expr: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AS payload#3
==> got 5 splits for output size: 14817636292, split unit size: 3.221225472E9
```

So this PR improves the pre-split to take the column limit into account when calculating the number of splits. Verified this PR locally against the case in the linked issue; it fixes the "CUDF String column overflow".

NOTE: this PR only fixes the literal case mentioned in the linked issue. To also fix non-literal cases for GpuProjectExec, we need to address NVIDIA#14191.

---

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
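The sizing idea can be sketched as follows. This is NOT the actual GpuProjectExec code; names and the exact limit constant are illustrative assumptions:

```scala
// Hedged sketch: the split count must satisfy both the total output budget
// and the per-column cudf limit (offset buffers use 32-bit offsets, so a
// single string column must stay under ~2 GiB of character data).
val cudfColumnLimit: Long = Int.MaxValue.toLong // ~2 GiB

def numSplits(totalBytes: Long, maxColumnBytes: Long, targetBytes: Long): Int = {
  val byTotal  = math.ceil(totalBytes.toDouble / targetBytes).toInt
  val byColumn = math.ceil(maxColumnBytes.toDouble / cudfColumnLimit).toInt
  math.max(1, math.max(byTotal, byColumn))
}
```

With the sizes from the log above, the total output size alone yields 5 splits, but the 13743894532-byte string column needs at least 7 splits to keep each piece under the per-column limit.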
Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>
Fixes NVIDIA#14196

### Description

We already support the identity transform in Iceberg, so we should remove all fallback tests for it. This PR continues NVIDIA#14183 to clean up those tests.

### Checklists

- [ ] This PR has added documentation for new or modified features or behaviors.
- [x] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

---

Signed-off-by: Ubuntu <ubuntu@ip-172-31-50-247.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-50-247.us-west-2.compute.internal>
This commit includes the following changes:

- Upgrade Scala 2.13 from version 2.13.14 to 2.13.18
- Modernize compiler warning flags by replacing the deprecated `-Ywarn-unused:locals,patvars,privates` with the more granular `-Wconf` and `-Wunused` syntax for better control over unused code detection
- Remove unused imports across Delta Lake and SQL plugin files identified by the stricter compiler settings
- Simplify Scala 2.13 build profile handling in the buildall script by consolidating POM file selection and removing redundant profile-specific version collection logic
- Update documentation references from "unshimmed-common-from-spark320.txt" to "unshimmed-common-from-single-shim.txt" to reflect generalized shim naming
- Add a `--scala213` command-line option to buildall for explicit Scala 2.13 builds

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
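The new-style flags mentioned above look roughly like this on Scala 2.13. Shown in sbt syntax purely for illustration; this project actually configures scalac through its Maven POM profiles:

```scala
// Hedged illustration of the modernized unused-warning configuration:
scalacOptions ++= Seq(
  "-Wunused:locals,patvars,privates",  // replaces deprecated -Ywarn-unused:...
  "-Wconf:cat=unused-imports:warning"  // per-category control via -Wconf filters
)
```

`-Wconf` accepts `category:action` filters, so individual unused-code categories can be escalated to errors or silenced without touching the rest.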
res-life (Owner) approved these changes on Jan 28, 2026