
Upgrade to Scala 2.13.18 and modernize unused warnings configuration #5

Merged
res-life merged 16 commits into res-life:spark-41-shim from gerashegalov:gera-jdk8
Jan 28, 2026
Conversation


@gerashegalov commented Jan 24, 2026

  • Upgrade Scala 2.13 from version 2.13.14 to 2.13.18
  • Modernize compiler warning flags by replacing the deprecated -Ywarn-unused:locals,patvars,privates
    with more granular -Wconf and -Wunused syntax for better control over unused code detection
  • Remove unused imports across Delta Lake and SQL plugin files identified by stricter compiler settings
  • Simplify Scala 2.13 build profile handling in buildall script by consolidating POM file selection
    and removing redundant profile-specific version collection logic
  • Update documentation references from "unshimmed-common-from-spark320.txt" to
    "unshimmed-common-from-single-shim.txt" to reflect generalized shim naming
  • Add --scala213 command-line option to buildall for explicit Scala 2.13 builds
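For reference, the flag migration looks roughly like this; the exact `-Wconf` filters chosen in the PR are not shown here, so treat the replacement half as a sketch:

```
# Deprecated Scala 2.12-era form:
-Ywarn-unused:locals,patvars,privates

# Scala 2.13 replacement (the -Wconf category shown is an assumption):
-Wunused:locals,patvars,privates
-Wconf:cat=unused:warning
```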

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>

sameerz and others added 16 commits January 20, 2026 08:36
### Description
Update authorized users 

### Checklists

- [ ] This PR has added documentation for new or modified features or
behaviors.
- [ ] This PR has added new tests or modified existing tests to cover
new code paths.
(Please explain in the PR description how the new code paths are tested,
such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added
in the PR description. Or, an issue has been filed with a link in the PR
description.

Signed-off-by: Sameer Raheja <sraheja@nvidia.com>
Co-authored-by: Sameer Raheja <sraheja@nvidia.com>
…e to safe (NVIDIA#14166)

Contributes to NVIDIA#14135

### Description
Starting with Spark 4.1.x, the default conversion mode changes from unsafe to safe, so some cases failed.
This PR disables `spark.sql.execution.pandas.convertToArrowArraySafely` to pass the ITs.
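For reference, disabling the setting looks like this (a sketch; the ITs may apply it through their own conf plumbing instead of `spark-submit`):

```
--conf spark.sql.execution.pandas.convertToArrowArraySafely=false
```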

### Checklists

- [ ] This PR has added documentation for new or modified features or
behaviors.
- [ ] This PR has added new tests or modified existing tests to cover
new code paths.
(Please explain in the PR description how the new code paths are tested,
such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added
in the PR description. Or, an issue has been filed with a link in the PR
description.

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>
Contributes to NVIDIA#13672

### Description

This PR:
Add retry support to GpuBatchedBoundedWindowIterator to handle OOM:
- Protect the following 3 operations with OOM retry support:
  - Window computation (by `computeWindowWithRetry`)
  - Input batch concatenation with the cache (by `getNextInputBatchWithRetry`)
  - Batch trim (by `trimWithRetry`)
- Add unit tests for retry and split-and-retry OOM scenarios
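The retry wrappers all follow the same basic pattern: catch the retryable OOM signal, give the framework a chance to free memory, and re-attempt the operation. A minimal sketch in Java (names like `RetryOOM` and `withRetry` are illustrative, not the plugin's actual API):

```java
import java.util.function.Supplier;

public class RetryDemo {
    // Stand-in for the plugin's retryable OOM signal.
    static class RetryOOM extends RuntimeException {}

    // Re-attempt `op` until it succeeds or maxAttempts is exhausted.
    static <T> T withRetry(Supplier<T> op, int maxAttempts) {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.get();
            } catch (RetryOOM e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e;
                }
                // A real implementation would release or spill GPU memory
                // here before retrying.
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice with a simulated OOM, then succeeds.
        int result = withRetry(() -> {
            if (++calls[0] < 3) throw new RetryOOM();
            return 42;
        }, 5);
        System.out.println(result); // prints 42 after two retries
    }
}
```

The split-and-retry variant additionally halves the input batch when a plain retry keeps failing; that part is omitted here for brevity.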

NDS numbers (in seconds) at the 10k data size show no perf regressions.

`rapids-4-spark_2.12-26.02.0-20260102.073925-32-cuda12.jar` was used for the nightly runs; not sure why the nightly is a little slower. Will try running this more.

|ID|with PR| Nightly|
|--|--|--| 
|1| 1302 | 1369|
|2| 1315 | 1389|
|avg| 1308.5|1379|


### Checklists

- [x] This PR has added documentation for new or modified features or
behaviors.
- [x] This PR has added new tests or modified existing tests to cover
new code paths.
(Please explain in the PR description how the new code paths are tested,
such as names of the new/existing tests that cover them.)
- [x] Performance testing has been performed and its results are added
in the PR description. Or, an issue has been filed with a link in the PR
description.

---------

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Co-authored-by: Firestarman <firestarmanllc@gmail.com>
Fixes NVIDIA#14179

## Problem

When a Spark task is killed, the merger thread can remain stuck in
`Object.wait()` indefinitely, becoming a "zombie" that blocks subsequent
tasks assigned to the same merger slot.

**Root cause**: The wait loops in `mergerTask` only check `hasNewWork`
flag but don't properly respond to thread interruption. When
`cancel(true)` is called, there's a race condition where the interrupt
may not be handled.

**Impact**: Observed 26+ minute executor hang in production NDS
benchmark.

## Fix

Add interrupt flag checking in wait loops and catch
`InterruptedException`:

```scala
while (!hasNewWork.get() && !Thread.currentThread().isInterrupted) {
  try {
    // Requires the caller to hold the mergerCondition monitor.
    mergerCondition.wait()
  } catch {
    case _: InterruptedException =>
      // wait() cleared the interrupt status; restore it and exit.
      Thread.currentThread().interrupt()
      return
  }
}
// Covers the race where the interrupt flag was set without an
// InterruptedException being thrown.
if (Thread.currentThread().isInterrupted) {
  return
}
```

This ensures the merger thread exits gracefully when cancelled, even in
edge cases.

---------

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
…DIA#14182)

Follow-up on the migration to remove anonymous access from Artifactory.

Please migrate the pre-merge scripts to access the new Maven Artifactory
using a user and token.

The test script spark-premerge-build.sh →
[hybrid_execution.sh](https://github.com/NVIDIA/spark-rapids/blob/main/jenkins/hybrid_execution.sh#L41-L43)
downloads JARs from the Artifactory repository via `wget`.

Provide the .netrc credentials for downloading JARs from the Artifactory
repository.
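A `.netrc` entry for the Artifactory host generally looks like this (host, user, and token are placeholders):

```
machine <artifactory-host>
login <username>
password <api-token>
```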

Note: The pre-merge CI already covers tests for this change.

---------

Signed-off-by: Tim Liu <timl@nvidia.com>
)

Fixes NVIDIA#7520

### Description
Calls the JNI utility to do overflow checks for the round/bround operators.
Added cases covering overflow checks for byte/short/int/long types.
Note: Only Spark 340+ supports ANSI mode for round/bround.
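To see why the check is needed: rounding a byte value at a negative scale can produce a result outside the byte range, which ANSI mode must reject rather than silently wrap. A hypothetical sketch (HALF_UP is shown for simplicity; Spark's `bround` uses half-even rounding):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundOverflowDemo {
    // Round `v` to `scale` decimal places; a negative scale rounds to
    // tens, hundreds, and so on.
    static long round(long v, int scale, RoundingMode mode) {
        return BigDecimal.valueOf(v).setScale(scale, mode).longValueExact();
    }

    public static void main(String[] args) {
        long r = round(127, -1, RoundingMode.HALF_UP); // round(127y, -1)
        System.out.println(r);                  // 130: outside the byte range
        System.out.println(r > Byte.MAX_VALUE); // true -> ANSI mode must error
    }
}
```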

### depends on
* NVIDIA/spark-rapids-jni#4174

### Checklists
- [ ] This PR has added documentation for new or modified features or
behaviors.
- [x] This PR has added new tests or modified existing tests to cover
new code paths.
- [ ] Performance testing has been performed and its results are added
in the PR description. Or, an issue has been filed with a link in the PR
description.

---------

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Fixes NVIDIA#13389.

This commit adds support for [Iceberg's "identity" partition
transform](https://iceberg.apache.org/spec/#partition-transforms). This
allows for an Iceberg table to be partitioned on a column's values, with
no modification to the column's row values.

The implementation is trivial. In the interest of not increasing the
test runtime too much, a sampling of column types has been included in
the coverage tests.
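For context, identity partitioning in Iceberg SQL looks like this (table and column names here are made up):

```sql
-- Partition by the raw values of `category`; identity(x) = x,
-- so no transform is applied to the row values.
CREATE TABLE demo.db.events (id BIGINT, category STRING)
USING iceberg
PARTITIONED BY (category);
```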

---------

Signed-off-by: MithunR <mithunr@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
Another test case uses `RelationalGroupedDataset.toString` and expects
the correct type string, but on JDK 11+ it returns an empty string.
Created testRapids cases that check the JDK version to adjust the
expected string.
Close NVIDIA#14188

---------

Signed-off-by: Gary Shen <gashen@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…4189)

Fixes NVIDIA#14037.

### Description
Cherry-picks code from Spark.
`BroadcastExchangeExec`, once materialized, won't be materialized again,
so we should not reset the metrics.
Cherry-picked into `GpuBroadcastExchangeExecBase`.
The upstream commit apache/spark@a823f95c522 targets Spark 4.1, but
because it's a common fix we do not add it only to the 411 shim; the
change applies to all Spark versions.

### Checklists

- [ ] This PR has added documentation for new or modified features or
behaviors.
- [ ] This PR has added new tests or modified existing tests to cover
new code paths.
      It's only related to metrics, so it does not impact any feature.
- [ ] Performance testing has been performed and its results are added
in the PR description. Or, an issue has been filed with a link in the PR
description.

Signed-off-by: Chong Gao <res_life@163.com>
Co-authored-by: Chong Gao <res_life@163.com>
)

close NVIDIA#14099

According to the debugging logs below, the root cause is that
`GpuProjectExec` was trying to produce a string column (`payload#3`)
that was too large (~12.8 GiB). Even though the GPU splits it into 5
parts via the pre-split in `GpuProjectExec`, each part is still
~2.56 GiB (12.8/5), larger than the 2 GiB size limit of a cudf column
that requires an offset buffer. The pre-split computation only accounts
for the total output size, ignoring the per-column size limit.
```
===> got 1073741760 bytes for out column of expr: input[0, bigint, true](join_key#2)
===> got 13743894532 bytes for out column of expr: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AS payload#3
==> got 5 splits for output size: 14817636292, split unit size: 3.221225472E9
```

So this PR improves the pre-split to take the column limit into account
when calculating the splits number.
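Plugging the logged numbers in shows the gap: splitting by total output size gives 5 parts, but the 2 GiB (2^31 - 1) offset limit on the big string column alone requires 7. A sketch of the improved split count (variable names are illustrative, not the plugin's):

```java
public class PreSplitDemo {
    // Ceiling division for positive longs.
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        long totalOut = 14817636292L;         // total output bytes from the log
        long splitUnit = 3221225472L;         // split unit size from the log
        long bigColumn = 13743894532L;        // the ~12.8 GiB string column
        long columnLimit = Integer.MAX_VALUE; // cudf limit for offset-buffer columns

        long byTotal = ceilDiv(totalOut, splitUnit);     // 5: each part still ~2.56 GiB
        long byColumn = ceilDiv(bigColumn, columnLimit); // 7: what the column limit needs
        System.out.println(Math.max(byTotal, byColumn)); // prints 7
    }
}
```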

Verified this PR locally with the case from the linked issue; it fixes
the "CUDF String column overflow".

NOTE: this PR only fixes the literal case mentioned in the linked
issue. To also fix non-literal cases for GpuProjectExec, we need to
address issue NVIDIA#14191.

---------

Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: nvauto <70000568+nvauto@users.noreply.github.com>
Fixes NVIDIA#14196


### Description

We already support the identity transform in Iceberg, so we should
remove all fallback tests for it. This PR continues
NVIDIA#14183 to clean up those tests.

### Checklists


- [ ] This PR has added documentation for new or modified features or
behaviors.
- [x] This PR has added new tests or modified existing tests to cover
new code paths.
(Please explain in the PR description how the new code paths are tested,
such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added
in the PR description. Or, an issue has been filed with a link in the PR
description.

---------

Signed-off-by: Ubuntu <ubuntu@ip-172-31-50-247.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-50-247.us-west-2.compute.internal>
This commit includes the following changes:

- Upgrade Scala 2.13 from version 2.13.14 to 2.13.18
- Modernize compiler warning flags by replacing the deprecated -Ywarn-unused:locals,patvars,privates
  with more granular -Wconf and -Wunused syntax for better control over unused code detection
- Remove unused imports across Delta Lake and SQL plugin files identified by stricter compiler settings
- Simplify Scala 2.13 build profile handling in buildall script by consolidating POM file selection
  and removing redundant profile-specific version collection logic
- Update documentation references from "unshimmed-common-from-spark320.txt" to
  "unshimmed-common-from-single-shim.txt" to reflect generalized shim naming
- Add --scala213 command-line option to buildall for explicit Scala 2.13 builds

Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@gerashegalov changed the title from "Gera jdk8" to "Upgrade to Scala 2.13.18 and modernize unused warnings configuration" on Jan 27, 2026
@res-life

There is only a Squash and merge option, so I'll merge this PR manually. Please do not use Squash and merge;
I want to retain the commits without squashing.

@res-life merged commit 0d5157a into res-life:spark-41-shim on Jan 28, 2026
1 check failed