
[FEA] Add support for Spark 4.1.1 [databricks]#14120

Merged
gerashegalov merged 63 commits into NVIDIA:release/26.02 from res-life:spark-41-shim
Jan 30, 2026

Conversation

@res-life

@res-life res-life commented Jan 9, 2026

closes #14056
closes #14105
closes #14104
closes #14107
closes #14150
closes #14036
closes #14111
closes #14103
closes #14112
closes #14113
closes #14114
closes #14115

Description

Adds initial support for a Spark 4.1.1 shim, handling the following API changes:

API Changes in Spark 4.1.1

  1. StoragePartitionJoinParams package change - Moved from org.apache.spark.sql.execution.datasources.v2 to org.apache.spark.sql.execution.joins

  2. MAX_BROADCAST_TABLE_BYTES removal - Constant removed from BroadcastExchangeExec, now configurable via conf.maxBroadcastTableSizeInBytes

  3. WindowInPandasExec renamed - Renamed to ArrowWindowPythonExec

  4. TimeAdd renamed - Renamed to TimestampAddInterval

  5. FileStreamSink/MetadataLogFileIndex package change - Moved to org.apache.spark.sql.execution.streaming.sinks and org.apache.spark.sql.execution.streaming.runtime

  6. ParquetColumnVector constructor change - Removed memoryMode parameter

  7. SQLConf.getConf return type change - Changed from String to Enum for certain configurations

  8. ExpressionWithRandomSeed trait addition - Added withShiftedSeed method requirement

  9. SpecializedGetters trait additions - Added getGeography and getGeometry methods

  10. AtomicReplaceTableAsSelectExec.invalidateCache callback change - Changed from 2-arg to 3-arg signature

  11. OneRowRelationExec addition - New exec in Spark 4.1.1; a GPU version is added
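Several of the changes above (items 1, 3, 4, and 5) are renames or package moves, which per-version shims typically absorb with a stable type alias so common code never mentions the version-specific name. A minimal, self-contained sketch of that pattern, using hypothetical names rather than the plugin's actual shim classes:

```scala
// Hypothetical sketch of the per-version alias pattern; these are NOT
// the plugin's real classes. Each Spark version's source set defines an
// object with the same name, so common code compiles against one alias.

// Stand-in for the Spark 4.1.1 expression after the rename
// (TimeAdd -> TimestampAddInterval, item 4).
class TimestampAddInterval(val deltaMicros: Long)

// The spark410 source set exposes the new name under a stable alias;
// a spark400 source set would alias the old TimeAdd class instead.
object TimeAddShim {
  type VersionedTimeAdd = TimestampAddInterval

  def apply(deltaMicros: Long): VersionedTimeAdd =
    new TimestampAddInterval(deltaMicros)
}
```

Common code then refers only to `TimeAddShim.VersionedTimeAdd`, and the build picks the matching source set per `-Dbuildver`.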

Delta Lake Status

Delta Lake support is excluded for Spark 4.1.1 because io.delta:delta-spark is not yet compatible with Spark 4.1.1 (CheckpointFileManager moved packages). Using delta-stub instead.

See: #14119

Testing

  • Build passes: mvn clean package -f scala2.13/ -DskipTests -Dbuildver=410 -T18
  • Unit tests
  • Integration tests

Checklists

  • This PR has added documentation for new or modified features or
    behaviors.
  • This PR has added new tests or modified existing tests to cover
    new code paths.
    (Please explain in the PR description how the new code paths are tested,
    such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added
    in the PR description. Or, an issue has been filed with a link in the PR
    description.

Chong Gao added 16 commits January 9, 2026 14:57
Use delta-stub instead of delta-40x for Spark 4.1.0 because
io.delta:delta-spark is not yet compatible with Spark 4.1.0.
CheckpointFileManager moved packages in Spark 4.1.0.

Contributes to NVIDIA#14119
…change

In Spark 4.1.0, AtomicReplaceTableAsSelectExec.invalidateCache callback
signature changed from (TableCatalog, Identifier) => Unit to
(TableCatalog, Table, Identifier) => Unit.

Create shims to handle this API change:
- spark400/InvalidateCacheShims.scala for Spark 4.0.x (2-arg callback)
- spark410/InvalidateCacheShims.scala for Spark 4.1.0+ (3-arg callback)
- spark410/GpuAtomicReplaceTableAsSelectExec.scala for 4.1.0+ exec

Contributes to NVIDIA#14119
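The 2-arg vs 3-arg callback split described above can be sketched as follows. This is a toy illustration with `String` stand-ins; the real code uses `TableCatalog`, `Table`, and `Identifier`, and the shim object names here are hypothetical:

```scala
// Toy sketch of the invalidateCache shim split (hypothetical names;
// real code uses TableCatalog, Table, and Identifier, not String).

// Callback shape on Spark 4.0.x: (catalog, identifier) => Unit
object InvalidateCacheShim400 {
  def invalidate(cb: (String, String) => Unit,
                 catalog: String, table: String, ident: String): Unit =
    cb(catalog, ident) // the 4.0.x callback has no Table argument
}

// Callback shape on Spark 4.1.0+: (catalog, table, identifier) => Unit
object InvalidateCacheShim410 {
  def invalidate(cb: (String, String, String) => Unit,
                 catalog: String, table: String, ident: String): Unit =
    cb(catalog, table, ident)
}
```

Callers go through the shim object, so the exec code compiles unchanged against either Spark version.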
@res-life

res-life commented Jan 9, 2026

This draft PR aims to make the Spark 410 build pass, to unblock other tasks for co-workers.

Notes:

  • The build currently targets Java 17; we will switch to target Java 8 later.
  • Please compile the spark-rapids-private repo first, also with the Java 17 target.

TODOs:

  • Switch to target Java 8.
  • Switch to Spark 4.1.1.
  • Double-check all the sub-issues in #14056.

@res-life res-life requested review from nartal1 and razajafri January 9, 2026 09:04
@NvTimLiu

Are we going to switch to a spark-4.1.1 shim instead? @res-life

Uploaded the spark-4.1.1 bin to the internal artifactory; feel free to trigger the CI for testing when your change is ready, thanks!

Chong Gao added 4 commits January 12, 2026 10:45
Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Chong Gao <res_life@163.com>
Signed-off-by: Chong Gao <res_life@163.com>
@nartal1

nartal1 commented Jan 12, 2026

Thanks @res-life for putting up the PR. Looking into it.

Scala 2.12 builds are failing. Could you please fix these?

Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/GpuBroadcastExchangeExec.scala:36: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/scala/org/apache/spark/sql/rapids/execution/python/GpuWindowInPandasExecBase.scala:37: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/AggregateInPandasExecShims.scala:53: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/AggregateInPandasExecShims.scala:55: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/Spark320PlusShims.scala:208: Cannot prove that (Class[?0], com.nvidia.spark.rapids.ExprRule[_1])( forSome { type ?0 <: org.apache.spark.sql.catalyst.expressions.Expression; type _1 >: org.apache.spark.sql.catalyst.expressions.Abs with org.apache.spark.sql.catalyst.expressions.aggregate.Average with org.apache.spark.sql.catalyst.expressions.Cast <: org.apache.spark.sql.catalyst.expressions.Expression with Serializable with org.apache.spark.sql.catalyst.trees.UnaryLike[org.apache.spark.sql.catalyst.expressions.Expression] }) <:< (T, U).
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/Spark320PlusShims.scala:74: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/Spark320PlusShims.scala:80: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/Spark320PlusShims.scala:84: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/Spark320PlusShims.scala:85: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/WindowInPandasExecShims.scala:55: Unused import
Error: ] /home/runner/work/spark-rapids/spark-rapids/sql-plugin/src/main/spark320/scala/com/nvidia/spark/rapids/shims/WindowInPandasShims.scala:61: value projectList is not a member of org.apache.spark.sql.execution.python.WindowInPandasExec

@res-life

Yes, Scala 2.12 has some regressions; they are minor.
Let me first get the Spark 4.1 & Scala 2.13 functionality passing, then I'll fix Scala 2.12.
Currently this is a draft; the focus is on not blocking other tasks.

Next steps:

  • ITs for Spark 4.1 & Scala 2.13
  • Target Java 8
  • Make sure there are no regressions for other Spark versions, like 32x, 33x, 34x, 35x.....

@res-life

Current status: unit tests passed for Spark 4.1 & Scala 2.13.

Signed-off-by: Chong Gao <res_life@163.com>
Chong Gao and others added 5 commits January 29, 2026 17:46
Signed-off-by: Chong Gao <res_life@163.com>
- Modified the buildall script to ensure the MVN variable is correctly exported with options.
- Moved user-facing ParquetCachedBatchSerializer class to sql-plugin-api.
- Updated integration test requirements to include pytz.
…ntShims setup in ParquetCachedBatchSerializer
Signed-off-by: Gera Shegalov <gshegalov@nvidia.com>
@res-life

build

Signed-off-by: Chong Gao <res_life@163.com>
@res-life

build

@gerashegalov gerashegalov merged commit d13479c into NVIDIA:release/26.02 Jan 30, 2026
44 checks passed
@sameerz sameerz added the feature request New feature or request label Feb 2, 2026