Merged
63 commits
81e0b00
Initial Spark 410 shim setup
Jan 9, 2026
bf8f71e
Add Spark 410 base shim files
Jan 9, 2026
a39bfeb
Add 410 markers to existing shims and fix API changes for Spark 410
Jan 9, 2026
dde1282
Fix StoragePartitionJoinParams package moved for Spark 410 shim
Jan 9, 2026
5c1453b
Fix MAX_BROADCAST_TABLE_BYTES removed for Spark 410 shim
Jan 9, 2026
d2eed06
Fix TimeAdd renamed to TimestampAddInterval for Spark 410 shim
Jan 9, 2026
fb1fb49
Fix evalMode access changed to evalContext.evalMode for Spark 410 shim
Jan 9, 2026
f6d99c6
Fix ShowNamespacesExec removed for Spark 410 shim
Jan 9, 2026
89039af
Fix AggregateInPandasExec renamed to ArrowAggregatePythonExec for Spa…
Jan 9, 2026
cdaa361
Fix WindowInPandasExec renamed to ArrowWindowPythonExec for Spark 410…
Jan 9, 2026
deda01a
Fix ParquetColumnVector constructor changed for Spark 410 shim
Jan 9, 2026
10e3029
Fix FileStreamSink/MetadataLogFileIndex package moved for Spark 410 shim
Jan 9, 2026
387f54f
Add test files and misc fixes for Spark 410 shim
Jan 9, 2026
5e0e942
Exclude Delta Lake from Spark 4.1.0 build
Jan 9, 2026
cb90f65
Add InvalidateCacheShims for AtomicReplaceTableAsSelectExec callback …
Jan 9, 2026
421ae8a
Add generated files for Spark 4.1.0 shim
Jan 9, 2026
97721ec
Format code
Jan 12, 2026
bf5c7e7
Fix one line
Jan 12, 2026
4dab1d8
Fix unit test cases
Jan 12, 2026
de7311f
Fix unit test cases
Jan 12, 2026
c4d6dee
Fix building errors for Scala 2.12
Jan 13, 2026
352ef8d
Switch from Spark 410 shim to 411
Jan 13, 2026
479ded4
Use Java 17 release
Jan 13, 2026
855edbc
Fix: Change version from 410 to 411
Jan 13, 2026
33c45ec
Fix
Jan 13, 2026
2069825
Fix
Jan 13, 2026
7df38c8
Fix
Jan 13, 2026
b5cb210
Fix
Jan 13, 2026
4041e20
Fix ITs: Spark 4.1.0+ returns bytes instead of bytearray for binary data
Jan 13, 2026
4b16f38
Format code
Jan 13, 2026
0c8dd92
Update pom
Jan 14, 2026
f344b51
Merge main branch
Jan 14, 2026
6a6efdf
411 docs
Jan 14, 2026
6856dfb
Fix DayTimeInterval shims for Spark 4.1.1
Jan 14, 2026
f482917
Add Gpu version for OneRowRelationExec
Jan 14, 2026
a310cad
Fix shim bug: missing for some Spark versions
Jan 15, 2026
4e5fa8d
Fix bug in make-scala-version-build-files.sh
Jan 15, 2026
cd01ef7
Fix shim bug: missed some Spark versions
Jan 15, 2026
a842809
Revert "Fix bug in make-scala-version-build-files.sh"
Jan 15, 2026
b5f2101
Fix make-scala-version-build-files.sh: reorder properties in release4…
Jan 15, 2026
eab7b21
Fix WindowInPandasShims for Databricks: use projectList instead of wi…
Jan 15, 2026
9105d81
Fix import conflict: remove redundant StoragePartitionJoinShims import
Jan 15, 2026
e572077
Fix Window UDF protocol for Spark 4.1.1
Jan 15, 2026
8abd36f
Fix Aggregate UDF protocol for Spark 4.1.1
Jan 15, 2026
955b842
Merge branch 'main' into spark-41-shim
Jan 19, 2026
dba476d
Fix docs
Jan 19, 2026
404a063
Revert an inadvertent change
Jan 19, 2026
5e4bfff
Java8 Target
gerashegalov Jan 24, 2026
9b81311
jdk8 target
gerashegalov Jan 24, 2026
0d8b133
Merge remote-tracking branch 'origin/release/26.02' into gera-jdk8
gerashegalov Jan 27, 2026
0d5157a
Upgrade to Scala 2.13.18 and modernize unused warnings configuration
gerashegalov Jan 27, 2026
c3bd85e
Fix
Jan 28, 2026
d5e36ac
Copyright
Jan 28, 2026
c58da37
Fix build warnings
Jan 28, 2026
0b8583c
Fix a udf test case using safety mode
Jan 28, 2026
2a85f54
Update POM files and build scripts
gerashegalov Jan 28, 2026
24f37c9
411 shim: get max broadcast table size from conf
Jan 29, 2026
abe75a6
Fix comments
Jan 29, 2026
e2eb481
Refactor build scripts and update dependencies
gerashegalov Jan 29, 2026
b8be19b
Merge remote-tracking branch 'res-life/spark-41-shim' into pr/res-lif…
gerashegalov Jan 29, 2026
991e73e
Update copyright years in multiple files to 2026 and add ParquetVaria…
gerashegalov Jan 29, 2026
d231944
sign
gerashegalov Jan 29, 2026
cc1ae81
Revert inadvertent changes for two doc files
Jan 30, 2026
40 changes: 18 additions & 22 deletions build/buildall
@@ -1,6 +1,6 @@
#!/bin/bash
#
# Copyright (c) 2021-2025, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2021-2026, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -38,8 +38,8 @@ function print_usage() {
echo " -gb, --generate-bloop"
echo " generate projects for Bloop clients: IDE (Scala Metals, IntelliJ) or Bloop CLI"
echo " -p=DIST_PROFILE, --profile=DIST_PROFILE"
echo " use this profile for the dist module, default: noSnapshots, also supported: snapshots, minimumFeatureVersionMix,"
echo " snapshotsWithDatabricks, noSnapshotsWithDatabricks, noSnapshotsScala213, snapshotsScala213."
echo " use this profile for the dist module, default: noSnapshots, also supported: snapshots,"
echo " snapshotsWithDatabricks, noSnapshotsWithDatabricks"
echo " NOTE: the Databricks-related spark3XYdb shims are not built locally, the jars are fetched prebuilt from a"
echo " . remote Maven repo. You can also supply a comma-separated list of build versions. E.g., --profile=330,331 will"
echo " build only the distribution jar only for 3.3.0 and 3.3.1"
@@ -54,6 +54,8 @@ function print_usage() {
echo " use this option to build project with maven. E.g., --option='-Dcudf.version=cuda12'"
echo " --rebuild-dist-only"
echo " repackage the dist module artifact using installed dependencies"
echo " --scala213"
echo " build 2.13 shims"
}

function bloopInstall() {
@@ -152,7 +154,7 @@ case "$1" in
;;

-o=*|--option=*)
MVN_OPT="${1#*=}"
export MVN_OPT="${1#*=}"
;;

*)
@@ -172,41 +174,35 @@ if [[ "$DIST_PROFILE" == *Scala213 ]]; then
SCALA213=1
fi

MVN=${MVN:-"mvn"}
# include options to mvn command
export MVN="mvn -Dmaven.wagon.http.retryHandler.count=3 ${MVN_OPT}"
export MVN="$MVN -Dmaven.wagon.http.retryHandler.count=3 ${MVN_OPT}"

if [[ "$SCALA213" == "1" ]]; then
MVN="$MVN -f scala2.13/"
DIST_PROFILE=${DIST_PROFILE:-"noSnapshotsScala213"}
POM_FILE="scala2.13/pom.xml"
export MVN="$MVN -f scala2.13/"
$(dirname $0)/make-scala-version-build-files.sh 2.13
else
DIST_PROFILE=${DIST_PROFILE:-"noSnapshots"}
else
POM_FILE="pom.xml"
fi

DIST_PROFILE=${DIST_PROFILE:-"noSnapshots"}


[[ "$MODULE" != "" ]] && MODULE_OPT="--projects $MODULE --also-make" || MODULE_OPT=""

echo "Collecting Spark versions..."
case $DIST_PROFILE in

snapshotsScala213)
SPARK_SHIM_VERSIONS=($(versionsFromReleaseProfiles "snap_and_no_snap" "scala2.13/pom.xml"))
;;

noSnapshotsScala213)
SPARK_SHIM_VERSIONS=($(versionsFromReleaseProfiles "no_snapshots" "scala2.13/pom.xml"))
;;

snapshots?(WithDatabricks))
SPARK_SHIM_VERSIONS=($(versionsFromReleaseProfiles "snap_and_no_snap" "pom.xml"))
SPARK_SHIM_VERSIONS=($(versionsFromReleaseProfiles "snap_and_no_snap" $POM_FILE))
;;

noSnapshots?(WithDatabricks))
SPARK_SHIM_VERSIONS=($(versionsFromReleaseProfiles "no_snapshots" "pom.xml"))
SPARK_SHIM_VERSIONS=($(versionsFromReleaseProfiles "no_snapshots" $POM_FILE))
;;

minimumFeatureVersionMix)
SPARK_SHIM_VERSIONS=($(versionsFromDistProfile "minimumFeatureVersionMix"))
;;


[34]*)
<<< $DIST_PROFILE IFS="," read -ra SPARK_SHIM_VERSIONS
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2024-2025, NVIDIA CORPORATION.
* Copyright (c) 2024-2026, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -17,6 +17,7 @@
/*** spark-rapids-shim-json-lines
{"spark": "400"}
{"spark": "401"}
{"spark": "411"}
spark-rapids-shim-json-lines ***/
package org.apache.spark.sql.tests.datagen

@@ -1,5 +1,5 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.
* Copyright (c) 2025-2026, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -20,6 +20,7 @@ import com.nvidia.spark.rapids._
import com.nvidia.spark.rapids.RapidsPluginImplicits._
import com.nvidia.spark.rapids.delta.{DeltaIOProvider, GpuDeltaDataSource, RapidsDeltaUtils}
import com.nvidia.spark.rapids.shims._
import com.nvidia.spark.rapids.shims.InvalidateCacheShims
import org.apache.hadoop.fs.Path

import org.apache.spark.sql.SparkSession
@@ -132,7 +133,7 @@ abstract class DeltaProviderBase extends DeltaIOProvider {
cpuExec.tableSpec,
cpuExec.writeOptions,
cpuExec.orCreate,
cpuExec.invalidateCache)
InvalidateCacheShims.getInvalidateCache(cpuExec.invalidateCache))
Collaborator @firestarman commented on Jan 20, 2026:

Will this file be used for the 411 build? According to the folder name "delta-33x-40x", I suppose it is not.

The author replied:

Originally it was used by the 411 shim for the Delta Lake feature, but the 411 shim's Delta Lake support is now a stub implementation; see tracking issue #14119. Let's keep this file; it has no side effect.

}


@@ -1,5 +1,5 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.
* Copyright (c) 2025-2026, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -17,7 +17,6 @@
package com.nvidia.spark.rapids.delta.common

import ai.rapids.cudf._
import ai.rapids.cudf.HostColumnVector._
import com.nvidia.spark.rapids._
import com.nvidia.spark.rapids.Arm.withResource
import com.nvidia.spark.rapids.RapidsPluginImplicits._
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.
* Copyright (c) 2025-2026, NVIDIA CORPORATION.
*
* This file was derived from OptimizeWriteExchange.scala
* in the Delta Lake project at https://github.com/delta-io/delta
@@ -26,8 +26,7 @@ import scala.concurrent.Future
import scala.concurrent.duration.Duration

import com.nvidia.spark.rapids.{GpuColumnarBatchSerializer, GpuExec, GpuMetric, GpuPartitioning, GpuRoundRobinPartitioning, RapidsConf}
import com.nvidia.spark.rapids.GpuMetric.{OP_TIME_NEW_SHUFFLE_READ, OP_TIME_NEW_SHUFFLE_WRITE}
import com.nvidia.spark.rapids.GpuMetric.{DESCRIPTION_OP_TIME_NEW_SHUFFLE_READ, DESCRIPTION_OP_TIME_NEW_SHUFFLE_WRITE, MODERATE_LEVEL}
import com.nvidia.spark.rapids.GpuMetric._
import com.nvidia.spark.rapids.delta.RapidsDeltaSQLConf
import com.nvidia.spark.rapids.shims.GpuHashPartitioning

@@ -60,7 +59,6 @@ case class GpuOptimizeWriteExchangeExec(
partitioning: GpuPartitioning,
override val child: SparkPlan,
@transient deltaLog: DeltaLog) extends Exchange with GpuExec with DeltaLogging {
import GpuMetric._

// Use 150% of target file size hint config considering parquet compression.
// Still the result file can be smaller/larger than the config due to data skew or
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.
* Copyright (c) 2025-2026, NVIDIA CORPORATION.
*
* This file was derived from OptimisticTransaction.scala and TransactionalWrite.scala
* in the Delta Lake project at https://github.com/delta-io/delta.
@@ -23,11 +23,8 @@ package org.apache.spark.sql.delta.hooks

import org.apache.spark.internal.MDC
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.delta._
import org.apache.spark.sql.delta.actions._
import org.apache.spark.sql.delta.commands.DeltaOptimizeContext
import org.apache.spark.sql.delta.commands.optimize._
import org.apache.spark.sql.delta.logging.DeltaLogKeys
import org.apache.spark.sql.delta.rapids.GpuOptimisticTransactionBase
import org.apache.spark.sql.delta.stats.AutoCompactPartitionStats
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2022-2025, NVIDIA CORPORATION.
* Copyright (c) 2022-2026, NVIDIA CORPORATION.
*
* This file was derived from WriteIntoDelta.scala
* in the Delta Lake project at https://github.com/delta-io/delta.
@@ -45,8 +45,6 @@ import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.nvidia.DFUDFShims
import org.apache.spark.sql.rapids.shims.TrampolineConnectShims.SparkSession
import org.apache.spark.sql.types.StructType

/** GPU version of Delta Lake's WriteIntoDelta. */
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.
* Copyright (c) 2025-2026, NVIDIA CORPORATION.
*
* This file was derived from MergeIntoCommand.scala
* in the Delta Lake project at https://github.com/delta-io/delta.
@@ -30,11 +30,11 @@ import com.nvidia.spark.rapids.RapidsConf
import com.nvidia.spark.rapids.delta._

import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, Row, SparkSession => SqlSparkSession}
import org.apache.spark.sql.{Row, SparkSession => SqlSparkSession}
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, Expression, Literal, Or}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.classic.{ColumnNodeToExpressionConverter, ExpressionUtils, SparkSession => ClassicSparkSession}
import org.apache.spark.sql.classic.{SparkSession => ClassicSparkSession}
import org.apache.spark.sql.delta._
import org.apache.spark.sql.delta.actions.{AddFile, FileAction}
import org.apache.spark.sql.delta.commands.MergeIntoCommandBase
@@ -384,7 +384,7 @@ case class GpuMergeIntoCommand(
}
}
commitAndRecordStats(
org.apache.spark.sql.classic.SparkSession.active,
ClassicSparkSession.active,
gpuDeltaTxn,
mergeActions,
startTime,
@@ -583,7 +583,6 @@ case class GpuMergeIntoCommand(
val matchedRowCounts = collectTouchedFiles.groupBy(ROW_ID_COL).agg(sum("one").as("count"))

// Get multiple matches and simultaneously collect (using touchedFilesAccum) the file names
import org.apache.spark.sql.delta.implicits._
val mmRow = matchedRowCounts
.filter(col("count") > lit(1))
.select(
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.
* Copyright (c) 2025-2026, NVIDIA CORPORATION.
*
* This file was derived from OptimisticTransaction.scala and TransactionalWrite.scala
* in the Delta Lake project at https://github.com/delta-io/delta.
@@ -32,7 +32,7 @@ import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.{SparkSession => SqlSparkSession}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, RuntimeReplaceable}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
import org.apache.spark.sql.delta._
import org.apache.spark.sql.delta.actions.{AddFile, FileAction}
@@ -47,7 +47,6 @@ import org.apache.spark.sql.execution.metric.SQLMetric
import org.apache.spark.sql.functions.to_json
import org.apache.spark.sql.rapids.{BasicColumnarWriteJobStatsTracker, ColumnarWriteJobStatsTracker, GpuWriteJobStatsTracker}
import org.apache.spark.sql.rapids.delta.GpuIdentityColumn
import org.apache.spark.sql.rapids.shims.TrampolineConnectShims
import org.apache.spark.sql.rapids.shims.TrampolineConnectShims.SparkSession
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.vectorized.ColumnarBatch
2 changes: 1 addition & 1 deletion dist/README.md
@@ -30,7 +30,7 @@ for each version of Spark supported in the jar, i.e., spark330/, spark341/, etc.

If you have to change the contents of the uber jar the following files control what goes into the base jar as classes that are not shaded.

1. `unshimmed-common-from-spark320.txt` - This has classes and files that should go into the base jar with their normal
1. `unshimmed-common-from-single-shim.txt` - This has classes and files that should go into the base jar with their normal
package name (not shaded). This includes user visible classes (i.e., com/nvidia/spark/SQLPlugin), python files,
and other files that aren't version specific. Uses Spark 3.2.0 built jar for these base classes as explained above.
2. `unshimmed-from-each-spark3xx.txt` - This is applied to all the individual Spark specific version jars to pull
11 changes: 6 additions & 5 deletions dist/build/package-parallel-worlds.py
@@ -1,4 +1,4 @@
# Copyright (c) 2023-2024, NVIDIA CORPORATION.
# Copyright (c) 2023-2026, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -29,6 +29,7 @@ def shell_exec(shell_cmd):
artifacts = attributes.get('artifact_csv').split(',')
buildver_list = re.sub(r'\s+', '', project.getProperty('included_buildvers'),
flags=re.UNICODE).split(',')
buildver_list = sorted(buildver_list, reverse=True)
source_basedir = project.getProperty('spark.rapids.source.basedir')
project_basedir = project.getProperty('spark.rapids.project.basedir')
project_version = project.getProperty('project.version')
@@ -73,22 +74,22 @@ def shell_exec(shell_cmd):
shell_exec(mvn_cmd)

dist_dir = os.sep.join([source_basedir, 'dist'])
with open(os.sep.join([dist_dir, 'unshimmed-common-from-spark320.txt']), 'r') as f:
from_spark320 = f.read().splitlines()
with open(os.sep.join([dist_dir, 'unshimmed-common-from-single-shim.txt']), 'r') as f:
from_single_shim = f.read().splitlines()
with open(os.sep.join([dist_dir, 'unshimmed-from-each-spark3xx.txt']), 'r') as f:
from_each = f.read().splitlines()
with zipfile.ZipFile(os.sep.join([deps_dir, art_jar]), 'r') as zip_handle:
if project.getProperty('should.build.conventional.jar'):
zip_handle.extractall(path=top_dist_jar_dir)
else:
zip_handle.extractall(path=os.sep.join([top_dist_jar_dir, classifier]))
# IMPORTANT unconditional extract from first to the top
# IMPORTANT unconditional extract from the highest Spark version to the top
if bv == buildver_list[0] and art == 'sql-plugin-api':
zip_handle.extractall(path=top_dist_jar_dir)
# TODO deprecate
namelist = zip_handle.namelist()
matching_members = []
glob_list = from_spark320 + from_each if bv == buildver_list[0] else from_each
glob_list = from_single_shim + from_each if bv == buildver_list[0] else from_each
for pat in glob_list:
new_matches = fnmatch.filter(namelist, pat)
matching_members += new_matches
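The `buildver_list = sorted(buildver_list, reverse=True)` change above makes the dist packaging iterate build versions from highest to lowest, so the "unconditional extract to the top" uses the newest shim's classes. The effect can be sketched as follows (the version list is illustrative):

```python
# Reverse-sorting the build versions puts the newest shim first; the packaging
# script then extracts that first entry's single-shim classes to the jar root.
buildver_list = ["330", "411", "350", "400"]
buildver_list = sorted(buildver_list, reverse=True)

# Lexicographic order works here because all buildvers have the same width.
assert buildver_list == ["411", "400", "350", "330"]
assert buildver_list[0] == "411"
```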
4 changes: 2 additions & 2 deletions dist/maven-antrun/build-parallel-worlds.xml
@@ -1,6 +1,6 @@
<?xml version="1.0"?>
<!--
Copyright (c) 2021-2024, NVIDIA CORPORATION.
Copyright (c) 2021-2026, NVIDIA CORPORATION.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -132,7 +132,7 @@
<!-- Remove the explicitly unshimmed files from the common directory -->
<delete>
<fileset dir="${project.build.directory}/parallel-world/spark-shared"
includesfile="${spark.rapids.source.basedir}/${rapids.module}/unshimmed-common-from-spark320.txt"/>
includesfile="${spark.rapids.source.basedir}/${rapids.module}/unshimmed-common-from-single-shim.txt"/>
</delete>
</target>
<target name="remove-dependencies-from-pom" depends="build-parallel-worlds">
15 changes: 8 additions & 7 deletions dist/scripts/binary-dedupe.sh
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright (c) 2021-2025, NVIDIA CORPORATION.
# Copyright (c) 2021-2026, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -85,9 +85,6 @@ function retain_single_copy() {

package_class_parts=("${path_parts[@]:2}")

package_len=$((${#package_class_parts[@]} - 1))
package_parts=("${package_class_parts[@]::$package_len}")

package_class_with_spaces="${package_class_parts[*]}"
# com/nvidia/spark/udf/Repr\$UnknownCapturedArg\$.class
package_class="${package_class_with_spaces// //}"
@@ -164,12 +161,16 @@ function verify_same_sha_for_unshimmed() {
# sha1 look up if there is an entry with the unshimmed class as a suffix

class_file_quoted=$(printf '%q' "$class_file")

# TODO currently RapidsShuffleManager is "removed" from /spark* by construction in
# dist pom.xml via ant. We could delegate this logic to this script
# and make both simpler
if [[ ! "$class_file_quoted" =~ com/nvidia/spark/rapids/spark[34].*/.*ShuffleManager.class ]]; then

#
# TODO ParquetCachedBatchSerializer is not bitwise-identical after 411,
# but it is compatible with previous versions because it merely adds a new method.
# we might need to replace this strict check with MiMa
# https://github.com/apache/spark/blob/7011706a0a8dbec6adb5b5b121921b29b314335f/sql/core/src/main/scala/org/apache/spark/sql/columnar/CachedBatchSerializer.scala#L75-L95
if [[ ! "$class_file_quoted" =~ com/nvidia/spark/rapids/spark[34].*/.*ShuffleManager.class && \
"$class_file_quoted" != "com/nvidia/spark/ParquetCachedBatchSerializer.class" ]]; then
if ! grep -q "/spark.\+/$class_file_quoted" "$SPARK_SHARED_TXT"; then
echo >&2 "$class_file is not bitwise-identical across shims"
exit 255
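The `verify_same_sha_for_unshimmed` change above relaxes a strict invariant: every unshimmed class must be bitwise-identical across shims, with `ParquetCachedBatchSerializer` now allow-listed. A rough Python sketch of that invariant under assumed, illustrative file contents:

```python
import hashlib

def sha1_of(data: bytes) -> str:
    # Digest used to compare per-shim copies of a class file.
    return hashlib.sha1(data).hexdigest()

# An unshimmed class must yield exactly one distinct digest across all shim
# directories; allow-listed files (e.g. ParquetCachedBatchSerializer) are
# exempt because a newer Spark may add a method to the base interface.
copies = {
    "spark350": b"identical bytecode",
    "spark400": b"identical bytecode",
    "spark411": b"identical bytecode",
}
digests = {sha1_of(body) for body in copies.values()}
assert len(digests) == 1
```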
@@ -1,8 +1,6 @@
META-INF/DEPENDENCIES
META-INF/LICENSE
META-INF/NOTICE
com/nvidia/spark/GpuCachedBatchSerializer*
com/nvidia/spark/ParquetCachedBatchSerializer*
com/nvidia/spark/rapids/ExplainPlan.class
com/nvidia/spark/rapids/ExplainPlan$.class
com/nvidia/spark/rapids/ExplainPlanBase.class
1 change: 1 addition & 0 deletions integration_tests/requirements.txt
@@ -17,6 +17,7 @@ pandas
pyarrow == 17.0.0 ; python_version == '3.8'
pyarrow == 19.0.1 ; python_version >= '3.9'
pytest-xdist >= 2.0.0
pytz
findspark
fsspec == 2025.3.0
fastparquet == 2024.5.0 ; python_version >= '3.9'
2 changes: 1 addition & 1 deletion integration_tests/run_pyspark_from_build.sh
@@ -79,7 +79,7 @@ else
# PySpark uses ".dev0" for "-SNAPSHOT" and either ".dev" for "preview" or ".devN" for "previewN"
# https://github.com/apache/spark/blob/66f25e314032d562567620806057fcecc8b71f08/dev/create-release/release-build.sh#L267
VERSION_STRING=$(PYTHONPATH=${SPARK_HOME}/python:${PY4J_FILE} python -c \
"import pyspark, re; print(re.sub('\.dev[012]?$', '', pyspark.__version__))"
"import pyspark, re; print(re.sub(r'\.dev[012]?$', '', pyspark.__version__))"
)
SCALA_VERSION=`$SPARK_HOME/bin/pyspark --version 2>&1| grep Scala | awk '{split($4,v,"."); printf "%s.%s", v[1], v[2]}'`
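The quoting change in this hunk (`'\.dev[012]?$'` to `r'\.dev[012]?$'`) only switches to a raw string, avoiding Python's invalid-escape-sequence warning; the substitution itself is unchanged. Its behavior on the version strings the script's comment describes:

```python
import re

# Strip PySpark's ".dev0" (-SNAPSHOT), ".dev" (preview), or ".devN" (previewN)
# suffix to recover the plain Spark version string.
def plain_version(pyspark_version: str) -> str:
    return re.sub(r'\.dev[012]?$', '', pyspark_version)

assert plain_version('4.1.0.dev0') == '4.1.0'   # -SNAPSHOT build
assert plain_version('4.1.0.dev') == '4.1.0'    # preview build
assert plain_version('4.1.0') == '4.1.0'        # release build
```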

Expand Down
Loading