
[GLUTEN-11379][CORE] Clean up Spark shims APIs following Spark 3.2 deprecation #11687

Merged: zhouyuan merged 1 commit into apache:main from PHILO-HE:cleanup-shim-api on Mar 5, 2026
Conversation

@PHILO-HE (Member) commented Mar 3, 2026

What changes are proposed in this pull request?

Since Spark 3.2 support has been dropped, we can clean up the shim APIs that were introduced to bridge code differences between Spark 3.2 and later versions. The implementation of these APIs can then be moved to the caller side:

getDistribution
convertPartitionTransforms
getTextScan
bloomFilterExpressionMappings
newBloomFilterAggregate
newMightContain
replaceBloomFilterAggregate
replaceMightContain
getShuffleReaderParam
getPartitionId
supportDuplicateReadingTracking
getFileSizeAndModificationTime
dateTimestampFormatInReadIsDefaultValue
genDecimalRoundExpressionOutput 
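The pattern behind this clean-up can be sketched with a toy model (all names below are hypothetical, not Gluten's actual API): once every supported Spark version shares one implementation, the shim indirection carries no information and the logic can live at the caller.

```scala
// Toy model of the refactor; `Shims`, `getBatchSize`, and the objects below
// are hypothetical names, not Gluten's real code.

// Before: callers reached version-specific behavior through a shim trait,
// because Spark 3.2 needed a different implementation than 3.3+.
trait Shims {
  def getBatchSize: Int
}
object Spark33Shims extends Shims { override def getBatchSize: Int = 4096 }
object Spark34Shims extends Shims { override def getBatchSize: Int = 4096 }

// After dropping Spark 3.2 the remaining implementations are identical, so
// the shim method can be deleted and the value moved to the caller side.
object Caller {
  val batchSize: Int = 4096 // was reached via the shim trait before
}
```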

How was this patch tested?

Local build.

Was this patch authored or co-authored using generative AI tooling?

No.

Related issue: #11379

@github-actions github-actions bot added the CORE works for Gluten Core label Mar 3, 2026
@github-actions bot commented Mar 3, 2026

Run Gluten Clickhouse CI on x86

@PHILO-HE force-pushed the cleanup-shim-api branch 2 times, most recently from 594e5be to a4ce1dd on March 3, 2026
@PHILO-HE PHILO-HE changed the title [CORE] Clean up Spark shims APIs following Spark 3.2 deprecation [GLUTEN-11379][CORE] Clean up Spark shims APIs following Spark 3.2 deprecation Mar 4, 2026
@PHILO-HE (Member, Author) commented Mar 4, 2026

@QCLyu, this is another PR to clean up some code after Spark 3.2 deprecation. Could you take a look? cc @zhouyuan

-    SparkShimLoader.getSparkShims.dateTimestampFormatInReadIsDefaultValue(csvOptions, timeZone)
+    csvOptions.dateFormatInRead == default.dateFormatInRead &&
+      csvOptions.timestampFormatInRead == default.timestampFormatInRead &&
+      csvOptions.timestampNTZFormatInRead == default.timestampNTZFormatInRead
Contributor:

We need to confirm timestampNTZFormatInRead exists in Spark 3.3's CSVOptions

Member (Author):

Just confirmed in the source code: it exists. Compilation also helps ensure this. Thanks.
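The "all read formats are defaults" check being discussed can be modeled with a Spark-free sketch; `CsvReadOptions` below is a hypothetical stand-in for Spark's CSVOptions, used only to illustrate the inlined comparison:

```scala
// Spark-free sketch: `CsvReadOptions` is a stand-in for Spark's CSVOptions,
// illustrating the check that replaced the
// dateTimestampFormatInReadIsDefaultValue shim call.
case class CsvReadOptions(
    dateFormatInRead: Option[String],
    timestampFormatInRead: Option[String],
    timestampNTZFormatInRead: Option[String])

object CsvDefaults {
  // A default-constructed options object to compare against.
  val default: CsvReadOptions = CsvReadOptions(None, None, None)

  // True only when none of the read formats were overridden by the user.
  def isDefaultFormatInRead(opts: CsvReadOptions): Boolean =
    opts.dateFormatInRead == default.dateFormatInRead &&
      opts.timestampFormatInRead == default.timestampFormatInRead &&
      opts.timestampNTZFormatInRead == default.timestampNTZFormatInRead
}
```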

schema: MessageType,
caseSensitive: Option[Boolean] = None): ParquetFilters

def genDecimalRoundExpressionOutput(decimalType: DecimalType, toScale: Int): DecimalType = {
Contributor:

We need to ensure the base trait's default implementation handles all Spark versions correctly, or that the method is also removed from the SparkPlanExecApi trait

Member (Author):

The method here is identical to the one in SparkPlanExecApi, so it is not needed to override that one. Let's just remove this one. Thanks.
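For future readers, a self-contained sketch of what this method computes may help; `DecimalType` below is a stand-in case class (so the sketch has no Spark dependency), and the logic follows Spark's Round result-type rule, which this shim method is assumed to mirror — check SparkPlanExecApi for the authoritative version.

```scala
object DecimalRoundSketch {
  // Stand-in for org.apache.spark.sql.types.DecimalType, kept Spark-free.
  case class DecimalType(precision: Int, scale: Int)
  val MAX_PRECISION = 38

  // Assumed to follow Spark's Round result-type rule for decimal inputs.
  def genDecimalRoundExpressionOutput(decimalType: DecimalType, toScale: Int): DecimalType = {
    val p = decimalType.precision
    val s = decimalType.scale
    // Rounding can add one digit before the decimal point (e.g. 9.9 -> 10).
    val integralLeastNumDigits = p - s + 1
    if (toScale < 0) {
      // Negative scale rounds digits left of the decimal point; scale becomes 0.
      val newPrecision = math.max(integralLeastNumDigits, -toScale + 1)
      DecimalType(math.min(newPrecision, MAX_PRECISION), 0)
    } else {
      val newScale = math.min(s, toScale)
      DecimalType(math.min(integralLeastNumDigits + newScale, MAX_PRECISION), newScale)
    }
  }
}
```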

import org.apache.gluten.vectorized.NativePartitioning

-import org.apache.spark.{SparkConf, TaskContext}
+import org.apache.spark.{ShuffleUtils, SparkConf, TaskContext}
Contributor:

This ShuffleUtils class should be in the shims (it exists in shims/spark34/, etc.). We need to confirm it's available for Spark 3.3 as well.

Member (Author):

Yes, it also exists in shims/spark33.

@QCLyu (Contributor) left a comment:

Looks good: this correctly removes the Spark shim indirections that were only needed for Spark 3.2 compatibility. I left a few in-line comments for further confirmation.

@PHILO-HE (Member, Author) commented Mar 5, 2026

@QCLyu, thanks for the review. @zhouyuan, do you have any comment?

@jinchengchenghh (Contributor) left a comment:

Thanks!

@zhouyuan (Member) left a comment:

👍 Thanks. The shim layer looks cleaner now.


def convertPartitionTransforms(partitions: Seq[Transform]): (Seq[String], Option[BucketSpec])

def generateFileScanRDD(
Member:

It may be better to add a note on when/why these shim APIs were introduced, to help future changes.

@zhouyuan zhouyuan merged commit 3934523 into apache:main Mar 5, 2026
113 of 114 checks passed

Labels: CLICKHOUSE, CORE (works for Gluten Core), VELOX
