
[GLUTEN-11379][CORE] Clean up Spark shims APIs following Spark 3.2 deprecation #11687

Merged: zhouyuan merged 1 commit into apache:main from PHILO-HE:cleanup-shim-api on Mar 5, 2026
Conversation

@PHILO-HE (Member) commented Mar 3, 2026

What changes are proposed in this pull request?

Since Spark 3.2 support has been dropped, we can clean up the shim APIs that were introduced to bridge code differences between Spark 3.2 and later versions. The implementation of these APIs can then be moved to the caller side:

getDistribution
convertPartitionTransforms
getTextScan
bloomFilterExpressionMappings
newBloomFilterAggregate
newMightContain
replaceBloomFilterAggregate
replaceMightContain
getShuffleReaderParam
getPartitionId
supportDuplicateReadingTracking
getFileSizeAndModificationTime
dateTimestampFormatInReadIsDefaultValue
genDecimalRoundExpressionOutput 
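The pattern behind this clean-up can be sketched with a toy model (all names below are hypothetical, not Gluten's actual API): once every supported Spark version shares one implementation, the shim indirection carries no information and the logic can live at the caller.

```scala
// Toy model of the refactor; `Shims`, `getBatchSize`, and the objects below
// are hypothetical names, not Gluten's real code.

// Before: callers reached version-specific behavior through a shim trait,
// because Spark 3.2 needed a different implementation than 3.3+.
trait Shims {
  def getBatchSize: Int
}
object Spark33Shims extends Shims { override def getBatchSize: Int = 4096 }
object Spark34Shims extends Shims { override def getBatchSize: Int = 4096 }

// After dropping Spark 3.2 the remaining implementations are identical, so
// the shim method can be deleted and the value moved to the caller side.
object Caller {
  val batchSize: Int = 4096 // was reached via the shim trait before
}
```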

How was this patch tested?

Local build.

Was this patch authored or co-authored using generative AI tooling?

No.

Related issue: #11379

@github-actions github-actions bot added the CORE works for Gluten Core label Mar 3, 2026
@github-actions bot commented Mar 3, 2026

Run Gluten Clickhouse CI on x86

@PHILO-HE force-pushed the cleanup-shim-api branch 2 times, most recently from 594e5be to a4ce1dd on March 3, 2026
@PHILO-HE PHILO-HE changed the title [CORE] Clean up Spark shims APIs following Spark 3.2 deprecation [GLUTEN-11379][CORE] Clean up Spark shims APIs following Spark 3.2 deprecation Mar 4, 2026
@PHILO-HE (Member, Author) commented Mar 4, 2026

@QCLyu, this is another PR to clean up some code after Spark 3.2 deprecation. Could you take a look? cc @zhouyuan

-    SparkShimLoader.getSparkShims.dateTimestampFormatInReadIsDefaultValue(csvOptions, timeZone)
+    csvOptions.dateFormatInRead == default.dateFormatInRead &&
+      csvOptions.timestampFormatInRead == default.timestampFormatInRead &&
+      csvOptions.timestampNTZFormatInRead == default.timestampNTZFormatInRead
Contributor:

We need to confirm timestampNTZFormatInRead exists in Spark 3.3's CSVOptions

Member (Author):

Just confirmed in the source code: it exists. Compilation also helps ensure this. Thanks.
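The "all read formats are defaults" check being discussed can be modeled with a Spark-free sketch; `CsvReadOptions` below is a hypothetical stand-in for Spark's CSVOptions, used only to illustrate the inlined comparison:

```scala
// Spark-free sketch: `CsvReadOptions` is a stand-in for Spark's CSVOptions,
// illustrating the check that replaced the
// dateTimestampFormatInReadIsDefaultValue shim call.
case class CsvReadOptions(
    dateFormatInRead: Option[String],
    timestampFormatInRead: Option[String],
    timestampNTZFormatInRead: Option[String])

object CsvDefaults {
  // A default-constructed options object to compare against.
  val default: CsvReadOptions = CsvReadOptions(None, None, None)

  // True only when none of the read formats were overridden by the user.
  def isDefaultFormatInRead(opts: CsvReadOptions): Boolean =
    opts.dateFormatInRead == default.dateFormatInRead &&
      opts.timestampFormatInRead == default.timestampFormatInRead &&
      opts.timestampNTZFormatInRead == default.timestampNTZFormatInRead
}
```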

schema: MessageType,
caseSensitive: Option[Boolean] = None): ParquetFilters

def genDecimalRoundExpressionOutput(decimalType: DecimalType, toScale: Int): DecimalType = {
Contributor:

We need to ensure the base trait's default implementation handles all Spark versions correctly, or that the method is also removed from the SparkPlanExecApi trait

Member (Author):

The method here is identical to the one in SparkPlanExecApi, so it is not needed to override that one. Let's just remove this one. Thanks.
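For future readers, a self-contained sketch of what this method computes may help; `DecimalType` below is a stand-in case class (so the sketch has no Spark dependency), and the logic follows Spark's Round result-type rule, which this shim method is assumed to mirror — check SparkPlanExecApi for the authoritative version.

```scala
object DecimalRoundSketch {
  // Stand-in for org.apache.spark.sql.types.DecimalType, kept Spark-free.
  case class DecimalType(precision: Int, scale: Int)
  val MAX_PRECISION = 38

  // Assumed to follow Spark's Round result-type rule for decimal inputs.
  def genDecimalRoundExpressionOutput(decimalType: DecimalType, toScale: Int): DecimalType = {
    val p = decimalType.precision
    val s = decimalType.scale
    // Rounding can add one digit before the decimal point (e.g. 9.9 -> 10).
    val integralLeastNumDigits = p - s + 1
    if (toScale < 0) {
      // Negative scale rounds digits left of the decimal point; scale becomes 0.
      val newPrecision = math.max(integralLeastNumDigits, -toScale + 1)
      DecimalType(math.min(newPrecision, MAX_PRECISION), 0)
    } else {
      val newScale = math.min(s, toScale)
      DecimalType(math.min(integralLeastNumDigits + newScale, MAX_PRECISION), newScale)
    }
  }
}
```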

import org.apache.gluten.vectorized.NativePartitioning

-import org.apache.spark.{SparkConf, TaskContext}
+import org.apache.spark.{ShuffleUtils, SparkConf, TaskContext}
Contributor:

This ShuffleUtils class should be in the shims (it exists in shims/spark34/, etc.). We need to confirm it's available for Spark 3.3 as well.

Member (Author):

Yes, it also exists in shims/spark33.

@QCLyu (Contributor) left a comment:

Looks good: this correctly removes the Spark shim indirections that were only needed for Spark 3.2 compatibility. I left a few in-line comments for further confirmation.

@PHILO-HE (Member, Author) commented Mar 5, 2026

@QCLyu, thanks for the review. @zhouyuan, do you have any comment?

@jinchengchenghh (Contributor) left a comment:

Thanks!

@zhouyuan (Member) left a comment:

👍 Thanks. The shim layer looks cleaner now.


def convertPartitionTransforms(partitions: Seq[Transform]): (Seq[String], Option[BucketSpec])

def generateFileScanRDD(
Member:

It may be better to add a note on when/why these shim APIs were introduced, to help future changes.

@zhouyuan zhouyuan merged commit 3934523 into apache:main Mar 5, 2026
113 of 114 checks passed

Labels: CLICKHOUSE, CORE (works for Gluten Core), VELOX
