[GLUTEN-11379][CORE] Clean up Spark shims APIs following Spark 3.2 deprecation#11687
zhouyuan merged 1 commit into apache:main
SparkShimLoader.getSparkShims.dateTimestampFormatInReadIsDefaultValue(csvOptions, timeZone)
csvOptions.dateFormatInRead == default.dateFormatInRead &&
csvOptions.timestampFormatInRead == default.timestampFormatInRead &&
csvOptions.timestampNTZFormatInRead == default.timestampNTZFormatInRead
We need to confirm timestampNTZFormatInRead exists in Spark 3.3's CSVOptions
Just confirmed in the source code: it exists. Successful compilation also helps ensure this. Thanks.
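For readers following the thread, the inlined check can be sketched as below. This is a minimal illustration, not the PR's code: `CsvReadFormats` and `CsvFormatCheck` are hypothetical stand-ins, and the `Option[String]` field types are an assumption rather than something stated in this conversation.

```scala
// Hypothetical stand-in for the three read-format fields compared in the diff.
// This is NOT Spark's CSVOptions; the Option[String] types are an assumption.
final case class CsvReadFormats(
    dateFormatInRead: Option[String],
    timestampFormatInRead: Option[String],
    timestampNTZFormatInRead: Option[String])

object CsvFormatCheck {
  // True only when every read format still equals the corresponding default,
  // mirroring the three-way comparison that replaced the shim call.
  def usesDefaultReadFormats(options: CsvReadFormats, default: CsvReadFormats): Boolean =
    options.dateFormatInRead == default.dateFormatInRead &&
      options.timestampFormatInRead == default.timestampFormatInRead &&
      options.timestampNTZFormatInRead == default.timestampNTZFormatInRead
}
```

Because the comparison reads the fields directly, a Spark version lacking any of them would fail at compile time, which is the safeguard mentioned in the reply.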
schema: MessageType,
caseSensitive: Option[Boolean] = None): ParquetFilters

def genDecimalRoundExpressionOutput(decimalType: DecimalType, toScale: Int): DecimalType = {
We need to ensure the base trait's default implementation handles all Spark versions correctly, or that the method is also removed from the SparkPlanExecApi trait
The method here is identical to the one in SparkPlanExecApi, so this override is not needed. Let's just remove this one. Thanks.
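The reasoning here (an override that repeats the base trait's default can be dropped without changing behavior) can be sketched in miniature. Everything below is hypothetical: `DecimalTypeSketch`, `PlanExecApiSketch`, and the rounding logic are placeholders, not Gluten's actual types or implementation.

```scala
// Hypothetical miniature; none of these names or bodies come from Gluten.
final case class DecimalTypeSketch(precision: Int, scale: Int)

trait PlanExecApiSketch {
  // Default implementation in the base trait (placeholder logic).
  def roundOutput(t: DecimalTypeSketch, toScale: Int): DecimalTypeSketch =
    DecimalTypeSketch(t.precision, math.min(t.scale, math.max(toScale, 0)))
}

// Before: the backend overrides the method with an identical body.
object BackendWithOverride extends PlanExecApiSketch {
  override def roundOutput(t: DecimalTypeSketch, toScale: Int): DecimalTypeSketch =
    DecimalTypeSketch(t.precision, math.min(t.scale, math.max(toScale, 0)))
}

// After: the redundant override is removed and the default is inherited.
object BackendWithoutOverride extends PlanExecApiSketch
```

Both objects return the same result for any input, so removing the override only reduces duplication.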
import org.apache.gluten.vectorized.NativePartitioning

import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark.{ShuffleUtils, SparkConf, TaskContext}
This ShuffleUtils class should be in the shims (it exists in shims/spark34/ etc.). We need to confirm it's available for Spark 3.3 as well
Yes, it also exists in shims/spark33.
QCLyu left a comment
Looks good: this correctly removes the Spark shim indirections that were only needed for Spark 3.2 compatibility. I just left a few inline comments for further confirmation.
zhouyuan left a comment
👍 Thanks. The shim layer looks cleaner now.
def convertPartitionTransforms(partitions: Seq[Transform]): (Seq[String], Option[BucketSpec])

def generateFileScanRDD(
We had better add a note on when and why each of these shim APIs was introduced, to guide future changes.
What changes are proposed in this pull request?
Since Spark 3.2 support has been dropped, we need to clean up the shim APIs that were introduced to bridge code differences between Spark 3.2 and later versions. The implementations of those APIs can then be moved to the caller side.
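As a rough illustration of the pattern (not code from this patch), the sketch below shows a shim method that exists only because one old version behaved differently, and how the caller can inline the remaining behavior once that version is gone. All names (`ShimsSketch`, `OldVersionShims`, `NewVersionShims`, `Caller`) and the method itself are hypothetical.

```scala
// Hypothetical miniature of a shim layer; none of these names are Gluten APIs.
trait ShimsSketch {
  def partitionCount(requested: Option[Int]): Int
}

// While the old version was supported, the implementations differed per version.
object OldVersionShims extends ShimsSketch {
  // Assumed for illustration: the old API ignored the requested value.
  def partitionCount(requested: Option[Int]): Int = 1
}

object NewVersionShims extends ShimsSketch {
  def partitionCount(requested: Option[Int]): Int = requested.getOrElse(1)
}

object Caller {
  // Before cleanup: the caller goes through the shim indirection.
  def viaShim(shims: ShimsSketch, requested: Option[Int]): Int =
    shims.partitionCount(requested)

  // After cleanup: only the newer behavior remains, so it is inlined at the call site.
  def inlined(requested: Option[Int]): Int = requested.getOrElse(1)
}
```

Once the old implementation is deleted, every remaining shim returns the same thing, so the trait method and its per-version copies can be removed in favor of the inlined form.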
How was this patch tested?
Local build.
Was this patch authored or co-authored using generative AI tooling?
No.
Related issue: #11379