
[CORE] Remove legacy Spark 3.2 compatibility code#11495

Closed
QCLyu wants to merge 0 commits into apache:main from QCLyu:qingchuanlyu

Conversation

@QCLyu
Contributor

@QCLyu QCLyu commented Jan 27, 2026

What changes are proposed in this pull request?

This PR removes remaining Spark 3.2-specific compatibility code from the codebase, completing the Spark 3.2 deprecation.
Changes:

  • Removed lteSpark32 from SparkVersionUtil.scala and updated lteSpark33 to use direct version comparison
  • Removed Spark 3.2-specific code paths from:
    • SparkTaskUtil.scala - Spark 3.2 TaskContext constructor path
    • SparkPlanUtil.scala - Spark 3.2-specific supportsRowBased implementation
    • GlutenCostEvaluator.scala - Spark 3.2-specific CostEvaluator instantiation
    • Convention.scala - Spark 3.2-specific row type handling
  • Removed Spark 3.2 test case from MiscOperatorSuite.scala
  • Deleted entire shims/spark32 directory including:
    • ColumnarArrayShim.java
    • ParquetFooterReaderShim.scala
  • Cleaned up unused imports (SparkVersionUtil, SparkShimLoader, AnalysisException)

The codebase now only supports Spark 3.3 and later versions.

How was this patch tested?

  • Verified that compilation succeeds with all unused imports removed
  • Existing unit tests should continue to pass (Spark 3.3+ only)
  • Manually verified that no references to lteSpark32 or Spark 3.2-specific code remain in the codebase

Fixes #11379
Related #8960

@github-actions github-actions bot added the CORE (works for Gluten Core) and VELOX labels Jan 27, 2026
@github-actions

Run Gluten Clickhouse CI on x86

@QCLyu QCLyu marked this pull request as ready for review January 29, 2026 00:34
@QCLyu
Contributor Author

QCLyu commented Jan 29, 2026

Hi, could someone help review this PR? Per the Git bot, the failed step was "Node Copy file from S3", which is a CI infra issue: a 403 Forbidden from AWS S3 during a Jenkins pipeline step that downloads a file from S3.

A few import statements were deleted because they were only used in Spark 3.2-related code (removed in #8960).

@FelixYBW FelixYBW changed the title Qingchuanlyu Cleanup of Spark 3.2 code Jan 29, 2026
@FelixYBW FelixYBW changed the title Cleanup of Spark 3.2 code [CORE] Cleanup of Spark 3.2 code Jan 29, 2026
@PHILO-HE PHILO-HE self-requested a review January 29, 2026 02:31
@QCLyu
Contributor Author

QCLyu commented Jan 29, 2026

Thanks @PHILO-HE. I'll get back to you later this week.


@PHILO-HE PHILO-HE left a comment


Thanks for your continued efforts.

Could you also help check the APIs declared in SparkShims? Some of them may have been introduced due to code differentiation between Spark 3.2 and the later versions. If so, we can do a cleanup there too. Thank you.

-      } else {
-        rowType0()
-      }
+      rowType0()
Member

It seems we can do a more thorough cleanup. My understanding is that KnownRowTypeForSpark33OrLater was introduced specially to handle Spark 3.2. Now that Spark 3.2 has been deprecated, can we remove this trait and directly use KnownRowType instead?
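For illustration, a hypothetical Java sketch of the collapse being suggested (the real code is Scala traits in Convention.scala; the names below and the String stand-in for Convention.RowType are assumptions, not the actual API):

```java
// Hypothetical sketch of folding a version-specific subtype back into its base.
// "Before": KnownRowTypeForSpark33OrLater existed only because Spark 3.2
// needed different row-type handling than 3.3+.
interface KnownRowType {
    String rowType(); // String stands in for Convention.RowType
}

interface KnownRowTypeForSpark33OrLater extends KnownRowType {
    // Spark 3.3+ hook; with 3.2 gone, this indirection adds nothing.
    String rowType0();

    @Override
    default String rowType() {
        return rowType0();
    }
}

// "After": implementations can declare KnownRowType directly.
public class RowTypeCleanupSketch implements KnownRowType {
    @Override
    public String rowType() {
        return "VanillaRowType";
    }

    public static void main(String[] args) {
        System.out.println(new RowTypeCleanupSketch().rowType());
    }
}
```

Both shapes produce the same row type for 3.3+, which is why the intermediate trait can be deleted without behavior changes.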

   private val comparedWithSpark35 = compareMajorMinorVersion((3, 5))
   val eqSpark33: Boolean = comparedWithSpark33 == 0
-  val lteSpark33: Boolean = lteSpark32 || eqSpark33
+  val lteSpark33: Boolean = comparedWithSpark33 <= 0
Member

Nit: Maybe, we can just remove this and use eqSpark33 on the caller side instead.
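As a sketch of the comparison semantics involved (hypothetical Java; the real helper is compareMajorMinorVersion in the Scala SparkVersionUtil, whose exact signature this does not claim to reproduce):

```java
// Hypothetical sketch of major/minor version comparison, mirroring the
// assumed contract of SparkVersionUtil.compareMajorMinorVersion: negative,
// zero, or positive when the running version is below, equal to, or above
// the given (major, minor).
public class VersionCompareSketch {
    static int compareMajorMinor(int runMajor, int runMinor, int major, int minor) {
        if (runMajor != major) {
            return Integer.compare(runMajor, major);
        }
        return Integer.compare(runMinor, minor);
    }

    public static void main(String[] args) {
        // With Spark 3.2 support removed, lteSpark33 no longer needs the
        // lteSpark32 term: a single comparison against (3, 3) suffices.
        int[][] runningVersions = {{3, 3}, {3, 4}, {3, 5}};
        for (int[] v : runningVersions) {
            boolean lteSpark33 = compareMajorMinor(v[0], v[1], 3, 3) <= 0;
            boolean eqSpark33 = compareMajorMinor(v[0], v[1], 3, 3) == 0;
            System.out.println(v[0] + "." + v[1] + ": lteSpark33=" + lteSpark33
                + ", eqSpark33=" + eqSpark33);
        }
    }
}
```

This also illustrates the reviewer's point: once 3.3 is the minimum supported version, lteSpark33 and eqSpark33 always agree, so the former can be dropped in favor of eqSpark33 at call sites.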

@PHILO-HE PHILO-HE changed the title [CORE] Cleanup of Spark 3.2 code [CORE] Remove legacy Spark 3.2 compatibility code Jan 30, 2026
@PHILO-HE
Member

FYI: ColumnarArrayShim was refactored by #11525, whose main purpose is to reduce duplicate code, and the Spark 3.2 version of this class was removed in that PR.

@QCLyu QCLyu marked this pull request as draft February 1, 2026 21:51
@github-actions

github-actions bot commented Feb 1, 2026

Run Gluten Clickhouse CI on x86

@QCLyu QCLyu marked this pull request as ready for review February 1, 2026 23:23
@QCLyu
Contributor Author

QCLyu commented Feb 1, 2026

Hi @PHILO-HE, please check again. The CI failure was unrelated.

@zzcclp
Contributor

zzcclp commented Feb 2, 2026

Run Gluten Clickhouse CI on x86


@PHILO-HE PHILO-HE left a comment


Some new minor comments. Please also rebase the code. Thanks.

override def rowType0(): Convention.RowType
override def rowType(): Convention.RowType = rowType0()

def rowType0(): Convention.RowType
Member

Can we remove this?


   def isPlannedV1Write(plan: DataWritingCommandExec): Boolean = {
-    if (SparkVersionUtil.lteSpark33) {
+    if (SparkVersionUtil.compareMajorMinorVersion((3, 3)) <= 0) {
Member

Suggest using eqSpark33 instead, since there is no need to consider earlier versions.

<configuration>
<!-- Ensure Scala compiles to the same output dir so Java can see Scala classes -->
<outputDirectory>${project.build.outputDirectory}</outputDirectory>
</configuration>
Member

Did you see errors without this new code? I am wondering why we need it. If it is indeed necessary, can we move the plugin configuration into the root pom for consistency?

Contributor Author

Thanks @PHILO-HE. Yes, the offline build failures were "cannot find symbol" errors in gluten-substrait's Java code (e.g. ConverterUtils, SubstraitContext, GlutenConfig). Those can also be caused by:

  • An incremental/stale build (e.g. mvn clean compile fixes it)
  • Scala and Java writing to different output dirs when a module overrides project.build.outputDirectory (e.g. target/scala-${scala.binary.version}/classes)

I'll move the configuration to the root POM and remove it from gluten-substrait.
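As a sketch, the root-pom change being proposed might look like the following (hypothetical placement; the pluginManagement wrapping and omitted version/executions are assumptions about how the actual root pom is organized):

```xml
<!-- Hypothetical sketch: pin scala-maven-plugin's output to the shared
     classes dir so javac can resolve Scala-compiled symbols in
     mixed Scala/Java modules. -->
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <configuration>
          <!-- Same dir javac uses, so Java sees Scala classes -->
          <outputDirectory>${project.build.outputDirectory}</outputDirectory>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```

Placing it under pluginManagement in the root pom would let every module inherit the setting without repeating it per module.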

@github-actions

github-actions bot commented Feb 3, 2026

Run Gluten Clickhouse CI on x86

@QCLyu QCLyu marked this pull request as draft February 3, 2026 17:54
@PHILO-HE
Member

@QCLyu, could you please spare some time to continue updating this PR? We would like to include it in 1.6 release. If you need help to identify issues, please let me know. Thanks.

@QCLyu
Contributor Author

QCLyu commented Feb 11, 2026 via email


@PHILO-HE
Member

@QCLyu, it seems this PR should not change ArrowColumnarArray. Maybe, you need to keep its related code unchanged to pass CI.


@github-actions github-actions bot added the DOCS label Feb 13, 2026

@github-actions github-actions bot added the BUILD label Feb 13, 2026

@QCLyu
Contributor Author

QCLyu commented Feb 14, 2026

Hi @PHILO-HE, are you aware of any recent migration from Scala to Java regarding TreeMemoryConsumers? The challenge is that I couldn't compile locally after making changes, and pushing blind commits is very inefficient. The build always failed with something similar to the following error:
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:4.9.2:compile (scala-compile-first) on project gluten-core: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.9.2:compile failed: Failed to find name hashes for org.apache.gluten.memory.memtarget.spark.TreeMemoryConsumers

After switching to the main branch, the same error persisted. My guess is that a migration from Scala to Java happened recently; I checked that TreeMemoryConsumers exists only as a .java file on the main branch (cc @zhztheplayer).

What I have tried:

  • Standard clean build (mvn clean package)
  • Bypassing the Zinc server (-Dscala.useZincServer=false)
  • Manual file search (checking for ghost .scala files)
  • Nuke and pave (manual rm -rf target/)
  • Forcing the Scala compiler to look at Java source files in the same pass (added configuration in pom.xml)

So far, all the above methods have failed. How I generally test locally before pushing a PR:
mvn clean package -Pbackends-velox -Pspark-3.5 -DskipTests
or simply mvn clean package -DskipTests
These local tests were generally useful before I took a break.

Would appreciate guidance or contexts.

@QCLyu
Contributor Author

QCLyu commented Feb 15, 2026


Created a separate issue: #11616. Please correct me if I'm wrong (or overthinking).

@QCLyu
Contributor Author

QCLyu commented Feb 15, 2026

Will re-open.

@QCLyu
Contributor Author

QCLyu commented Feb 15, 2026


Likely a local problem; working on it. I closed the separate issue #11616. This abandoned PR is still cited in issue #11379 for reference purposes. I will create a separate PR to clean up the Spark 3.2 compatibility code for the sake of a clean history, targeting release 1.7 in May 2026.

cc @PHILO-HE @zhztheplayer


Labels

CORE (works for Gluten Core), INFRA, VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove remaining Spark 3.2-specific compatibility code

3 participants