[GLUTEN-10648][VL] Support Iceberg overwrite partitions dynamic #10823

Merged
jinchengchenghh merged 2 commits into apache:main from Zouxxyy:dev/iceberg-dp2 on Oct 1, 2025
@Zouxxyy (Contributor) commented Sep 30, 2025

What changes are proposed in this pull request?

Support dynamic partition overwrite (`OverwritePartitionsDynamic`) for Iceberg tables in the Velox backend.
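To make the semantics concrete, here is a minimal model, assuming a table is just a map from a partition value to its rows. The `OverwriteSemantics` object and its types are hypothetical and are not Gluten or Iceberg code. A static overwrite replaces the whole table, while a dynamic partition overwrite replaces only the partitions that receive at least one incoming row. In Spark, the dynamic path is typically reached through `DataFrameWriterV2.overwritePartitions()` or through `INSERT OVERWRITE` with `spark.sql.sources.partitionOverwriteMode=dynamic`.

```scala
// Hypothetical model for illustration only: a table is a map from a
// partition value to the rows stored in that partition.
object OverwriteSemantics {
  type Row = (String, Int)            // (partition column value, payload)
  type Table = Map[String, Seq[Row]]  // partition value -> rows in that partition

  // Group incoming rows by their partition value.
  private def byPartition(rows: Seq[Row]): Table = rows.groupBy(_._1)

  // Static overwrite: the existing table is discarded entirely and replaced
  // by the incoming rows (the `table` argument is kept only for symmetry).
  def overwriteStatic(table: Table, incoming: Seq[Row]): Table =
    byPartition(incoming)

  // Dynamic overwrite: partitions that receive at least one new row are
  // replaced; every other existing partition is kept untouched.
  // Map ++ Map lets the right-hand side win for duplicate keys.
  def overwriteDynamic(table: Table, incoming: Seq[Row]): Table =
    table ++ byPartition(incoming)
}
```

With existing partitions `2025-09-29` and `2025-09-30` and incoming rows for `2025-09-30` and `2025-10-01`, the dynamic variant keeps `2025-09-29`, while the static variant drops it.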

How was this patch tested?

@github-actions bot added the CORE, VELOX, DOCS, and DATA_LAKE labels on Sep 30, 2025
@github-actions

#10648

@github-actions

Run Gluten Clickhouse CI on x86

@jinchengchenghh (Contributor)

It is public, username gluten password hN2xX3uQ4m

@jinchengchenghh (Contributor)

This is a flaky test; I will create a PR to fix it.

[2025-09-30T11:51:08.252Z] - Gluten - SPARK-35650: Coalesce number of partitions by AEQ *** FAILED ***
[2025-09-30T11:51:08.252Z]   2 did not equal 1 (ClickHouseAdaptiveQueryExecSuite.scala:84)

@jinchengchenghh (Contributor)

The reported line number (ClickHouseAdaptiveQueryExecSuite.scala:84) does not match the source file.

override def supportOverwriteByExpression(): Boolean =
  GlutenConfig.get.enableOverwriteByExpression && enableEnhancedFeatures()

override def supportOverwritePartitionsDynamic(): Boolean =
Contributor

This is the backend settings config, so just return enableEnhancedFeatures() here. This change will also trigger the CI again.
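A minimal sketch of the suggested pattern follows. It uses hypothetical stand-ins (`BackendSettingsSketch`, `Conf`, `VeloxLikeSettings`) rather than Gluten's real classes: the base trait keeps conservative `false` defaults, and the backend override for dynamic partition overwrite reports only whether enhanced features are enabled, while the overwrite-by-expression override stays gated on an extra config flag as in the snippet above.

```scala
// Hypothetical, simplified stand-ins for the backend settings discussed in
// this review; these are NOT Gluten's actual classes or config accessors.
object BackendSettingsSketch {
  trait BackendSettings {
    // Conservative defaults: a backend must opt in explicitly.
    def supportOverwriteByExpression(): Boolean = false
    def supportOverwritePartitionsDynamic(): Boolean = false
  }

  // Hypothetical config holder standing in for GlutenConfig.
  final case class Conf(enhancedFeatures: Boolean, overwriteByExpression: Boolean)

  final class VeloxLikeSettings(conf: Conf) extends BackendSettings {
    private def enableEnhancedFeatures(): Boolean = conf.enhancedFeatures

    // Gated on both a dedicated config flag and enhanced features.
    override def supportOverwriteByExpression(): Boolean =
      conf.overwriteByExpression && enableEnhancedFeatures()

    // Per the review suggestion: report backend capability only.
    // Assumption: any user-facing on/off switch is checked elsewhere.
    override def supportOverwritePartitionsDynamic(): Boolean =
      enableEnhancedFeatures()
  }
}
```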

@github-actions

Run Gluten Clickhouse CI on x86


def supportOverwriteByExpression(): Boolean = false

def supportOverwritePartitionsDynamic(): Boolean = false
Member

Not necessarily related to this PR, but we do need test coverage for the new V2 columnar write operators on their own, without having to enable the Iceberg writer, since they were designed to be general. Vanilla Spark uses an in-memory catalog to test the row-based V2 write operators; we may want to introduce something similar just for testing. #9896
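For illustration only, a toy in-memory catalog along those lines might look like the following sketch. Every name here (`InMemoryCatalogSketch`, `InMemoryTable`, `InMemoryCatalog`, `Row`) is hypothetical and is not Spark's or Gluten's test API. The three write methods mirror the semantics of Spark's `AppendData`, `OverwriteByExpression`, and `OverwritePartitionsDynamic` logical plans, so expected results can be asserted without an Iceberg writer.

```scala
import scala.collection.mutable

// Hypothetical toy catalog for exercising V2-style write semantics in tests.
object InMemoryCatalogSketch {
  final case class Row(part: String, value: Int)

  final class InMemoryTable {
    private var rows: Vector[Row] = Vector.empty
    def snapshot: Vector[Row] = rows

    // Append semantics: add rows, never remove existing ones.
    def append(in: Seq[Row]): Unit = { rows = rows ++ in }

    // Overwrite-by-expression semantics: delete rows matching the filter,
    // then add the incoming rows.
    def overwriteByExpression(deleteIf: Row => Boolean, in: Seq[Row]): Unit = {
      rows = rows.filterNot(deleteIf) ++ in
    }

    // Dynamic partition overwrite semantics: delete only the partitions
    // that appear in the incoming rows, then add the incoming rows.
    def overwritePartitionsDynamic(in: Seq[Row]): Unit = {
      val touched = in.map(_.part).toSet
      rows = rows.filterNot(r => touched(r.part)) ++ in
    }
  }

  final class InMemoryCatalog {
    private val tables = mutable.Map.empty[String, InMemoryTable]

    def createTable(name: String): InMemoryTable = {
      require(!tables.contains(name), s"table $name already exists")
      val t = new InMemoryTable
      tables(name) = t
      t
    }

    def loadTable(name: String): InMemoryTable = tables(name)
  }
}
```

A columnar write operator test could then compare the table contents it produces against these reference semantics.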

Contributor Author

Sounds good. V2 write is a general capability that all other lake formats can use. I know a bit about DSv2 and am happy to help if needed.

Member

Thanks! Feel free to open issues and PRs.

@zhztheplayer (Member) commented Oct 1, 2025

> Spark uses an in-memory catalog for testing the row-based V2 write operators. We may want to introduce something similar just for testing.

@Zouxxyy I just recalled that other contributors might already be working on something similar; let me confirm first to avoid duplicated work. :) I don't have their GitHub IDs or emails at the moment, but I will try to bring them into the public discussion.

Member

I've confirmed that they are not working on the testing topic, so feel free to take it if you want. We can have public discussions about further matters later on.

Contributor Author

Thanks, I'd like to. Testing is the foundation; it will make the integration better guided and more reliable.

@jinchengchenghh merged commit 971f590 into apache:main on Oct 1, 2025
57 checks passed
.createWithDefault(true)

val COLUMNAR_OVERWRIET_PARTITIONS_DYNAMIC_ENABLED =
  buildConf("spark.gluten.sql.columnar.overwriteOverwritePartitionsDynamic")
Member

Hi @Zouxxyy,

Should this be spark.gluten.sql.columnar.overwritePartitionsDynamic?

Contributor Author

Sorry, my mistake. I might have copied the wrong content.

zhztheplayer added a commit to zhztheplayer/gluten that referenced this pull request Oct 1, 2025
This should fix CI error on `AllVeloxConfiguration`.
zhztheplayer added a commit that referenced this pull request Oct 2, 2025
This should fix CI error on `AllVeloxConfiguration`.

3 participants