[VL] Adding configurations on max write file size #11606

Open
zhouyuan wants to merge 1 commit into apache:main from zhouyuan:wip_config_writer

@zhouyuan
Member

What changes are proposed in this pull request?

Adds a configuration for the maximum write file size in Velox.

How was this patch tested?

pass GHA
Velox UT

Was this patch authored or co-authored using generative AI tooling?

Signed-off-by: Yuan <yuanzhou@apache.org>
.createWithDefault(10000)

val MAX_TARGET_FILE_SIZE_SESSION =
buildConf("spark.gluten.sql.columnar.backend.velox.maxTargetFileSizeSession")
Contributor


What does "Session" mean here?

| spark.gluten.sql.columnar.backend.velox.maxSpillFileSize | 1GB | The maximum size of a single spill file created |
| spark.gluten.sql.columnar.backend.velox.maxSpillLevel | 4 | The max allowed spilling level with zero being the initial spilling level |
| spark.gluten.sql.columnar.backend.velox.maxSpillRunRows | 3M | The maximum row size of a single spill run |
| spark.gluten.sql.columnar.backend.velox.maxTargetFileSizeSession | 0b | The target file size for each output file when writing data. 0 means no limit on target file size, and the actual file size will be determined by other factors such as max partition number and shuffle batch size. |
Contributor

@FelixYBW FelixYBW Feb 12, 2026


Does it map to Iceberg's write.target-file-size-bytes, and does it honor spark.sql.iceberg.advisory-partition-size? If so, let's honor this config in Gluten as well.

If it only takes effect on Iceberg, we could just reuse Iceberg's config instead of adding a new one.

.createWithDefault(10000)

val MAX_TARGET_FILE_SIZE_SESSION =
buildConf("spark.gluten.sql.columnar.backend.velox.maxTargetFileSizeSession")
Contributor


Please remove the Session suffix. In the Velox code it is a suffix for the config type, not part of the config itself.
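
For illustration only, the requested rename might look like the sketch below. `ConfEntry` is a hypothetical stand-in for Gluten's real config builder (`buildConf`), and the key `maxTargetFileSize` is an assumed name formed by dropping the suffix; the PR author may choose a different one.

```scala
// Hypothetical stand-in for a config entry; this is NOT Gluten's buildConf API.
final case class ConfEntry(key: String, doc: String, defaultBytes: Long)

object VeloxWriteConfSketch {
  // Assumed name: the key from the PR with the "Session" suffix dropped.
  val MAX_TARGET_FILE_SIZE: ConfEntry = ConfEntry(
    key = "spark.gluten.sql.columnar.backend.velox.maxTargetFileSize",
    doc = "The target file size for each output file when writing data. " +
      "0 means no limit on target file size.",
    defaultBytes = 0L // matches the 0b default listed in the docs table
  )

  def main(args: Array[String]): Unit =
    // Print the entry so the renamed key and its default can be inspected.
    println(s"${MAX_TARGET_FILE_SIZE.key} = ${MAX_TARGET_FILE_SIZE.defaultBytes}")
}
```

With a rename like this, the matching row in the configuration table would need the same key.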

3 participants