[GLUTEN-11683][VL] Add Parquet type widening support #11719
Draft
baibaichen wants to merge 8 commits into apache:main from
Conversation
Replace OAP commit [15173][15343] (INT narrowing) with upstream Velox PR #15173 (fix reading array of row) to fix parquet-thrift compatibility.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…olumns

When Gluten creates HiveTableHandle, it passed all columns (including partition columns) as dataColumns. This caused Velox's convertType() to validate partition column types against the Parquet file's physical types, failing when they differ (e.g., LongType in the file vs. IntegerType from partition inference).

Fix: build dataColumns excluding partition columns (ColumnType::kPartitionKey). Partition column values come from the partition path, not from the file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
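The fix above amounts to filtering partition columns out of the column list handed to the native reader. The sketch below is an illustrative model only, not Gluten's actual API: ColumnInfo and isPartitionKey are hypothetical names standing in for Gluten's column metadata and Velox's ColumnType::kPartitionKey.

```scala
// Hypothetical model of the fix: exclude partition columns when building
// the dataColumns list handed to the native file reader.
// ColumnInfo and isPartitionKey are illustrative names, not Gluten's API.
final case class ColumnInfo(name: String, sqlType: String, isPartitionKey: Boolean)

def buildDataColumns(allColumns: Seq[ColumnInfo]): Seq[ColumnInfo] =
  // Partition column values come from the partition path, not from the
  // Parquet file, so the reader must not validate them against the file.
  allColumns.filterNot(_.isPartitionKey)

// Example: "dt" is inferred as IntegerType from the path, while the file
// stores "id" as LongType; only "id" should reach the file reader.
val cols = Seq(
  ColumnInfo("id", "bigint", isPartitionKey = false),
  ColumnInfo("dt", "int", isPartitionKey = true))
```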
With the OAP INT narrowing commit replaced by upstream Velox PR #15173:
- Remove 2 excludes now passing: LongType->IntegerType, LongType->DateType
- Add 2 excludes for new failures: IntegerType->ShortType (OAP removed)
63 excludes remain (net unchanged: -2 +2). Test results: 21 pass / 63 ignored.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from e3c259f to 4f80267
These tests regress after skipping OAP commit 8c2bd0849 (Allow reading integers into smaller-range types). They will be re-enabled in PR3 when the Velox widening commits are applied.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 4f80267 to abbc057
With Velox PR3 type widening (INT->Decimal, INT->Double, Float->Double):
- Remove 15 excludes for widening tests now passing
48 excludes remain. Test results: 36 pass / 48 ignored.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This suite tests the READ path only. Disable the native writer so Spark's writer produces correct V2 encodings (DELTA_BINARY_PACKED/DELTA_BYTE_ARRAY).
- Remove 10 excludes for decimal widening tests now passing
Remaining 38 excludes:
- 34: Velox native reader rejects incompatible decimal conversions regardless of reader config (no parquet-mr fallback)
- 4: Velox does not support DELTA_BYTE_ARRAY encoding
Test results: 46 pass / 38 ignored.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When PARQUET_VECTORIZED_READER_ENABLED=false, fall back to Spark's vanilla parquet-mr reader instead of the Velox native reader. This preserves parquet-mr's behavior (e.g., allowing decimal precision narrowing, returning null on overflow), which differs from the vectorized reader.
- Remove 34 excludes from GlutenParquetTypeWideningSuite that now pass via the vanilla reader fallback
Remaining 4 excludes: Velox does not support DELTA_BYTE_ARRAY encoding.
Test results: 80 pass / 4 ignored.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from abbc057 to c2d50e1
What changes were proposed in this pull request?
Add Parquet type widening support to Velox and enable 80 of 84 tests in GlutenParquetTypeWideningSuite.

Changes
- Point Velox to the type widening branch (get-velox.sh): use the baibaichen/pr3/parquet-type-widening Velox branch with INT→Decimal, INT→Double, Float→Double widening support.
- Update VeloxTestSettings (spark40 + spark41): remove 15 excludes for widening tests now passing.
- Disable native writer (GlutenParquetTypeWideningSuite.scala): this suite tests the READ path only. Disable the native writer so Spark's writer produces correct V2 encodings (DELTA_BINARY_PACKED/DELTA_BYTE_ARRAY). Remove 10 more excludes.
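A minimal, self-contained sketch of the suite-level switch. The key name spark.gluten.sql.native.writer.enabled follows Gluten's config naming but is stated here as an assumption, and the session conf is modeled as a plain Map rather than Spark's SQLConf:

```scala
// Sketch: model of reading a native-writer flag from a session conf.
// The config key is an assumption based on Gluten's naming convention;
// the suite would set it to "false" so Spark's built-in Parquet writer
// produces the V2 encodings the read-path tests depend on.
val suiteConf = Map("spark.gluten.sql.native.writer.enabled" -> "false")

def nativeWriterEnabled(conf: Map[String, String]): Boolean =
  conf.getOrElse("spark.gluten.sql.native.writer.enabled", "true").toBoolean
```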
- Fall back to the vanilla reader when vectorized=false (BasicScanExecTransformer.scala): when PARQUET_VECTORIZED_READER_ENABLED=false, fall back to Spark's vanilla parquet-mr reader instead of the Velox native reader. This preserves parquet-mr's behavior (decimal precision narrowing, null on overflow). Remove 34 more excludes.

Test Results
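The fallback decision above can be modeled as a config check. In this sketch, spark.sql.parquet.enableVectorizedReader is the Spark SQL key behind PARQUET_VECTORIZED_READER_ENABLED, while shouldOffloadParquetScan is an illustrative helper name, not Gluten's actual method:

```scala
// Illustrative sketch: offload the Parquet scan to the Velox native
// reader only when Spark's vectorized reader is enabled; otherwise
// fall back to vanilla parquet-mr to keep its semantics (decimal
// precision narrowing, null on overflow). Defaults to true, matching
// Spark's default for the vectorized reader.
def shouldOffloadParquetScan(sessionConf: Map[String, String]): Boolean =
  sessionConf
    .getOrElse("spark.sql.parquet.enableVectorizedReader", "true")
    .toBoolean
```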
Remaining 4 excludes: Velox does not support DELTA_BYTE_ARRAY encoding for FIXED_LEN_BYTE_ARRAY decimals.
Depends on #11689 (PR2).
Fixes #11683
How was this patch tested?
Local tests: TypeWideningSuite 80 pass / 4 ignored (spark40 and spark41).
Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored with GitHub Copilot.