WIP: Enable GlutenParquetTypeWideningSuite for Spark 4.0 and 4.1 #11670
Draft
baibaichen wants to merge 5 commits into apache:main from
Conversation
…rtedException
Add exception translation in Gluten's iterator chain so that Velox native
reader type conversion errors are properly translated to Spark's expected
SchemaColumnConvertNotSupportedException.
Changes:
- ClosableIterator.java: Extract translateException() virtual method (default returns GlutenException, preserving existing behavior)
- ColumnarBatchOutIterator.java: Override translateException() to detect Velox type mapping errors ("not allowed for requested type" or "Not a valid type for") and wrap them as SchemaColumnConvertNotSupportedException
This enables Spark's ParquetTypeWideningSuite error-path tests to pass
when using Gluten's Velox native reader.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
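The commit above can be sketched as a minimal, self-contained Java example. The class and method names mirror the commit description, but the bodies are simplified stand-ins: the real ClosableIterator and ColumnarBatchOutIterator live in Gluten and wrap native Velox calls, and the real SchemaColumnConvertNotSupportedException is Spark's.

```java
// Stand-in for Gluten's generic native-error wrapper.
class GlutenException extends RuntimeException {
  GlutenException(Throwable cause) { super(cause); }
}

// Stand-in for Spark's SchemaColumnConvertNotSupportedException.
class SchemaColumnConvertNotSupportedException extends RuntimeException {
  SchemaColumnConvertNotSupportedException(String message, Throwable cause) {
    super(message, cause);
  }
}

abstract class ClosableIterator {
  // Virtual hook: by default every native failure becomes a GlutenException,
  // preserving the existing behavior for all other iterators.
  protected RuntimeException translateException(Throwable t) {
    return new GlutenException(t);
  }

  // Funnel point through which native errors are surfaced to callers.
  public final RuntimeException toRuntimeException(Throwable t) {
    return translateException(t);
  }
}

class ColumnarBatchOutIterator extends ClosableIterator {
  @Override
  protected RuntimeException translateException(Throwable t) {
    String msg = t.getMessage();
    // Velox reports unsupported Parquet type mappings with these phrases;
    // rewrap them as the exception Spark's error-path tests expect.
    if (msg != null
        && (msg.contains("not allowed for requested type")
            || msg.contains("Not a valid type for"))) {
      return new SchemaColumnConvertNotSupportedException(msg, t);
    }
    return super.translateException(t);
  }
}
```

Because the hook has a default implementation in the base class, only iterators that sit on the Parquet read path need to opt in; everything else keeps returning GlutenException unchanged.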
Enable the previously disabled GlutenParquetTypeWideningSuite with Velox backend fixes for Parquet type widening (SPARK-40876). Test suite: 81 pass, 0 fail, 38 ignored (from 74 failures).

Changes:
- VeloxTestSettings.scala (spark40+41): Enable suite with targeted excludes for DELTA_BYTE_ARRAY encoding limitation (2) and parquet-mr overflow (1)
- GlutenParquetTypeWideningSuite.scala (spark40+41): Override test class to disable native writer (test read-path only) and override 35 tests that need expectError=true for both reader configs (Velox always uses the native reader regardless of the vectorized setting)
- get-velox.sh: Point to Velox branch with type widening support

Velox fixes (in baibaichen/velox feature/enable-parquet-type-widening-suite):
1. Revert OAP commit that over-relaxed convertType() type checks
2. Support INT->DOUBLE/REAL/DECIMAL widening + decimal precision check
3. Support Decimal->Decimal widening (same-scale + scale rescaling)
4. Fix SPARK-16632: Allow reading INT32 as ByteType/ShortType

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
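The decimal precision check named in Velox fix 2 can be illustrated with a short sketch. The method name follows the commit description, but this is an assumption-laden stand-in: the real check is C++ inside Velox's Parquet reader. Per Spark's rule, reading an integer column as Decimal(p, s) requires p - s >= 10 for INT32 and p - s >= 20 for INT64.

```java
// Illustrative sketch of the hasEnoughDecimalPrecision rule described in
// the commit message; not Velox's actual (C++) implementation.
final class DecimalPrecisionCheck {
  enum ParquetPhysicalType { INT32, INT64 }

  // Reading an integer column as Decimal(precision, scale) is only safe
  // when the decimal's integer part (precision - scale) is wide enough:
  // >= 10 digits for INT32, >= 20 for INT64 (Spark's rule).
  static boolean hasEnoughDecimalPrecision(ParquetPhysicalType type, int precision, int scale) {
    int integerDigits = precision - scale;
    switch (type) {
      case INT32:
        return integerDigits >= 10;
      case INT64:
        return integerDigits >= 20;
      default:
        return false;
    }
  }
}
```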
…widening overrides

Spark 4.1 adds 6 new tests for decimal precision+scale widening where precisionIncrease >= scaleIncrease >= 0. Velox already supports these conversions, so they should NOT be in the 'expect error' override list.

Remove these 6 cases from the spark41 override:
- Decimal(5,2) -> Decimal(7,4)
- Decimal(5,2) -> Decimal(10,7)
- Decimal(5,2) -> Decimal(20,17)
- Decimal(10,2) -> Decimal(12,4)
- Decimal(10,2) -> Decimal(20,12)
- Decimal(20,2) -> Decimal(22,4)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
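As a sanity check, the precisionIncrease >= scaleIncrease >= 0 rule above can be written as a tiny predicate. This is a hypothetical helper for illustration, not Gluten's or Velox's actual code.

```java
// Hypothetical helper expressing the rule above: Decimal(p1,s1) can widen
// to Decimal(p2,s2) only when the scale does not shrink and the precision
// grows at least as much as the scale, so the integer part never loses digits.
final class DecimalWideningRule {
  static boolean isSupported(int p1, int s1, int p2, int s2) {
    int precisionIncrease = p2 - p1;
    int scaleIncrease = s2 - s1;
    return scaleIncrease >= 0 && precisionIncrease >= scaleIncrease;
  }
}
```

All six removed cases satisfy the predicate, e.g. Decimal(5,2) -> Decimal(7,4) has precisionIncrease = scaleIncrease = 2.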
Remove 5 excludes for Decimal->Decimal same-scale precision widening tests that are now supported by Velox commit 3. These tests were previously excluded with the comment 'Velox reads wrong data', but the Decimal->Decimal widening fix resolved the issue.

Un-excluded tests:
- Decimal(5,2) -> Decimal(7,2)
- Decimal(5,2) -> Decimal(10,2)
- Decimal(5,2) -> Decimal(20,2)
- Decimal(10,2) -> Decimal(12,2)
- Decimal(10,2) -> Decimal(20,2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…olumns

When Gluten creates HiveTableHandle, it was passing all columns (including partition columns) as dataColumns. This caused Velox's convertType() to validate partition column types against the Parquet file's physical types, failing when they differ (e.g., LongType in the file vs IntegerType from partition inference).

Fix: build dataColumns excluding partition columns (ColumnType::kPartitionKey). Partition column values come from the partition path, not from the file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
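A rough sketch of the fix, using simplified stand-in types: the real change is in Gluten's native layer, filtering on Velox's ColumnType::kPartitionKey when building the HiveTableHandle.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the column metadata involved; illustrative only.
final class HiveColumns {
  enum ColumnType { REGULAR, PARTITION_KEY }

  static final class Column {
    final String name;
    final ColumnType type;
    Column(String name, ColumnType type) { this.name = name; this.type = type; }
  }

  // Build dataColumns excluding partition columns: their values come from
  // the partition path, so they must never be validated against the
  // Parquet file's physical schema.
  static List<Column> dataColumns(List<Column> allColumns) {
    List<Column> result = new ArrayList<>();
    for (Column c : allColumns) {
      if (c.type != ColumnType.PARTITION_KEY) {
        result.add(c);
      }
    }
    return result;
  }
}
```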
What changes were proposed in this pull request?
WIP: Enable GlutenParquetTypeWideningSuite for Spark 4.0 and 4.1

This PR enables the previously disabled GlutenParquetTypeWideningSuite test suite, which validates Parquet type widening support (SPARK-40876) when using Gluten's Velox native reader.

Background
Parquet reading involves two types of type conversions:

1. Physical→Logical restoration: Parquet uses wide physical containers (INT32, INT64, etc.) with logical annotations. Reading int32 + INT(8) as TINYINT is safe — the writer guarantees values fit within the annotated range.
2. Schema evolution widening: Reading old data with a wider type (e.g., IntegerType → DoubleType, Decimal(5,2) → Decimal(7,2)). This is engine-specific — SPARK-40876 introduced it in Spark 4.0.

The original Velox Parquet reader (following Presto's behavior) did not support schema evolution widening for integer→float/double/decimal or decimal precision/scale widening, causing 74 out of 84 tests to fail.
Changes
Velox C++ fixes (in baibaichen/velox feature/enable-parquet-type-widening-suite):

1. Revert OAP commit 16732b4f5, whose convertType() type checks were over-relaxed (allowed INT64→INTEGER narrowing, commented out UTF8/ENUM validation)
2. Extend convertType() to allow REAL/DOUBLE/Decimal for INT_8/16/32/64. Add hasEnoughDecimalPrecision matching Spark's rule (INT32: p-s≥10, INT64: p-s≥20). Add DOUBLE/REAL cases to getIntValues() with decimal scale adjustment.
3. Support Decimal→Decimal widening (same-scale widening via IntegerColumnReader). Support scale rescaling with the precisionIncrease ≥ scaleIncrease rule.
4. Fix SPARK-16632: allow reading INT32 as ByteType/ShortType.

Gluten changes (this PR):
- Add translateException() in ClosableIterator + ColumnarBatchOutIterator to convert Velox type errors to SchemaColumnConvertNotSupportedException
- Override tests to set expectError=true where Velox correctly rejects unsupported conversions

Test Results
The 3 truly excluded tests cover the DELTA_BYTE_ARRAY encoding limitation (2 tests) and a parquet-mr overflow (1 test).
How was this patch tested?
Ran GlutenParquetTypeWideningSuite locally for Spark 4.0, achieving 81 pass / 0 fail / 38 ignored.