
[VL] Support type widening in Parquet reader (SPARK-40876) #11683

@baibaichen

Description


Labels: enhancement, VELOX



Enable the GlutenParquetTypeWideningSuite test suite for Spark 4.0 and 4.1, which validates Parquet type widening support (SPARK-40876).

Background

GlutenParquetTypeWideningSuite has 84 tests covering two types of Parquet type conversions:

  1. Physical→logical type restoration: reading an int32 column annotated INT(8) as TINYINT (safe, because the writer guarantees the value range)
  2. Schema evolution widening: reading IntegerType data written under an older schema as LongType, DoubleType, or DecimalType (a Spark 4.0 feature)
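The two kinds of conversion need different checks. The following is a minimal C++ sketch for an INT32 source column, using hypothetical, simplified type tags (`TargetType`, `Target`) rather than real Velox or Spark types: restoration trusts the INT(8) annotation, while widening accepts only targets that can hold every int32 value exactly. The DECIMAL bound `precision - scale >= 10` follows from int32 magnitudes having at most 10 digits; whether Spark and Velox enforce exactly this rule is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical, simplified type tags for illustration only; these are not
// Velox or Spark types.
enum class TargetType { TinyInt, BigInt, Double, Decimal };

struct Target {
  TargetType kind;
  int precision = 0;  // used only when kind == Decimal
  int scale = 0;      // used only when kind == Decimal
};

// The largest int32 magnitude (2147483648) has 10 decimal digits.
constexpr int kInt32Digits = 10;

// Physical->logical restoration: an INT32 column annotated INT(8) can be read
// as TINYINT without a per-value range check, because the writer already
// guaranteed the range.
bool canRestoreInt8(Target t) { return t.kind == TargetType::TinyInt; }

// Schema-evolution widening of a plain INT32 (IntegerType) column: every
// int32 value must be representable exactly in the requested type.
bool canWidenInt32(Target t) {
  switch (t.kind) {
    case TargetType::BigInt:
    case TargetType::Double:  // a double's 53-bit significand holds any int32
      return true;
    case TargetType::Decimal:
      // DECIMAL(p, s) keeps p - s integer digits.
      return t.precision - t.scale >= kInt32Digits;
    default:
      return false;
  }
}

// Example cases under the assumptions above.
inline void int32Examples() {
  assert(canRestoreInt8({TargetType::TinyInt}));
  assert(canWidenInt32({TargetType::Decimal, 10, 0}));
  assert(!canWidenInt32({TargetType::Decimal, 12, 3}));  // only 9 integer digits
}
```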

The suite is currently disabled: 74 of its 84 tests fail. The failures fall into four categories:

| Category | Count | Issue | Fix |
|---|---|---|---|
| A | 13 | Velox doesn't support INT→DOUBLE/REAL/DECIMAL widening | Velox C++ convertType() extension |
| B | 29 | Exception type mismatch + no Decimal precision check | Exception translation + C++ precision check |
| C | 31 | Parquet V2 encoding assertions + Decimal conversion limits | Disable native writer + test overrides + Velox C++ changes |
| D | 1 | parquet-mr-only decimal narrowing overflow→null | Exclude (cannot reproduce with native reader) |
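The Decimal precision check from category B can be illustrated with a minimal C++ sketch. It assumes the check means "no digits are lost in either the integer or the fractional part"; `DecimalType` and `isLosslessDecimalWidening` are hypothetical names, not Velox's API.

```cpp
#include <cassert>

// Hypothetical helper type for illustration: DECIMAL(precision, scale).
struct DecimalType {
  int precision;
  int scale;
};

// Reading DECIMAL(p1, s1) data as DECIMAL(p2, s2) is lossless only if the
// target keeps at least as many fractional digits (s2 >= s1) and at least as
// many integer digits (p2 - s2 >= p1 - s1). In this sketch every other
// combination is treated as unsupported.
bool isLosslessDecimalWidening(DecimalType from, DecimalType to) {
  return to.scale >= from.scale &&
         to.precision - to.scale >= from.precision - from.scale;
}

// Example cases.
inline void decimalWideningExamples() {
  assert(isLosslessDecimalWidening({5, 2}, {7, 2}));   // more integer digits
  assert(isLosslessDecimalWidening({5, 2}, {7, 4}));   // more fractional digits
  assert(!isLosslessDecimalWidening({5, 2}, {6, 4}));  // integer digits 3 -> 2
  assert(!isLosslessDecimalWidening({5, 2}, {7, 1}));  // scale 2 -> 1
}
```

Comparing integer digits (precision minus scale) rather than raw precision is what rejects cases like DECIMAL(5, 2) to DECIMAL(6, 4), where precision grows but the integer part shrinks.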

Plan

This will be addressed in 3 PRs:

  1. PR 1 (exception translation): Add translateException() to convert Velox type errors into SchemaColumnConvertNotSupportedException, and enable the suite, with excludes/overrides as appropriate, for the tests that pass without C++ changes.

  2. PR 2 (SPARK-18108 + Revert OAP): Fix partition column type conflicts and import upstream Velox PR #15173.

  3. PR 3 (type widening implementation): Velox C++ changes for INT→DOUBLE/REAL/DECIMAL and Decimal→Decimal widening. This requires an upstream Velox PR to land first; the remaining tests are then enabled.
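The exception translation in PR 1 is a catch-and-rethrow at the native boundary. The sketch mirrors that pattern in C++ only to show its shape: `NativeTypeConversionError` and `SchemaColumnConvertNotSupported` are hypothetical stand-ins, and the real target is Spark's JVM `SchemaColumnConvertNotSupportedException`, which carries a column name, a physical type, and a logical type. The signature of Gluten's actual translateException() is not assumed here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical native-side error raised when a Parquet column cannot be read
// as the requested type. Not a real Velox class.
struct NativeTypeConversionError : std::runtime_error {
  std::string column, physicalType, logicalType;
  NativeTypeConversionError(std::string c, std::string p, std::string l)
      : std::runtime_error("cannot convert column " + c),
        column(std::move(c)), physicalType(std::move(p)),
        logicalType(std::move(l)) {}
};

// Hypothetical C++ stand-in for Spark's SchemaColumnConvertNotSupportedException,
// carrying the same three fields.
struct SchemaColumnConvertNotSupported : std::runtime_error {
  std::string column, physicalType, logicalType;
  explicit SchemaColumnConvertNotSupported(const NativeTypeConversionError& e)
      : std::runtime_error(e.what()), column(e.column),
        physicalType(e.physicalType), logicalType(e.logicalType) {}
};

// Run `body`; rethrow native type-conversion errors as the Spark-facing type
// and let every other exception propagate unchanged.
template <typename F>
auto translateException(F&& body) -> decltype(body()) {
  try {
    return body();
  } catch (const NativeTypeConversionError& e) {
    throw SchemaColumnConvertNotSupported(e);
  }
}

// Example: a failing read surfaces as the Spark-facing exception type,
// while successful reads pass through untouched.
inline void translationExamples() {
  bool translated = false;
  try {
    translateException([]() -> int {
      throw NativeTypeConversionError("col_a", "INT32", "decimal(5,2)");
    });
  } catch (const SchemaColumnConvertNotSupported& e) {
    translated = e.column == "col_a" && e.physicalType == "INT32";
  }
  assert(translated);
  assert(translateException([] { return 42; }) == 42);
}
```

Catching only the conversion error type keeps unrelated failures (I/O, memory) from being misreported as schema mismatches.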

Test Results (Target)

| | Spark 4.0 | Spark 4.1 |
|---|---|---|
| ✅ Passed | 46 | 46 |
| 🟢 Override (passed) | 35 | 35 |
| ❌ Excluded | 3 | 3 |
| Total | 84 | 84 |

Sub-issue of #11550.

This issue was written with the assistance of AI.

Metadata

Assignees: No one assigned
Type: No type
Project status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
