Labels: enhancement, VELOX
Description
Enable the GlutenParquetTypeWideningSuite test suite for Spark 4.0 and 4.1, which validates Parquet type widening support (SPARK-40876).
Background
GlutenParquetTypeWideningSuite has 84 tests covering two types of Parquet type conversions:
- Physical→Logical type restoration: Reading `int32 + INT(8)` as `TINYINT` (safe, writer guarantees value range)
- Schema evolution widening: Reading old `IntegerType` data as `LongType`, `DoubleType`, or `DecimalType` (Spark 4.0 feature)
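To make the widening cases concrete, here is a minimal Python sketch of when an integer-to-decimal or decimal-to-decimal read preserves every value. The function names are hypothetical, and the digit counts (10 for `int32`, 20 for `int64`) are an assumption based on how Spark's vectorized Parquet reader is understood to treat un-annotated integers. It is not the Gluten or Velox implementation.

```python
# Assumption: integer digits a decimal must provide to hold any value of the
# physical type (believed to mirror Spark's Decimal(10,0) / Decimal(20,0)).
INT_DIGITS = {"int32": 10, "int64": 20}


def can_widen_int_to_decimal(physical: str, precision: int, scale: int) -> bool:
    """True when decimal(precision, scale) has enough integer digits."""
    if not 0 <= scale <= precision:
        return False
    return precision - scale >= INT_DIGITS[physical]


def can_widen_decimal(from_p: int, from_s: int, to_p: int, to_s: int) -> bool:
    """True when the target keeps all integer digits and does not drop scale."""
    scale_increase = to_s - from_s
    precision_increase = to_p - from_p
    return scale_increase >= 0 and precision_increase >= scale_increase


# INT32 -> DOUBLE is always exact: a float64 significand has 53 bits.
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)
assert int(float(INT32_MAX)) == INT32_MAX
assert int(float(INT32_MIN)) == INT32_MIN
```

Under these assumptions, reading an `int32` column as `decimal(12, 2)` is accepted, while `decimal(11, 2)` is rejected because only 9 integer digits remain.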
Currently the suite is disabled with 74 out of 84 tests failing. The failures fall into four categories:
| Category | Count | Issue | Fix |
|---|---|---|---|
| A | 13 | Velox doesn't support INT→DOUBLE/REAL/DECIMAL widening | Velox C++ convertType() extension |
| B | 29 | Exception type mismatch + no Decimal precision check | Exception translation + C++ precision check |
| C | 31 | Parquet V2 encoding assertions + Decimal conversion limits | Disable native writer + test overrides + Velox C++ |
| D | 1 | parquet-mr only decimal narrowing overflow→null | Exclude (cannot reproduce with native reader) |
Plan
This will be addressed in 3 PRs:
- PR 1 (Exception translation): Add `translateException()` to convert Velox type errors to `SchemaColumnConvertNotSupportedException`. Enable the suite with appropriate excludes/overrides for tests that pass without C++ changes.
- PR 2 (SPARK-18108 + Revert OAP): Fix partition column type conflicts. Import upstream Velox PR #15173.
- PR 3 (Type widening implementation): Velox C++ changes for INT→DOUBLE/REAL/DECIMAL and Decimal→Decimal widening. Requires upstream Velox PR first, then enable remaining tests.
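The exception translation in PR 1 could follow a pattern like the one sketched below, shown in Python for brevity. Everything here is illustrative: `NativeReadError`, the message format, and the regular expression are hypothetical. The exception class is a local stand-in for Spark's `SchemaColumnConvertNotSupportedException`, and the real `translateException()` may detect Velox type errors differently.

```python
import re


class NativeReadError(RuntimeError):
    """Hypothetical stand-in for an error surfaced by the native reader."""


class SchemaColumnConvertNotSupportedException(RuntimeError):
    """Local stand-in for the Spark exception of the same name."""

    def __init__(self, column: str, physical_type: str, logical_type: str):
        super().__init__(
            f"column: {column}, physicalType: {physical_type}, "
            f"logicalType: {logical_type}"
        )
        self.column = column
        self.physical_type = physical_type
        self.logical_type = logical_type


# Hypothetical message format; the actual Velox error text is not given here.
_TYPE_ERROR = re.compile(
    r"Cannot convert column '(?P<column>[^']+)' "
    r"from (?P<physical>\w+) to (?P<logical>\w+)"
)


def translate_exception(err: Exception) -> Exception:
    """Map a recognized type-conversion error; return anything else unchanged."""
    match = _TYPE_ERROR.search(str(err))
    if match is None:
        return err
    translated = SchemaColumnConvertNotSupportedException(
        match["column"], match["physical"], match["logical"]
    )
    translated.__cause__ = err  # keep the native error for debugging
    return translated
```

Returning unrecognized errors unchanged keeps unrelated failures (I/O errors, for example) from being misreported as schema conversion problems.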
Test Results (Target)
| Result | Spark 4.0 | Spark 4.1 |
|---|---|---|
| ✅ Passed | 46 | 46 |
| 🟢 Override (passed) | 35 | 35 |
| ❌ Excluded | 3 | 3 |
| Total | 84 | 84 |
Sub-issue of #11550.
This issue was written with the assistance of AI.