[CORE] Use Substrait timestamp_tz for Spark TimestampType to preserve timezone-aware semantics#11074
Conversation
|
Run Gluten Clickhouse CI on x86 |
a2b2c71 to
d7b5aad
Compare
|
Run Gluten Clickhouse CI on x86 |
d7b5aad to
71a8ffc
Compare
|
Run Gluten Clickhouse CI on x86 |
71a8ffc to
4f12dbe
Compare
|
Run Gluten Clickhouse CI on x86 |
4f12dbe to
6f00de5
Compare
|
Run Gluten Clickhouse CI on x86 |
…ne-aware semantics
6f00de5 to
0a4d758
Compare
|
Run Gluten Clickhouse CI on x86 |
|
@rui-mo @zhztheplayer Could you please help review this? @taiyang-li @lgbo-ustc @zzcclp could you confirm whether the changes here have any impact on the ClickHouse backend? |
|
Spark3.5 added TIMESTAMP_NTZ data type. @rui-mo does velox support it? have we enabled the UT for for? |
This PR only modified the Substrait mapping, reserving the corresponding mapping type for future support of timestamp_ntz, velox doesn't support it yet. |
|
We need a through clean up of timestamp and timezone support in Gluten sometime later. |
CH backend changes are ok to me. Sorry for the late reply. |
What changes are proposed in this pull request?
Spark’s TimestampType is timezone-aware: it internally stores timestamps in UTC (by converting input values to UTC based on the session time zone or just read UTC timestamp from parquet file) and represents an absolute point in time. This semantics aligns with Substrait’s timestamp_tz type, which also denotes a timezone-aware timestamp that can be unambiguously mapped to a moment on the timeline.
To maintain semantic consistency between Spark and Substrait, this PR maps Spark’s TimestampType to Substrait’s timestamp_tz.
https://substrait.io/types/type_classes
This approach is consistent with other projects—for example, Apache Iceberg also maps Spark’s TimestampType to TIMESTAMP WITH TIME ZONE and Spark’s TimestampNTZ to TIMESTAMP WITHOUT TIME ZONE.
Note: For future support of Spark’s TimestampNTZType (timezone-naive timestamps), we should map it to Substrait’s timestamp type instead.
How was this patch tested?
The existing tests already cover this change.