
[CORE] Use Substrait timestamp_tz for Spark TimestampType to preserve timezone-aware semantics #11074

Merged
FelixYBW merged 1 commit into apache:main from liujiayi771:timestamp_tz
Nov 15, 2025

Conversation

@liujiayi771
Contributor

What changes are proposed in this pull request?

Spark’s TimestampType is timezone-aware: it stores timestamps internally in UTC (either by converting input values to UTC based on the session time zone, or by reading UTC timestamps directly from a Parquet file) and represents an absolute point in time. These semantics align with Substrait’s timestamp_tz type, which also denotes a timezone-aware timestamp that maps unambiguously to a moment on the timeline.
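
To make the "absolute point in time" semantics concrete, here is a minimal Python sketch. It is not Gluten or Spark code: the helper names `to_utc_micros` and `render` are hypothetical, and fixed UTC offsets stand in for real session time zones. It only illustrates that a wall-clock input is normalized to microseconds since the epoch in UTC, and that the same stored value renders differently depending on the session time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_micros(wall_clock: datetime, session_tz: timezone) -> int:
    """Interpret a naive wall-clock value in the session zone and
    return microseconds since the epoch in UTC (hypothetical helper)."""
    aware = wall_clock.replace(tzinfo=session_tz)
    return (aware - EPOCH) // timedelta(microseconds=1)


def render(micros: int, session_tz: timezone) -> str:
    """Render a stored UTC instant as wall-clock text in the session zone."""
    instant = EPOCH + timedelta(microseconds=micros)
    return instant.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S")


# Fixed offsets are an assumption to keep the sketch dependency-free.
utc = timezone.utc
utc_plus_8 = timezone(timedelta(hours=8))

# The same stored instant, displayed under two session time zones.
micros = to_utc_micros(datetime(2025, 11, 13, 8, 0, 0), utc_plus_8)
print(render(micros, utc))         # 2025-11-13 00:00:00
print(render(micros, utc_plus_8))  # 2025-11-13 08:00:00
```

A timezone-naive type would instead keep the wall-clock fields as written and never shift them, which is why the two Spark types need different Substrait types.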

To maintain semantic consistency between Spark and Substrait, this PR maps Spark’s TimestampType to Substrait’s timestamp_tz.

https://substrait.io/types/type_classes

This approach is consistent with other projects. For example, Apache Iceberg also maps Spark’s TimestampType to TIMESTAMP WITH TIME ZONE and Spark’s TimestampNTZType to TIMESTAMP WITHOUT TIME ZONE.

Note: For future support of Spark’s TimestampNTZType (timezone-naive timestamps), we should map it to Substrait’s timestamp type instead.
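
The intended mapping can be summarized in a small lookup. This is an illustrative Python sketch only, not the actual Gluten implementation: the dictionary, the `substrait_type_for` function, and the `ntz_supported` flag are hypothetical names, and the TimestampNTZType entry reflects the proposal in the note above rather than anything this PR implements.

```python
# Hypothetical lookup table; the type names are the ones used in this PR description.
SPARK_TO_SUBSTRAIT = {
    "TimestampType": "timestamp_tz",   # timezone-aware: an absolute instant
    "TimestampNTZType": "timestamp",   # timezone-naive: proposed, not implemented here
}


def substrait_type_for(spark_type: str, ntz_supported: bool = False) -> str:
    """Return the Substrait type name for a Spark timestamp type.

    `ntz_supported` is an assumed switch that models the fact that
    TimestampNTZType is reserved for future support only.
    """
    if spark_type == "TimestampNTZType" and not ntz_supported:
        raise NotImplementedError("TimestampNTZType is not supported yet")
    try:
        return SPARK_TO_SUBSTRAIT[spark_type]
    except KeyError:
        raise ValueError(f"not a timestamp type: {spark_type}") from None


print(substrait_type_for("TimestampType"))  # timestamp_tz
```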

How was this patch tested?

The existing tests already cover this change.

@github-actions github-actions bot added the CORE, VELOX, and CLICKHOUSE labels on Nov 13, 2025
@github-actions

Run Gluten Clickhouse CI on x86

@liujiayi771 liujiayi771 marked this pull request as draft November 13, 2025 03:03

@liujiayi771 liujiayi771 marked this pull request as ready for review November 13, 2025 14:19
@liujiayi771
Contributor Author

@rui-mo @zhztheplayer Could you please help review this? @taiyang-li @lgbo-ustc @zzcclp could you confirm whether the changes here have any impact on the ClickHouse backend?

@liujiayi771 liujiayi771 requested a review from rui-mo November 14, 2025 01:41
Contributor

@rui-mo rui-mo left a comment

Thanks!

@FelixYBW
Contributor

Spark 3.5 added the TIMESTAMP_NTZ data type. @rui-mo does Velox support it? Have we enabled the UT for it?

@FelixYBW FelixYBW merged commit 8399ac7 into apache:main Nov 15, 2025
60 checks passed
@liujiayi771
Contributor Author

liujiayi771 commented Nov 16, 2025

Spark 3.5 added the TIMESTAMP_NTZ data type. @rui-mo does Velox support it? Have we enabled the UT for it?

This PR only modifies the Substrait mapping and reserves the corresponding Substrait type for future timestamp_ntz support. Velox doesn't support it yet.

@FelixYBW
Contributor

We need a thorough cleanup of timestamp and timezone support in Gluten sometime later.

@taiyang-li
Contributor

taiyang-li commented Nov 19, 2025

@rui-mo @zhztheplayer Could you please help review this? @taiyang-li @lgbo-ustc @zzcclp could you confirm whether the changes here have any impact on the ClickHouse backend?

CH backend changes are ok to me. Sorry for the late reply.


5 participants