Description
In this issue, we will list and track all tasks for ANSI mode support.
There are two Spark configurations directly related to ANSI usage.
- spark.sql.ansi.enabled (default is true since Spark 4.0)
- spark.sql.storeAssignmentPolicy (default is ANSI since Spark 3.0)
Tasks
Basic
1. Type Casting Functions (ANSI Strict)
- cast string to boolean (@malinjawi)
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L701
- cast decimal to string (@Mariamalmesfer)
  // In ANSI mode, Spark always use plain string representation on casting Decimal values
  // as strings. Otherwise, the casting is using BigDecimal.toString which may use scientific
  // notation if an exponent is needed.
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L678
- cast string to timestamp (@infvg)
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L733
- cast string to timestampNTZ
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L775
- cast float/double to timestamp (@infvg)
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L758
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L765
- cast string to date (@malinjawi)
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L811
- cast string to time (@malinjawi)
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L826
  The codegen implementation, assumed equivalent to the interpreted path above:
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1493
- cast string to long/int/short/byte (@malinjawi)
  As one example, here is the related code for the long type:
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L883
- cast NumericType to long/int/short/byte (@minni31)
  As one example, here is the related code for the long type:
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L896
- cast timestamp to int/short/byte
  As one example, here is the related code for the int type:
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L933
- cast time to short/byte (requires TimeType support)
  As one example, here is the related code for the short type:
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L980
- cast several types to decimal
  ANSI controls the overflow behavior in changePrecision:
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1103
- cast string to double/float
  ANSI controls the handling of incorrectly formatted numbers.
  As one example, here is the related code for the double type:
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L1159
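The cast tasks above share one behavioral split: in ANSI mode an invalid or overflowing cast raises an error, while in legacy mode it yields NULL. Below is a minimal Python sketch of that split for string-to-int casting; the helper name and error messages are illustrative only, not Spark's implementation.

```python
def cast_string_to_int(s, ansi):
    """Sketch of ANSI vs. legacy cast-to-int semantics (hypothetical helper).

    ANSI mode: malformed input or overflow raises an error.
    Legacy mode: malformed input or overflow yields NULL (None).
    """
    INT_MIN, INT_MAX = -(2**31), 2**31 - 1
    try:
        v = int(s.strip())
    except ValueError:
        if ansi:
            raise ValueError(f"[CAST_INVALID_INPUT] '{s}' cannot be cast to INT")
        return None
    if not (INT_MIN <= v <= INT_MAX):
        if ansi:
            raise ArithmeticError(f"[CAST_OVERFLOW] {v} is out of range for INT")
        return None
    return v

print(cast_string_to_int("123", ansi=True))   # 123
print(cast_string_to_int("abc", ansi=False))  # None
```

The same pattern (raise vs. return None) applies to the timestamp, date, and decimal cast items listed above.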
2. Arithmetic Functions (ANSI Overflow Check)
- A base type: AnsiIntervalType (@malinjawi)
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/api/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala#L168
- Unary expressions like Abs, UnaryMinus (@malinjawi)
  The ANSI config controls failOnError.
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L152C35-L152C46
- Binary arithmetic expressions using BinaryArithmetic as base, such as Add, Divide, Multiply, etc. (@malinjawi)
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L209
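To make the failOnError split concrete, here is a Python sketch of 32-bit integer addition under the two modes. It is not Spark code: ANSI mode raises on overflow (analogous to java.lang.Math.addExact), while legacy mode wraps around like Java int arithmetic.

```python
def add_int(a, b, fail_on_error):
    """Sketch of ANSI overflow checking for 32-bit integer addition
    (mirrors the failOnError flag; hypothetical helper, not Spark's code)."""
    INT_MIN, INT_MAX = -(2**31), 2**31 - 1
    result = a + b  # Python ints are unbounded, so check the 32-bit range explicitly
    if result < INT_MIN or result > INT_MAX:
        if fail_on_error:
            # ANSI mode: fail, like java.lang.Math.addExact
            raise ArithmeticError("[ARITHMETIC_OVERFLOW] integer overflow")
        # legacy mode: wrap around to the 32-bit range, like Java int arithmetic
        return (result - INT_MIN) % (2**32) + INT_MIN
    return result

print(add_int(2**31 - 1, 1, fail_on_error=False))  # -2147483648 (wraps)
```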
3. Date/Time Functions (ANSI Validation)
- Datetime expressions: ToUnixTimestamp, UnixTimestamp, GetTimestamp, TryToTimestampExpressionBuilder, NextDay, DateAddInterval, ParseToDate, TryToDateExpressionBuilder, ParseToTimestamp, MakeDate, TryMakeTimestampLTZExpressionBuilder, MakeTimestamp
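The validation pattern for these datetime expressions can be sketched in Python as follows. The format string and helper name are assumptions for illustration (Spark uses its own datetime patterns): in ANSI mode unparsable input raises, in legacy mode it yields NULL, and the TRY_* variants return NULL on failure regardless of the config.

```python
from datetime import datetime

def to_timestamp(s, fmt="%Y-%m-%d %H:%M:%S", ansi=True):
    """Sketch of ANSI parsing semantics for to_timestamp-style expressions
    (hypothetical helper): ANSI raises on unparsable input, legacy returns None."""
    try:
        return datetime.strptime(s, fmt)
    except ValueError:
        if ansi:
            raise
        return None

print(to_timestamp("not-a-date", ansi=False))  # None
```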
4. String to Numeric Conversion (ANSI Strict)
- String expression Elt
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L286
- Collection expression Size
  Its legacySizeOfNull behavior is controlled by the ANSI config.
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L118
- Collection expression ElementAt
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L2622
- Round functions: Round, BRound, RoundCeil, RoundFloor
  As one example, see how rounding to ByteType behaves with ANSI enabled:
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala#L1579
- STORE_ASSIGNMENT_POLICY defaults to ANSI
  https://github.com/apache/spark/blob/e221b56be7b6d9e48e107fc4d1cf0c15f02700f8/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L4487
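As one concrete case from this group, ElementAt's behavior can be sketched in Python: SQL array indices are 1-based, negative indices count from the end, and an out-of-bounds index raises in ANSI mode but yields NULL in legacy mode. The helper is illustrative only.

```python
def element_at(arr, index, ansi):
    """Sketch of ElementAt semantics (1-based index, as in Spark SQL):
    ANSI mode raises on an out-of-bounds index, legacy mode returns None."""
    if index == 0:
        raise ValueError("SQL array indices start at 1")
    if abs(index) > len(arr):
        if ansi:
            raise IndexError(
                f"[INVALID_ARRAY_INDEX] index {index} for array of size {len(arr)}")
        return None
    # positive index: 1-based from the front; negative index: from the end
    return arr[index - 1] if index > 0 else arr[index]

print(element_at([10, 20, 30], 2, ansi=True))   # 20
print(element_at([10, 20, 30], 5, ansi=False))  # None
```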
5. Aggregation Functions (ANSI Overflow)
SUM, AVG, VAR_POP, VAR_SAMP, STDDEV_POP, STDDEV_SAMP
- In ANSI mode: overflow checks during accumulation.
TRY_SUM (Spark 3.4+)
- Returns NULL on overflow instead of raising an error.
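The SUM vs. TRY_SUM contrast can be sketched in Python over 64-bit longs; the helper is illustrative, not Spark's accumulator. ANSI SUM raises when the running total overflows, while TRY_SUM-style evaluation returns NULL instead.

```python
def sum_long(values, ansi):
    """Sketch of SUM accumulation over 64-bit longs: ANSI mode raises on
    overflow during accumulation; TRY_SUM-style mode returns None instead."""
    LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1
    acc = 0
    for v in values:
        acc += v  # Python ints are unbounded, so check the 64-bit range explicitly
        if acc < LONG_MIN or acc > LONG_MAX:
            if ansi:
                raise ArithmeticError("[ARITHMETIC_OVERFLOW] long overflow in SUM")
            return None  # TRY_SUM semantics
    return acc

print(sum_long([1, 2, 3], ansi=True))        # 6
print(sum_long([2**62, 2**62], ansi=False))  # None (overflow)
```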
6. Window Functions (ANSI Overflow)
Same overflow checks apply in window operations:
- SUM(...) OVER (...)
- AVG(...) OVER (...)
7. ANSI SQL Compliant String Functions
SUBSTRING / SUBSTR
- ANSI SQL standard argument order: SUBSTRING(str FROM start [FOR len])
- Also supports the classic form: SUBSTRING(str, start, len)
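A Python sketch of SUBSTRING's positional semantics, under the assumption that positions are 1-based and a negative start counts from the end of the string (as in Spark's substring); the helper itself is illustrative.

```python
def substring(s, start, length=None):
    """Sketch of SUBSTRING(str FROM start [FOR len]) with SQL's 1-based
    positions; a negative start counts from the end of the string."""
    if start > 0:
        i = start - 1          # 1-based SQL position -> 0-based index
    elif start < 0:
        i = max(len(s) + start, 0)
    else:
        i = 0
    end = len(s) if length is None else i + length
    return s[i:end]

print(substring("Spark SQL", 5, 1))  # 'k'
print(substring("Spark SQL", -3))    # 'SQL'
```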
TRIM
- ANSI syntax: TRIM(LEADING '0' FROM col)
- Also TRIM(BOTH ...), TRIM(TRAILING ...)
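The three TRIM modes map directly onto lstrip/rstrip/strip, as this illustrative Python sketch shows:

```python
def trim(s, chars=" ", mode="BOTH"):
    """Sketch of ANSI TRIM([LEADING | TRAILING | BOTH] chars FROM s)."""
    if mode == "LEADING":
        return s.lstrip(chars)
    if mode == "TRAILING":
        return s.rstrip(chars)
    return s.strip(chars)  # BOTH (the default)

print(trim("000123000", "0", "LEADING"))  # '123000'
```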
OVERLAY
- ANSI SQL string replacement: OVERLAY(string PLACING replacement FROM start [FOR length])
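OVERLAY's semantics can be sketched in Python as well, assuming a 1-based start position and that an omitted length defaults to the replacement's length (matching the documented Spark behavior); the helper is illustrative.

```python
def overlay(s, replacement, start, length=None):
    """Sketch of OVERLAY(string PLACING replacement FROM start [FOR length]):
    1-based start; when length is omitted it defaults to len(replacement)."""
    if length is None:
        length = len(replacement)
    i = start - 1  # convert 1-based SQL position to a 0-based index
    return s[:i] + replacement + s[i + length:]

print(overlay("Spark SQL", "_", 6))  # 'Spark_SQL'
```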
See Spark ANSI compliance: https://github.com/apache/spark/blob/v4.0.0/docs/sql-ref-ansi-compliance.md
Related discussion: #4740.
facebookincubator/velox#3869