@@ -105,6 +105,7 @@ object QueryPlanSerde extends Logging with CometExprShim {
     classOf[Log] -> CometLog,
     classOf[Log2] -> CometLog2,
     classOf[Log10] -> CometLog10,
+    classOf[Logarithm] -> CometLogarithm,
     classOf[Multiply] -> CometMultiply,
     classOf[Pow] -> CometScalarFunction("pow"),
     classOf[Rand] -> CometRand,
16 changes: 15 additions & 1 deletion spark/src/main/scala/org/apache/comet/serde/math.scala
@@ -19,7 +19,7 @@

package org.apache.comet.serde

-import org.apache.spark.sql.catalyst.expressions.{Abs, Atan2, Attribute, Ceil, CheckOverflow, Expression, Floor, Hex, If, LessThanOrEqual, Literal, Log, Log10, Log2, Tan, Unhex}
+import org.apache.spark.sql.catalyst.expressions.{Abs, Atan2, Attribute, Ceil, CheckOverflow, Expression, Floor, Hex, If, LessThanOrEqual, Literal, Log, Log10, Log2, Logarithm, Tan, Unhex}
import org.apache.spark.sql.types.{DecimalType, NumericType}

import org.apache.comet.CometSparkSessionExtensions.withInfo
@@ -138,6 +138,20 @@ object CometLog2 extends CometExpressionSerde[Log2] with MathExprBase {
   }
 }

+object CometLogarithm extends CometExpressionSerde[Logarithm] with MathExprBase {
+  override def convert(
+      expr: Logarithm,
+      inputs: Seq[Attribute],
+      binding: Boolean): Option[ExprOuterClass.Expr] = {
+    // Spark's Logarithm(left=base, right=value) returns null when result is NaN,
Review comment (Augment AI reviewer):
The comment says Spark returns null when the result is NaN, but for log functions Spark/Hive semantics are generally “non-positive inputs => null” (e.g., log(..., 0) would otherwise yield -Infinity, not NaN). This wording could mislead future changes to the null-guard logic.

Severity: low

Owner author reply (@martin-augment, Mar 29, 2026):
value:useful; category:documentation; feedback: The Augment AI reviewer is correct! The comment says that NaN results are converted to NULL, but value=0 leads to a -Infinity result, which is also treated as NULL by Apache Spark. Prevents committing a confusing comment.
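The distinction the reply makes can be checked without Spark, since `log(base, value)` comes down to `ln(value) / ln(base)` on JVM doubles; a minimal sketch (plain Scala, no Spark dependency — `LogGuardCheck` is a hypothetical demo object, not part of the PR):

```scala
object LogGuardCheck {
  def main(args: Array[String]): Unit = {
    // A negative input to ln yields NaN ...
    val nanResult = math.log(-1.0)
    // ... while an input of exactly 0 yields -Infinity, not NaN.
    val negInfResult = math.log(0.0)
    println(s"ln(-1.0) = $nanResult")    // NaN
    println(s"ln(0.0)  = $negInfResult") // -Infinity
    // A code comment worded only around NaN would not cover the 0 case,
    // even though Spark returns NULL for both non-positive inputs.
    println(nanResult.isNaN)            // true
    println(negInfResult.isNaN)         // false
    println(negInfResult.isNegInfinity) // true
  }
}
```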

+    // which happens when base <= 0 or value <= 0. Apply nullIfNegative to both.
+    val leftExpr = exprToProtoInternal(nullIfNegative(expr.left), inputs, binding)
+    val rightExpr = exprToProtoInternal(nullIfNegative(expr.right), inputs, binding)
Comment on lines +146 to +149
Review comment (Gemini AI reviewer):
Severity: high

The current implementation of CometLogarithm doesn't correctly handle the case where the logarithm base is 1. Spark's Logarithm expression returns null when the base is 1. This implementation is missing that check, which can lead to incorrect results (e.g., Infinity or NaN) from the native kernel instead of null.

To align with Spark's behavior, you should add a check to return null if expr.left is 1. I've also updated the comment to reflect this.

Suggested change
-    // Spark's Logarithm(left=base, right=value) returns null when result is NaN,
-    // which happens when base <= 0 or value <= 0. Apply nullIfNegative to both.
-    val leftExpr = exprToProtoInternal(nullIfNegative(expr.left), inputs, binding)
-    val rightExpr = exprToProtoInternal(nullIfNegative(expr.right), inputs, binding)
+    // Spark's Logarithm(left=base, right=value) returns null when result is NaN,
+    // which happens when base <= 0, base = 1, or value <= 0.
+    val baseExpr = If(expr.left === Literal(1.0), Literal(null, expr.left.dataType), nullIfNegative(expr.left))
+    val leftExpr = exprToProtoInternal(baseExpr, inputs, binding)
+    val rightExpr = exprToProtoInternal(nullIfNegative(expr.right), inputs, binding)

Owner author reply (@martin-augment):
value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct!

spark-sql (default)> SELECT log(1, 5);
Infinity
Time taken: 0.087 seconds, Fetched 1 row(s)

It returns Infinity, not Null when the base is 1.
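The rebuttal matches plain change-of-base arithmetic on doubles: with base 1 the denominator `ln(1)` is `0.0`, and a positive numerator divided by `0.0` is `+Infinity`, not null. A quick check (plain Scala, no Spark dependency — `LogBaseOneCheck` is a hypothetical demo object):

```scala
object LogBaseOneCheck {
  def main(args: Array[String]): Unit = {
    // log(1, 5) = ln(5) / ln(1) = 1.609... / 0.0 = +Infinity on IEEE doubles
    val result = math.log(5.0) / math.log(1.0)
    println(result)                            // Infinity
    println(result == Double.PositiveInfinity) // true
  }
}
```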

+    val optExpr = scalarFunctionExprToProto("log", leftExpr, rightExpr)
+    optExprWithInfo(optExpr, expr, expr.left, expr.right)
+  }
+}
Review comment (Bugbot AI reviewer):
CometLogarithm mishandles NaN inputs producing NaN instead of null

Severity: high

CometLogarithm uses nullIfNegative on both inputs, which only converts values <= 0 to null. But Spark's Logarithm applies a nullCheck on the result, returning null whenever the output isNaN. This means NaN inputs (where NaN <= 0 is false) pass through nullIfNegative unchanged, and DataFusion then produces NaN where Spark would produce null. The existing log.sql test includes NaN in its test data (cast('NaN' as double)), so queries like log(10.0, NaN) will return NaN from Comet but null from Spark, causing a test failure and incorrect production results. The same mismatch occurs for log(1, 1) where both inputs are valid but the result is 0/0 = NaN.


Owner author reply (@martin-augment):
value:annoying; category:bug; feedback: The Bugbot AI reviewer is not correct!

spark-sql (default)> SELECT log(1, 1);
NaN
Time taken: 0.072 seconds, Fetched 1 row(s)

Apache Spark 4.0.2 returns NaN here, not NULL.
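Again this follows from double arithmetic: `log(1, 1) = ln(1) / ln(1) = 0.0 / 0.0 = NaN`, and NaN compares false against everything, so a `<= 0` guard indeed passes it through. A quick check (plain Scala — `LogOneOneCheck` is a hypothetical demo object):

```scala
object LogOneOneCheck {
  def main(args: Array[String]): Unit = {
    // log(1, 1) = ln(1) / ln(1) = 0.0 / 0.0 = NaN
    val result = math.log(1.0) / math.log(1.0)
    println(result.isNaN) // true
    // NaN fails every ordered comparison, so a `value <= 0` null-guard
    // lets NaN inputs through unchanged.
    println(Double.NaN <= 0.0) // false
  }
}
```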


object CometHex extends CometExpressionSerde[Hex] with MathExprBase {
  override def convert(
      expr: Hex,
@@ -16,6 +16,7 @@
-- under the License.

-- ConfigMatrix: parquet.enable.dictionary=false,true
+-- Config: spark.comet.expression.Tan.allowIncompatible=true

statement
CREATE TABLE test_tan(d double) USING parquet
@@ -102,7 +102,7 @@ class CometSqlFileTestSuite extends CometTestBase with AdaptiveSparkPlanHelper {
       case SparkAnswerOnly =>
         checkSparkAnswer(sql)
       case WithTolerance(tol) =>
-        checkSparkAnswerWithTolerance(sql, tol)
+        checkSparkAnswerAndOperatorWithTolerance(sql, tol)
       case ExpectFallback(reason) =>
         checkSparkAnswerAndFallbackReason(sql, reason)
       case Ignore(reason) =>
11 changes: 11 additions & 0 deletions spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala
@@ -190,6 +190,17 @@ abstract class CometTestBase
     internalCheckSparkAnswer(df, assertCometNative = false, withTol = Some(absTol))
}

+  /**
+   * Check that the query returns the correct results when Comet is enabled and that Comet
+   * replaced all possible operators. Use the provided `absTol` when comparing floating-point
+   * results.
+   */
+  protected def checkSparkAnswerAndOperatorWithTolerance(
+      query: String,
+      absTol: Double = 1e-6): (SparkPlan, SparkPlan) = {
+    internalCheckSparkAnswer(sql(query), assertCometNative = true, withTol = Some(absTol))
+  }
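`internalCheckSparkAnswer` is Comet's internal helper; the absolute-tolerance comparison it applies to floating-point results can be sketched roughly as follows (hypothetical standalone version, not the actual implementation — the NaN handling is an assumption mirroring typical test-framework behavior):

```scala
object ToleranceCheck {
  // Hypothetical sketch of an absolute-tolerance comparison for doubles;
  // NaN is treated as equal to NaN so that NaN-producing queries can still match.
  def approxEqual(a: Double, b: Double, absTol: Double = 1e-6): Boolean =
    (a.isNaN && b.isNaN) || math.abs(a - b) <= absTol

  def main(args: Array[String]): Unit = {
    println(approxEqual(1.0, 1.0 + 1e-9))        // true: within default tolerance
    println(approxEqual(1.0, 1.1))               // false: differs by 0.1
    println(approxEqual(Double.NaN, Double.NaN)) // true: both NaN
  }
}
```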

   /**
    * Check that the query returns the correct results when Comet is enabled and that Comet
    * replaced all possible operators except for those specified in the excluded list.