46 changes: 46 additions & 0 deletions spark/src/main/scala/org/apache/comet/serde/strings.scala
@@ -408,4 +408,50 @@ trait CommonStringExprs {
      None
    }
  }

  def secondOfTimeToProto(
      expr: Expression,
      inputs: Seq[Attribute],
      binding: Boolean): Option[Expr] = {
    val childOpt = expr.children.headOption.orElse {
      withInfo(expr, "SecondOfTime has no child expression")
      None
    }

    childOpt.flatMap { child =>
      val timeZoneId = {
        val exprClass = expr.getClass
        try {
          val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
          timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]
Comment (Augment):

The reflective timeZoneId lookup only handles missing method/field; invoke/access/cast can still throw (e.g., IllegalAccessException, InvocationTargetException, ClassCastException) and crash serde/planning. Consider catching NonFatal and falling back to None (with withInfo) instead of letting it propagate.

Severity: medium

Reply (Owner Author):

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The SecondOfTime class has neither a timeZoneId method nor a field with that name, so these reflection-based lookups always fail. The timezone lookup should be removed and UTC should be used if a timezone is really needed.

        } catch {
          case _: NoSuchMethodException =>
            try {
              val timeZoneIdField = exprClass.getField("timeZoneId")
              timeZoneIdField.get(expr).asInstanceOf[Option[String]]
            } catch {
              case _: NoSuchFieldException | _: SecurityException => None
            }
        }
      }
Comment on lines +422 to +436:

Severity: high

The reflection logic to get timeZoneId can be improved for correctness and readability.

  1. Correctness: The current code doesn't handle a potential SecurityException from exprClass.getMethod("timeZoneId"). This could lead to an unhandled exception during query planning.
  2. Readability: The nested try-catch block is hard to follow.

I suggest refactoring this part to un-nest the try-catch blocks and handle SecurityException for both reflection calls. This makes the code more robust and easier to understand.

Suggested change:

      // Before:
      val timeZoneId = {
        val exprClass = expr.getClass
        try {
          val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
          timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]
        } catch {
          case _: NoSuchMethodException =>
            try {
              val timeZoneIdField = exprClass.getField("timeZoneId")
              timeZoneIdField.get(expr).asInstanceOf[Option[String]]
            } catch {
              case _: NoSuchFieldException | _: SecurityException => None
            }
        }
      }

      // After:
      val timeZoneId = {
        val exprClass = expr.getClass
        val fromMethod = try {
          exprClass.getMethod("timeZoneId").invoke(expr).asInstanceOf[Option[String]]
        } catch {
          case _: NoSuchMethodException | _: SecurityException => None
        }
        fromMethod.orElse {
          try {
            exprClass.getField("timeZoneId").get(expr).asInstanceOf[Option[String]]
          } catch {
            case _: NoSuchFieldException | _: SecurityException => None
          }
        }
      }
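The un-nesting the suggested change aims at can also be done with scala.util.Try, which catches NonFatal (covering the invoke/access/cast failures flagged in the earlier comment) and flattens the control flow. A Spark-free sketch under stated assumptions: `Probe` is a hypothetical stand-in for an expression class, not a Spark type.

```scala
import scala.util.Try

object ReflectiveLookup {
  // Hypothetical stand-in for an expression class exposing timeZoneId.
  class Probe { def timeZoneId: Option[String] = Some("America/Denver") }

  // Try.apply catches NonFatal, so NoSuchMethodException, SecurityException,
  // InvocationTargetException, and ClassCastException all collapse to Failure
  // rather than propagating into query planning.
  def lookupTimeZoneId(obj: AnyRef): Option[String] = {
    val cls = obj.getClass
    Try(cls.getMethod("timeZoneId").invoke(obj).asInstanceOf[Option[String]])
      .orElse(Try(cls.getField("timeZoneId").get(obj).asInstanceOf[Option[String]]))
      .toOption
      .flatten
  }
}
```

Calling it on an object without any timeZoneId member simply yields None instead of throwing.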


      exprToProtoInternal(child, inputs, binding)
        .map { childExpr =>
          val builder = ExprOuterClass.Second.newBuilder()
          builder.setChild(childExpr)

          val timeZone = timeZoneId.getOrElse("UTC")
          builder.setTimezone(timeZone)

          ExprOuterClass.Expr
            .newBuilder()
            .setSecond(builder)
            .build()
        }
        .orElse {
          withInfo(expr, child)
          None
        }
    }
  }
Comment (Bugbot):

Datetime function misplaced in string expressions trait

Low Severity

secondOfTimeToProto is a datetime expression handler added to the CommonStringExprs trait in strings.scala. The codebase already has a dedicated datetime.scala file containing nearly identical handlers for Second, Hour, and Minute (e.g. CometSecond). Placing datetime logic in a string expressions trait breaks the established code organization, making the method harder to discover and maintain alongside its closely related peers.

Reply (Owner Author):

value:good-to-have; category:bug; feedback: The Bugbot AI reviewer is correct! The logic is time-related, so it should be added to the datetime.scala module instead of strings.scala.

}
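The owner replies above point to a simpler shape for this handler: no reflection, UTC as the fixed timezone. A runnable stand-in of that shape; `MiniExpr` and `SecondProto` are hypothetical substitutes for Spark's Expression and Comet's protobuf builder, which are not available outside the project, so this is a sketch of the suggested fix, not code from the PR.

```scala
object SecondOfTimeSketch {
  // Hypothetical stand-ins for Spark's Expression and the Comet Second proto.
  final case class MiniExpr(children: Seq[String])
  final case class SecondProto(child: String, timezone: String)

  def secondOfTimeToProto(expr: MiniExpr): Option[SecondProto] =
    expr.children.headOption.map { child =>
      // SecondOfTime carries no timeZoneId member, so default to UTC outright
      // instead of probing for one reflectively.
      SecondProto(child, timezone = "UTC")
    } // None when there is no child; the real serde would also call withInfo here
}
```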
@@ -21,9 +21,11 @@ package org.apache.comet.shims

import org.apache.spark.sql.catalyst.expressions._

import org.apache.comet.CometSparkSessionExtensions.withInfo
Comment (Augment):

These new imports (withInfo, exprToProtoInternal) appear unused in this shim; under the strict-warnings profile (-Ywarn-unused:imports + -Xfatal-warnings) this can fail the build.

Severity: medium

Other Locations
  • spark/src/main/spark-3.4/org/apache/comet/shims/CometExprShim.scala:28

Reply (Owner Author):

value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! The unused imports will lead to warnings (or errors if the project's CI is configured to report warnings as errors). The unused imports should be removed.

import org.apache.comet.expressions.CometEvalMode
import org.apache.comet.serde.CommonStringExprs
import org.apache.comet.serde.ExprOuterClass.{BinaryOutputStyle, Expr}
import org.apache.comet.serde.QueryPlanSerde.exprToProtoInternal

/**
* `CometExprShim` acts as a shim for parsing expressions from different Spark versions.
@@ -43,6 +45,9 @@ trait CometExprShim extends CommonStringExprs {
        // Right child is the encoding expression.
        stringDecode(expr, s.charset, s.bin, inputs, binding)

      case _ if expr.getClass.getSimpleName == "SecondOfTime" =>
        secondOfTimeToProto(expr, inputs, binding)

      case _ => None
    }
  }
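Matching on getClass.getSimpleName, rather than a typed pattern, lets the shim compile against Spark versions where SecondOfTime does not exist as a class to reference. A Spark-free sketch of that dispatch; the types below are stand-ins for illustration, not Spark's:

```scala
object ShimSketch {
  sealed trait Expr
  // Stand-in for a version-specific expression; only its runtime class name matters.
  final case class SecondOfTime(child: Int) extends Expr
  final case class Unsupported() extends Expr

  // Dispatching on the class name avoids any compile-time reference to the
  // class, so the same shim source builds on versions that lack it.
  def convert(expr: Expr): Option[String] = expr match {
    case _ if expr.getClass.getSimpleName == "SecondOfTime" => Some("second")
    case _ => None
  }
}
```

The trade-off is that string matching is unchecked by the compiler, which is why it is confined to version-specific shims.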
@@ -91,6 +91,9 @@ trait CometExprShim extends CommonStringExprs {
      // val optExpr = scalarFunctionExprToProto("width_bucket", childExprs: _*)
      // optExprWithInfo(optExpr, wb, wb.children: _*)

      case _ if expr.getClass.getSimpleName == "SecondOfTime" =>
        secondOfTimeToProto(expr, inputs, binding)

      case _ => None
    }
  }
@@ -113,6 +113,9 @@ trait CometExprShim extends CommonStringExprs {
      // val optExpr = scalarFunctionExprToProto("width_bucket", childExprs: _*)
      // optExprWithInfo(optExpr, wb, wb.children: _*)

      case _ if expr.getClass.getSimpleName == "SecondOfTime" =>
        secondOfTimeToProto(expr, inputs, binding)

      case _ => None
    }
  }
17 changes: 17 additions & 0 deletions spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
@@ -572,6 +572,23 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper {
    }
  }

  test("seconds_of_time expression") {
    // This test verifies that the second() function works correctly with timestamp columns.
    // If Spark generates a SecondOfTime expression (a RuntimeReplaceable expression),
    // it will be handled by the version-specific shim and converted to the Second proto.
    Seq(true, false).foreach { dictionaryEnabled =>
      withTempDir { dir =>
        val path = new Path(dir.toURI.toString, "part-r-0.parquet")
        makeRawTimeParquetFile(path, dictionaryEnabled = dictionaryEnabled, 10000)
        readParquetFile(path.toString) { df =>
          val query = df.select(expr("second(_1)"))

          checkSparkAnswerAndOperator(query)
        }
      }
    }
  }
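As a plain-JVM cross-check of what the test expects second() to compute, java.time yields the second-of-minute of a timestamp interpreted in UTC, mirroring the UTC default used when the proto carries no timezone. A Spark-free sketch, not part of the suite:

```scala
import java.time.{Instant, ZoneOffset}

object SecondSanity {
  // Second-of-minute of an instant viewed in UTC; second-of-minute is in fact
  // timezone-independent for whole-minute offsets, but UTC makes the intent explicit.
  def secondOfUtc(ts: Instant): Int = ts.atZone(ZoneOffset.UTC).getSecond
}
```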

test("hour on int96 timestamp column") {
import testImplicits._
