
3618: feat: Support Spark expression second_of_time #41

Open
martin-augment wants to merge 1 commit into main from pr-3618-2026-03-02-12-25-11

Conversation

@martin-augment
Owner

3618: To review by AI

@coderabbitai

coderabbitai bot commented Mar 2, 2026

Walkthrough

This change introduces support for handling SecondOfTime expressions in Comet's expression serialization pipeline. A new method secondOfTimeToProto is added to convert SecondOfTime expressions to their protobuf representation, including timezone resolution with UTC as fallback. The conversion method is integrated into the expression-to-proto routing logic across multiple Spark version shims (3.4, 3.5, 4.0), enabling proper handling of SecondOfTime expressions during query plan serialization. A corresponding test case is added to validate the functionality of extracting seconds from timestamp columns via Parquet files with various configuration options.


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces support for the second_of_time Spark expression, enabling its proper serialization and handling across various Spark versions within the Comet project. The changes ensure that queries utilizing the second() function on timestamp columns are correctly processed, enhancing the compatibility and functionality of Comet with Spark's time-related operations.

Highlights

  • Spark Expression Support: Implemented support for the second_of_time Spark expression, allowing it to be correctly processed and serialized within the Comet framework.
  • Cross-Spark Version Compatibility: Integrated the secondOfTimeToProto serialization logic into the CometExprShim for Spark versions 3.4, 3.5, and 4.0, ensuring consistent behavior across different environments.
  • New Test Coverage: Added a dedicated test case in CometExpressionSuite to validate the correct functioning of the second() SQL function when applied to timestamp columns, including dictionary-encoded parquet files.
Changelog
  • spark/src/main/scala/org/apache/comet/serde/strings.scala
    • Added a new function secondOfTimeToProto to serialize Spark's SecondOfTime expression into a Protobuf Second message, including handling of child expressions and time zone information.
  • spark/src/main/spark-3.4/org/apache/comet/shims/CometExprShim.scala
    • Imported withInfo and exprToProtoInternal for use in expression serialization.
    • Extended the buildExpr method to include a case for SecondOfTime expressions, delegating their serialization to secondOfTimeToProto.
  • spark/src/main/spark-3.5/org/apache/comet/shims/CometExprShim.scala
    • Extended the buildExpr method to include a case for SecondOfTime expressions, delegating their serialization to secondOfTimeToProto.
  • spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala
    • Extended the buildExpr method to include a case for SecondOfTime expressions, delegating their serialization to secondOfTimeToProto.
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
    • Added a new test named "seconds_of_time expression" to verify the second() SQL function's behavior with timestamp columns, covering both dictionary-enabled and disabled scenarios.
Activity
  • The pull request was created and is awaiting review.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the Spark second_of_time expression, which is a RuntimeReplaceable expression for the second() function on timestamps. The implementation correctly uses reflection to access properties of this private Spark class and includes shims for different Spark versions. The changes are well-structured and include a new test case. My main feedback is to improve the reflection logic for better correctness and readability by handling a potential SecurityException and simplifying a nested try-catch structure.

Comment on lines +422 to +436
val timeZoneId = {
val exprClass = expr.getClass
try {
val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]
} catch {
case _: NoSuchMethodException =>
try {
val timeZoneIdField = exprClass.getField("timeZoneId")
timeZoneIdField.get(expr).asInstanceOf[Option[String]]
} catch {
case _: NoSuchFieldException | _: SecurityException => None
}
}
}


Severity: high

The reflection logic to get timeZoneId can be improved for correctness and readability.

  1. Correctness: The current code doesn't handle a potential SecurityException from exprClass.getMethod("timeZoneId"). This could lead to an unhandled exception during query planning.
  2. Readability: The nested try-catch block is hard to follow.

I suggest refactoring this part to un-nest the try-catch blocks and handle SecurityException for both reflection calls. This makes the code more robust and easier to understand.

Suggested change
val timeZoneId = {
val exprClass = expr.getClass
val fromMethod = try {
exprClass.getMethod("timeZoneId").invoke(expr).asInstanceOf[Option[String]]
} catch {
case _: NoSuchMethodException | _: SecurityException => None
}
fromMethod.orElse {
try {
exprClass.getField("timeZoneId").get(expr).asInstanceOf[Option[String]]
} catch {
case _: NoSuchFieldException | _: SecurityException => None
}
}
}

@claude

claude bot commented Mar 2, 2026

PR Review: feat: Support Spark expression second_of_time

Thanks for the contribution! The approach of handling SecondOfTime (a RuntimeReplaceable Spark expression) via version-specific shims is reasonable, but there are several issues worth addressing before merging.


Critical Issues

1. Unnecessary reflection for timeZoneId

The most significant concern is the reflection-based timezone lookup in secondOfTimeToProto (strings.scala lines 422–436). Spark's TimeZoneAwareExpression trait (which Second and SecondOfTime implement) exposes timeZoneId as a public method on the trait. You can cast directly to it without reflection:

val timeZoneId = expr match {
  case tz: TimeZoneAwareExpression => tz.timeZoneId
  case _ => None
}

This is simpler, type-safe, faster (no reflection overhead), and far less fragile. The reflection approach silently swallows errors that could indicate a real problem—for example, if the returned value isn't actually an Option[String], the asInstanceOf cast will throw a ClassCastException at runtime with no meaningful error message.

2. getField does not find constructor parameters

The fallback reflection path calls exprClass.getField("timeZoneId"), which only finds public, non-synthetic fields. In Scala case classes (which is what SecondOfTime is), constructor parameters are accessed through getter methods, not raw fields. This fallback will never succeed in practice—it's dead code that gives a false sense of robustness.
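The claim above can be checked with plain JVM reflection. `Foo` below is a hypothetical stand-in for `SecondOfTime` (the real class may differ, but Scala compiles all case-class constructor parameters the same way: private field, public accessor method):

```scala
// Sketch: why a getField fallback can never succeed for a Scala case class.
object CaseClassReflectionDemo {
  case class Foo(timeZoneId: Option[String])

  // Reads timeZoneId via the public accessor method: this path works.
  def viaMethod(foo: Foo): Option[String] =
    foo.getClass
      .getMethod("timeZoneId")
      .invoke(foo)
      .asInstanceOf[Option[String]]

  // Reads timeZoneId via Class.getField: the backing field is private,
  // and getField only sees public fields, so this always throws.
  def viaField(foo: Foo): Option[String] =
    try {
      foo.getClass.getField("timeZoneId").get(foo).asInstanceOf[Option[String]]
    } catch {
      case _: NoSuchFieldException => None // the fallback path is dead code
    }

  def main(args: Array[String]): Unit = {
    val foo = Foo(Some("Asia/Kolkata"))
    println(viaMethod(foo)) // Some(Asia/Kolkata)
    println(viaField(foo))  // None: getField never finds the private field
  }
}
```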


Code Quality Issues

3. Wrong file — secondOfTimeToProto belongs in datetime.scala

The strings.scala file contains string expression helpers. The new secondOfTimeToProto method deals with datetime/timestamp logic and fits the established pattern in datetime.scala (where CometSecond, CometHour, and CometMinute all live). Placing it in strings.scala breaks the project's clear separation of concerns.

4. String-based class matching is fragile

case _ if expr.getClass.getSimpleName == "SecondOfTime" =>

This will match any class named SecondOfTime from any package, including user-defined classes. While this pattern is sometimes unavoidable for cross-version shims, a comment explaining why the class can't be referenced by type (e.g., because it doesn't exist in Spark 3.4's public API) would help future maintainers understand the intent.
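A minimal sketch of the fragility: `getSimpleName` carries no package information, so any class that happens to be named `SecondOfTime` matches the guard. The class below is user-defined and unrelated to Spark:

```scala
// Sketch: matching on getSimpleName is package-agnostic.
object SimpleNameMatchDemo {
  class SecondOfTime // hypothetical user-defined class, not Spark's

  def matchesBySimpleName(x: Any): Boolean =
    x.getClass.getSimpleName == "SecondOfTime"

  def main(args: Array[String]): Unit = {
    val impostor = new SecondOfTime
    println(matchesBySimpleName(impostor)) // true: the simple name collides
    // A fully qualified name check would reject the impostor:
    println(impostor.getClass.getName) // ends in SimpleNameMatchDemo$SecondOfTime
  }
}
```

Matching on `getClass.getName` against the fully qualified Catalyst class name narrows the guard considerably while still avoiding a compile-time dependency on the class.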


Test Coverage Issues

5. Test name doesn't match what's being tested

The test is named "seconds_of_time expression" but:

  • The Spark function being called is second() (singular), not seconds()
  • The Spark expression being handled is SecondOfTime

Consider renaming to something like "second() function on timestamp column" or "SecondOfTime expression".

6. No timezone coverage in the test

The implementation goes out of its way to extract and pass the timezone, but there is no test with a non-UTC session timezone. Since Second is timezone-sensitive (e.g., midnight UTC is 8pm in America/Los_Angeles), missing non-UTC test coverage means a potential timezone bug could go undetected. The adjacent "cast timestamp and timestamp_ntz" test (line 621) shows the pattern for this:

withSQLConf(SESSION_LOCAL_TIMEZONE.key -> "Asia/Kolkata") {
  // verify second() result matches Spark's output
}

7. No negative test / fallback verification

There is no test confirming that when Comet can't serialize the expression (e.g., child serialization fails), it falls back gracefully to Spark rather than throwing. Given that the fallback path is new code, a test with an unsupported child type or checkSparkAnswer after disabling Comet would add confidence.


Minor Nit

The comment in the test says "If Spark generates SecondOfTime expression..." — it should say "when Spark generates..." since on the Spark versions this PR targets, the second() function on a non-literal timestamp column will generate SecondOfTime.


Summary

Severity | Issue
Critical | Replace reflection with TimeZoneAwareExpression cast
Critical | Remove dead/incorrect getField fallback
Moderate | Move secondOfTimeToProto to datetime.scala
Moderate | Add non-UTC timezone test
Minor | Fix test name to match the expression/function name
Minor | Add comment explaining why string-based class matching is used


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1)

575-590: Strengthen this test to assert the SecondOfTime path is actually exercised.

Current coverage validates second(...) behavior, but it may still pass without hitting the new SecondOfTime shim branch.

✅ Suggested test hardening
   test("seconds_of_time expression") {
@@
         readParquetFile(path.toString) { df =>
           val query = df.select(expr("second(_1)"))
+          if (isSpark40Plus) {
+            assert(
+              query.queryExecution.analyzed.expressions.exists(
+                _.getClass.getSimpleName == "SecondOfTime"),
+              "Expected analyzed plan to contain SecondOfTime")
+          }

           checkSparkAnswerAndOperator(query)
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala` around
lines 575 - 590, The test currently verifies results but doesn't ensure Spark
actually produced a SecondOfTime runtime-replaceable expression; update the test
in CometExpressionSuite.scala (inside the "seconds_of_time expression" test that
uses makeRawTimeParquetFile, readParquetFile and checkSparkAnswerAndOperator) to
inspect the query plan (e.g., query.queryExecution.analyzed or
query.queryExecution.optimizedPlan) and assert that it contains the SecondOfTime
expression (by checking for the SecondOfTime class name or by
pattern-matching/collecting an instance of
org.apache.spark.sql.catalyst.expressions.SecondOfTime) before calling
checkSparkAnswerAndOperator so the SecondOfTime shim branch is definitely
exercised.
spark/src/main/scala/org/apache/comet/serde/strings.scala (1)

422-435: Harden reflection for timeZoneId to avoid planner-time runtime exceptions.

The current reflection path only handles missing members (NoSuchMethodException, NoSuchFieldException, SecurityException); invocation failures like InvocationTargetException, IllegalAccessException, and ClassCastException from the unchecked asInstanceOf cast can still propagate and fail query planning.

🛡️ Safer reflection handling
+import scala.util.control.NonFatal
@@
-      val timeZoneId = {
+      val timeZoneId: Option[String] = {
         val exprClass = expr.getClass
-        try {
-          val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
-          timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]
-        } catch {
-          case _: NoSuchMethodException =>
-            try {
-              val timeZoneIdField = exprClass.getField("timeZoneId")
-              timeZoneIdField.get(expr).asInstanceOf[Option[String]]
-            } catch {
-              case _: NoSuchFieldException | _: SecurityException => None
-            }
-        }
+        val rawValue: Any =
+          try {
+            exprClass.getMethod("timeZoneId").invoke(expr)
+          } catch {
+            case _: NoSuchMethodException | _: SecurityException =>
+              try {
+                exprClass.getField("timeZoneId").get(expr)
+              } catch {
+                case _: NoSuchFieldException | _: SecurityException => None
+              }
+            case NonFatal(_) => None
+          }
+
+        rawValue match {
+          case Some(tz: String) => Some(tz)
+          case tz: String => Some(tz)
+          case None | null => None
+          case _ => None
+        }
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark/src/main/scala/org/apache/comet/serde/strings.scala` around lines 422 -
435, The reflection block extracting timeZoneId from expr (using exprClass,
timeZoneIdMethod, timeZoneIdField) must be hardened to catch invocation and
access failures and bad casts: wrap the Method.invoke and Field.get calls and
the unsafe asInstanceOf in try/catch that also handles
InvocationTargetException, IllegalAccessException, and ClassCastException
(returning None on those cases), so that any failure when calling timeZoneId or
casting its result does not propagate during planning; ensure the final value
assigned to timeZoneId is an Option[String] and falls back to None on these
errors.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddb9375 and e732af8.

📒 Files selected for processing (5)
  • spark/src/main/scala/org/apache/comet/serde/strings.scala
  • spark/src/main/spark-3.4/org/apache/comet/shims/CometExprShim.scala
  • spark/src/main/spark-3.5/org/apache/comet/shims/CometExprShim.scala
  • spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala
  • spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala

@augmentcode

augmentcode bot commented Mar 2, 2026

🤖 Augment PR Summary

Summary: Adds support for Spark’s runtime expression SecondOfTime by serializing it into Comet’s Second protobuf expression.

Changes:

  • Introduced secondOfTimeToProto in CommonStringExprs to build a Second proto from the child expression and timezone
  • Hooked the conversion into Spark 3.4 / 3.5 / 4.0 expression shims via class-name matching
  • Added a JVM suite test exercising second(...) over parquet timestamp data with/without dictionary encoding

Technical Notes: Uses reflection to read timeZoneId for cross-version compatibility and defaults to UTC when unavailable.
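The UTC fallback described above reduces to a small helper; this is an assumed shape for illustration, not the actual Comet code:

```scala
// Sketch of timezone resolution with UTC as the fallback: when no
// timeZoneId can be obtained from the expression, default to "UTC".
object TimezoneFallback {
  def resolve(timeZoneId: Option[String]): String =
    timeZoneId.getOrElse("UTC")

  def main(args: Array[String]): Unit = {
    println(resolve(Some("Asia/Kolkata"))) // Asia/Kolkata
    println(resolve(None))                 // UTC
  }
}
```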



@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.



import org.apache.spark.sql.catalyst.expressions._

import org.apache.comet.CometSparkSessionExtensions.withInfo


These new imports (withInfo, exprToProtoInternal) appear unused in this shim; under the strict-warnings profile (-Ywarn-unused:imports + -Xfatal-warnings) this can fail the build.

Severity: medium

Other Locations
  • spark/src/main/spark-3.4/org/apache/comet/shims/CometExprShim.scala:28


Owner Author


value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! The unused imports will lead to warnings (or errors if the project's CI is configured to treat warnings as errors) and should be removed.

val exprClass = expr.getClass
try {
val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]


The reflective timeZoneId lookup only handles missing method/field; invoke/access/cast can still throw (e.g., IllegalAccessException, InvocationTargetException, ClassCastException) and crash serde/planning. Consider catching NonFatal and falling back to None (with withInfo) instead of letting it propagate.

Severity: medium


Owner Author


value:useful; category:bug; feedback: The Augment AI reviewer is correct! The SecondOfTime class has neither a timeZoneId method nor a field with that name, so this reflection-based lookup always fails. The timezone lookup should be removed and UTC used if a timezone is really needed.

@martin-augment
Owner Author

@cursor review


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


None
}
}
}


Datetime function misplaced in string expressions trait

Low Severity

secondOfTimeToProto is a datetime expression handler added to CommonStringExprs trait in strings.scala. The codebase already has a dedicated datetime.scala file containing nearly identical handlers for Second, Hour, and Minute (e.g. CometSecond). Placing datetime logic in a string expressions trait breaks the established code organization, making the method harder to discover and maintain alongside its closely related peers.


Owner Author


value:good-to-have; category:bug; feedback: The Bugbot AI reviewer is correct! The logic is time-related, so it should be added to datetime.scala instead of strings.scala.
