feat: Support Spark expression second_of_time #41

martin-augment wants to merge 1 commit into main from
Conversation
Walkthrough: This change introduces support for handling SecondOfTime expressions in Comet's expression serialization pipeline. A new method, `secondOfTimeToProto`, handles serialization of this expression.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces support for the Spark `second_of_time` expression.

Highlights
Changelog
Activity
Code Review
This pull request adds support for the Spark second_of_time expression, which is a RuntimeReplaceable expression for the second() function on timestamps. The implementation correctly uses reflection to access properties of this private Spark class and includes shims for different Spark versions. The changes are well-structured and include a new test case. My main feedback is to improve the reflection logic for better correctness and readability by handling a potential SecurityException and simplifying a nested try-catch structure.
```scala
val timeZoneId = {
  val exprClass = expr.getClass
  try {
    val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
    timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]
  } catch {
    case _: NoSuchMethodException =>
      try {
        val timeZoneIdField = exprClass.getField("timeZoneId")
        timeZoneIdField.get(expr).asInstanceOf[Option[String]]
      } catch {
        case _: NoSuchFieldException | _: SecurityException => None
      }
  }
}
```
The reflection logic to get `timeZoneId` can be improved for correctness and readability.

- Correctness: The current code doesn't handle a potential `SecurityException` from `exprClass.getMethod("timeZoneId")`. This could lead to an unhandled exception during query planning.
- Readability: The nested `try-catch` block is hard to follow.

I suggest refactoring this part to un-nest the `try-catch` blocks and handle `SecurityException` for both reflection calls. This makes the code more robust and easier to understand.
Suggested change:

```scala
// Before
val timeZoneId = {
  val exprClass = expr.getClass
  try {
    val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
    timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]
  } catch {
    case _: NoSuchMethodException =>
      try {
        val timeZoneIdField = exprClass.getField("timeZoneId")
        timeZoneIdField.get(expr).asInstanceOf[Option[String]]
      } catch {
        case _: NoSuchFieldException | _: SecurityException => None
      }
  }
}
```

```scala
// After
val timeZoneId = {
  val exprClass = expr.getClass
  val fromMethod = try {
    exprClass.getMethod("timeZoneId").invoke(expr).asInstanceOf[Option[String]]
  } catch {
    case _: NoSuchMethodException | _: SecurityException => None
  }
  fromMethod.orElse {
    try {
      exprClass.getField("timeZoneId").get(expr).asInstanceOf[Option[String]]
    } catch {
      case _: NoSuchFieldException | _: SecurityException => None
    }
  }
}
```
PR Review: feat: Support Spark expression second_of_time

Thanks for the contribution! The approach of handling …

Critical Issues

1. Unnecessary reflection for `timeZoneId`
| Severity | Issue |
|---|---|
| Critical | Replace reflection with TimeZoneAwareExpression cast |
| Critical | Remove dead/incorrect getField fallback |
| Moderate | Move secondOfTimeToProto to datetime.scala |
| Moderate | Add non-UTC timezone test |
| Minor | Fix test name to match the expression/function name |
| Minor | Add comment explaining why string-based class matching is used |
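The top recommendation above is to replace the reflective lookup with a cast to Spark's `TimeZoneAwareExpression` trait. The following is a minimal self-contained sketch of that idea; the local `Expression` and `TimeZoneAwareExpression` traits are stand-ins (assumptions) for the real Catalyst types in `org.apache.spark.sql.catalyst.expressions`, and `FakeSecondOfTime` is a made-up name for illustration.

```scala
// Stand-in model of Catalyst's expression hierarchy: timezone-carrying
// expressions mix in a trait that exposes timeZoneId directly.
trait Expression
trait TimeZoneAwareExpression extends Expression {
  def timeZoneId: Option[String]
}

case class FakeSecondOfTime(timeZoneId: Option[String]) extends TimeZoneAwareExpression
case class FakePlainExpr() extends Expression

// A plain pattern match replaces the reflective method/field probing:
// expressions carrying a timezone expose it via the trait, everything
// else falls back to None.
def timeZoneIdOf(e: Expression): Option[String] = e match {
  case tz: TimeZoneAwareExpression => tz.timeZoneId
  case _ => None
}

println(timeZoneIdOf(FakeSecondOfTime(Some("UTC")))) // Some(UTC)
println(timeZoneIdOf(FakePlainExpr()))               // None
```

Against the real Catalyst classes this needs no reflection at all, which is why the review flags the `getMethod`/`getField` probing as unnecessary.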
🧹 Nitpick comments (2)
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala (1)
575-590: Strengthen this test to assert the `SecondOfTime` path is actually exercised.

Current coverage validates `second(...)` behavior, but it may still pass without hitting the new `SecondOfTime` shim branch.

✅ Suggested test hardening:

```diff
 test("seconds_of_time expression") {
 @@
     readParquetFile(path.toString) { df =>
       val query = df.select(expr("second(_1)"))
+      if (isSpark40Plus) {
+        assert(
+          query.queryExecution.analyzed.expressions.exists(
+            _.getClass.getSimpleName == "SecondOfTime"),
+          "Expected analyzed plan to contain SecondOfTime")
+      }
       checkSparkAnswerAndOperator(query)
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala` around lines 575-590: The test currently verifies results but doesn't ensure Spark actually produced a SecondOfTime runtime-replaceable expression; update the test in CometExpressionSuite.scala (inside the "seconds_of_time expression" test that uses makeRawTimeParquetFile, readParquetFile and checkSparkAnswerAndOperator) to inspect the query plan (e.g., query.queryExecution.analyzed or query.queryExecution.optimizedPlan) and assert that it contains the SecondOfTime expression (by checking for the SecondOfTime class name or by pattern-matching/collecting an instance of org.apache.spark.sql.catalyst.expressions.SecondOfTime) before calling checkSparkAnswerAndOperator so the SecondOfTime shim branch is definitely exercised.

spark/src/main/scala/org/apache/comet/serde/strings.scala (1)
422-435: Harden reflection for `timeZoneId` to avoid planner-time runtime exceptions.

The current reflection path only handles missing members (`NoSuchMethodException`, `NoSuchFieldException`, `SecurityException`); invocation failures like `InvocationTargetException`, `IllegalAccessException`, and `ClassCastException` from the unchecked `asInstanceOf` cast can still propagate and fail query planning.

🛡️ Safer reflection handling:

```diff
+import scala.util.control.NonFatal
 @@
-    val timeZoneId = {
+    val timeZoneId: Option[String] = {
       val exprClass = expr.getClass
-      try {
-        val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
-        timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]
-      } catch {
-        case _: NoSuchMethodException =>
-          try {
-            val timeZoneIdField = exprClass.getField("timeZoneId")
-            timeZoneIdField.get(expr).asInstanceOf[Option[String]]
-          } catch {
-            case _: NoSuchFieldException | _: SecurityException => None
-          }
-      }
+      val rawValue: Any =
+        try {
+          exprClass.getMethod("timeZoneId").invoke(expr)
+        } catch {
+          case _: NoSuchMethodException | _: SecurityException =>
+            try {
+              exprClass.getField("timeZoneId").get(expr)
+            } catch {
+              case _: NoSuchFieldException | _: SecurityException => None
+            }
+          case NonFatal(_) => None
+        }
+
+      rawValue match {
+        case Some(tz: String) => Some(tz)
+        case tz: String => Some(tz)
+        case None | null => None
+        case _ => None
+      }
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spark/src/main/scala/org/apache/comet/serde/strings.scala` around lines 422 - 435, The reflection block extracting timeZoneId from expr (using exprClass, timeZoneIdMethod, timeZoneIdField) must be hardened to catch invocation and access failures and bad casts: wrap the Method.invoke and Field.get calls and the unsafe asInstanceOf in try/catch that also handles InvocationTargetException, IllegalAccessException, and ClassCastException (returning None on those cases), so that any failure when calling timeZoneId or casting its result does not propagate during planning; ensure the final value assigned to timeZoneId is an Option[String] and falls back to None on these errors.
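A self-contained demonstration of why catching only `NoSuchMethodException` is insufficient: when the reflectively invoked method itself throws, `Method.invoke` wraps the failure in an `InvocationTargetException`, which would escape the original catch block and fail planning. The `ThrowingTz` class name here is made up for the demo.

```scala
import java.lang.reflect.InvocationTargetException

// A class whose timeZoneId accessor exists but throws at call time.
class ThrowingTz {
  def timeZoneId: Option[String] = throw new IllegalStateException("boom")
}

// getMethod succeeds (the method exists), so no NoSuchMethodException;
// the failure surfaces only at invoke, wrapped in InvocationTargetException.
val method = classOf[ThrowingTz].getMethod("timeZoneId")
val outcome =
  try {
    method.invoke(new ThrowingTz)
    "no exception"
  } catch {
    case _: NoSuchMethodException    => "NoSuchMethodException"
    case _: InvocationTargetException => "InvocationTargetException"
  }

println(outcome) // InvocationTargetException
```

This is exactly the gap the `NonFatal(_)` fallback in the suggested diff closes.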
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- spark/src/main/scala/org/apache/comet/serde/strings.scala
- spark/src/main/spark-3.4/org/apache/comet/shims/CometExprShim.scala
- spark/src/main/spark-3.5/org/apache/comet/shims/CometExprShim.scala
- spark/src/main/spark-4.0/org/apache/comet/shims/CometExprShim.scala
- spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala
🤖 Augment PR Summary

Summary: Adds support for Spark's runtime expression SecondOfTime.

Changes:
Technical Notes: Uses reflection to read `timeZoneId` from the expression.
```scala
import org.apache.spark.sql.catalyst.expressions._

import org.apache.comet.CometSparkSessionExtensions.withInfo
```
These new imports (withInfo, exprToProtoInternal) appear unused in this shim; under the strict-warnings profile (-Ywarn-unused:imports + -Xfatal-warnings) this can fail the build.
Severity: medium
Other Locations
spark/src/main/spark-3.4/org/apache/comet/shims/CometExprShim.scala:28
value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! The unused imports will lead to warnings (or errors if the project's CI is configured to report warnings as errors), so they should be removed.
```scala
val exprClass = expr.getClass
try {
  val timeZoneIdMethod = exprClass.getMethod("timeZoneId")
  timeZoneIdMethod.invoke(expr).asInstanceOf[Option[String]]
```
The reflective timeZoneId lookup only handles missing method/field; invoke/access/cast can still throw (e.g., IllegalAccessException, InvocationTargetException, ClassCastException) and crash serde/planning. Consider catching NonFatal and falling back to None (with withInfo) instead of letting it propagate.
Severity: medium
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The SecondOfTime class has neither a timeZoneId method nor a field with that name, so these reflection-based lookups always fail. The timezone lookups should be removed, and UTC should be used if a timezone is really needed.
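The suggested simplification can be sketched as follows; `resolveTimeZone` and its `sessionTimeZone` parameter are hypothetical names for illustration, not a Comet API — the point is just that an explicit fallback replaces the always-failing reflective lookup.

```scala
// Hypothetical sketch of the suggested fix: no reflection at all; resolve
// the zone from whatever session configuration is available, defaulting
// to UTC when nothing is set.
def resolveTimeZone(sessionTimeZone: Option[String]): String =
  sessionTimeZone.getOrElse("UTC")

println(resolveTimeZone(Some("America/New_York"))) // America/New_York
println(resolveTimeZone(None))                     // UTC
```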
@cursor review
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
```scala
        None
      }
    }
  }
```
Datetime function misplaced in string expressions trait
Low Severity
secondOfTimeToProto is a datetime expression handler added to CommonStringExprs trait in strings.scala. The codebase already has a dedicated datetime.scala file containing nearly identical handlers for Second, Hour, and Minute (e.g. CometSecond). Placing datetime logic in a string expressions trait breaks the established code organization, making the method harder to discover and maintain alongside its closely related peers.
There was a problem hiding this comment.
value:good-to-have; category:bug; feedback: The Bugbot AI reviewer is correct! The logic is time-related, so it should be added to the datetime.scala module instead of strings.scala.

