fix: route file-not-found errors through SparkError JSON path #47
martin-augment wants to merge 10 commits into main
Conversation
Detect file-not-found errors from DataFusion's object store on the native side and convert them to SparkError::FileNotFound, which is serialized as JSON via CometQueryExecutionException. The shim layer then creates a proper SparkFileNotFoundException using QueryExecutionErrors.readCurrentFileNotFoundError(), producing the exact exception type that Spark tests expect. Previously, file-not-found errors arrived as CometNativeException and were pattern-matched in CometExecIterator to create a SparkException with a plain FileNotFoundException cause. Tests that cast the cause to SparkFileNotFoundException (which is private[spark]) would fail. Closes apache#3314
Remove the assume() skip for native_datafusion in FileBasedDataSourceSuite and the IgnoreCometNativeDataFusion tag in SimpleSQLViewSuite, since file-not-found errors now produce the correct SparkFileNotFoundException type.
Remove CometConf import from FileBasedDataSourceSuite and IgnoreCometNativeDataFusion import from SQLViewSuite that became unused after removing the test skips.
Extract file path from native error message and format it as "File <path> does not exist" to match the Hadoop FileNotFoundException message format that Spark tests expect.
readCurrentFileNotFoundError was removed in Spark 4.0. Construct SparkFileNotFoundException directly instead, which is accessible from the shim package.
….5.8 diff
The SPARK-25207 test expects a specific error message for duplicate fields in case-insensitive mode, but native DataFusion produces a different schema error. Update the test to accept either message format.
…lds in 3.5.8 diff"
This reverts commit 03328cd.
Add IgnoreCometNativeDataFusion tag to SPARK-25207 test instead of trying to accept both error messages. A separate PR will fix the underlying issue.
Walkthrough
This pull request introduces Comet-related testing infrastructure and error handling enhancements. On the testing side, it adds IgnoreComet feature tags and test utilities to selectively skip tests when Comet is enabled, along with configuration helpers to manage Comet-specific test execution. On the error handling side, it adds FileNotFound error support across the native Rust layer and Spark shims. The native layer detects file-not-found scenarios by inspecting error messages and constructs appropriate SparkError variants. Version-specific Spark shims (3.4, 3.5, and 4.0) convert FileNotFound errors to platform-appropriate exception types using regex-based path extraction and error wrapping.
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the error handling mechanism within the Comet native execution engine by specifically addressing file-not-found errors. It introduces a dedicated error type at the native level and updates the Spark integration layers to ensure these errors are consistently and correctly translated into Spark's native file-not-found exceptions. This change streamlines error propagation and improves the diagnostic capabilities for users encountering missing files during Comet-accelerated operations.
Code Review
This pull request refactors the handling of 'file not found' errors within the Comet native execution engine. Previously, the Scala layer would parse generic native error messages to identify and re-throw file not found exceptions. Now, the native Rust code explicitly detects these errors and propagates them as a new SparkError::FileNotFound type. Platform-specific Scala shims then translate this native error into the appropriate Spark FileNotFoundException or SparkFileNotFoundException, providing more robust and consistent error reporting. Additionally, test suites (FileBasedDataSourceSuite.scala) are updated to include Comet-specific execution plans in their analysis, and a previous assume statement related to native scan error handling is removed, reflecting the new error propagation mechanism.
🤖 Augment PR Summary
Summary: This PR routes native file-not-found failures through Comet's structured SparkError JSON pathway so Spark sees the expected file-not-found exception type.
Changes:
Technical Notes: The shim extracts the missing file path from the native error message via regex and formats it to match the Hadoop-style `File <path> does not exist` message.
```rust
// Not a SparkError, use generic exception
// Check for file-not-found errors from object store
let error_msg = e.to_string();
if error_msg.contains("not found")
```
This FileNotFound detection is substring-based and doesn’t ensure the message matches the Object at location ... not found shape that the JVM shim later parses, so a format change could yield a FileNotFound with an unextractable/garbled path. Consider tightening the signal or carrying a structured path so the shim doesn’t depend on message parsing.
Severity: medium
Other Locations
native/core/src/errors.rs:441
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The detection of the error type is fragile and could match a totally unrelated error. It looks like a best-effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved over time as users provide examples of errors that fail to be detected.
```scala
val path = "Object at location (.+?) not found".r
  .findFirstMatchIn(msg)
  .map(_.group(1))
  .getOrElse(msg)
```
If the native message doesn't match the regex, `getOrElse(msg)` treats the entire native error text as a file path, which can lead to confusing `File <full message> does not exist` output. Since this behavior is format-dependent, a small regression test for `errorType=FileNotFound` conversion would help catch future message changes.
Severity: low
Other Locations
spark/src/main/spark-3.4/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala:255
spark/src/main/spark-4.0/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala:262
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The detection of the error type is fragile and could match a totally unrelated error. It looks like a best-effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved over time as users provide examples of errors that fail to be detected.
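For illustration, the regex extraction and its `getOrElse(msg)` fallback discussed above can be modeled with plain standard-library string search. This Rust sketch is not repository code; the function name is hypothetical, and the actual shim is Scala:

```rust
// Sketch of the shim's path extraction, using std string search in place
// of the Scala regex "Object at location (.+?) not found".
fn extract_missing_path(msg: &str) -> Option<&str> {
    const PREFIX: &str = "Object at location ";
    const SUFFIX: &str = " not found";
    let start = msg.find(PREFIX)? + PREFIX.len();
    let end = start + msg[start..].find(SUFFIX)?;
    Some(&msg[start..end])
}

fn main() {
    assert_eq!(
        extract_missing_path("Object at location file:/tmp/a.parquet not found"),
        Some("file:/tmp/a.parquet")
    );
    // A non-matching message yields None, which is where the shim's
    // getOrElse(msg) falls back to the whole message as the "path".
    assert_eq!(extract_missing_path("schema mismatch"), None);
    println!("ok");
}
```

A regression test along these lines, asserting both the match and the non-match branch, would catch future native message-format changes.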
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```rust
        ),
        _ => env.throw_new(exception.class, exception.msg),
    }
}
```
Duplicated file-not-found detection and fallback logic
Low Severity
The file-not-found detection logic (contains("not found") && contains("No such file or directory")) along with the SparkError::FileNotFound construction and the entire generic-exception fallback block are copy-pasted between the DataFusionError::External arm and the catch-all _ arm. The only difference is the source of error_msg (e.to_string() vs error.to_string()). If the detection heuristic or fallback behavior ever needs updating, it must be changed in both places, risking inconsistency.
Additional Locations (1)
value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The detection of the error type is fragile and could match a totally unrelated error. It looks like a best-effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved over time as users provide examples of errors that fail to be detected.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
dev/diffs/3.5.8.diff (2)
74-79: ⚠️ Potential issue | 🔴 Critical: `extensionClassNames` is created but never used.
The variable `extensionClassNames` combines user-configured extensions with Comet extensions, but the subsequent `foreach` loop on line 79 iterates over `extensionConfClassNames` (without Comet). This means the Comet extension will never be applied.
🐛 Proposed fix
```diff
 val extensionConfClassNames = sparkContext.getConf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
   .getOrElse(Seq.empty)
 val extensionClassNames = extensionConfClassNames ++ loadCometExtension(sparkContext)
-extensionConfClassNames.foreach { extensionConfClassName =>
+extensionClassNames.foreach { extensionConfClassName =>
```
Verify each finding against the current code and only fix it if needed. In `@dev/diffs/3.5.8.diff` around lines 74 - 79, the code builds extensionClassNames by combining extensionConfClassNames with loadCometExtension(sparkContext) but then mistakenly iterates over extensionConfClassNames, so Comet extensions are never applied; change the foreach to iterate over extensionClassNames (the combined Seq) where extensionConfClassNames.foreach { ... } currently appears so that loadCometExtension(sparkContext) results are included when registering extensions in the SparkSessionExtensions creation path.
87-94: ⚠️ Potential issue | 🟡 Minor: `toBoolean` can throw on invalid input.
If `ENABLE_COMET` is set to a non-boolean value like `"yes"` or `"1"`, `v.toBoolean` throws `IllegalArgumentException`. Consider using a safer conversion or explicitly documenting expected values.
🛡️ Proposed defensive fix
```diff
 def isCometEnabled: Boolean = {
   val v = System.getenv("ENABLE_COMET")
-  v == null || v.toBoolean
+  v == null || v.equalsIgnoreCase("true")
 }
```
Verify each finding against the current code and only fix it if needed. In `@dev/diffs/3.5.8.diff` around lines 87 - 94, The isCometEnabled method currently calls v.toBoolean which throws IllegalArgumentException on inputs like "yes" or "1"; change the conversion to a safe parse (e.g. use java.lang.Boolean.parseBoolean(v) which returns false for non-true strings) or implement an explicit whitelist (normalize v.toLowerCase and accept "true","1","yes" as true) so invalid values don't throw; update the implementation in isCometEnabled accordingly while preserving the existing null-handling (null => enabled).
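As a language-neutral sketch of the safer parse: the semantics assumed here are that an unset `ENABLE_COMET` means enabled and only a case-insensitive "true" enables otherwise; the Rust function name is illustrative, not repository code:

```rust
// Hypothetical Rust port of the proposed isCometEnabled logic.
fn is_comet_enabled(raw: Option<&str>) -> bool {
    match raw {
        // Unset => enabled, preserving the existing null-handling.
        None => true,
        // "yes" / "1" now simply disable instead of throwing.
        Some(v) => v.eq_ignore_ascii_case("true"),
    }
}

fn main() {
    assert!(is_comet_enabled(None));
    assert!(is_comet_enabled(Some("TRUE")));
    assert!(!is_comet_enabled(Some("yes")));
    assert!(!is_comet_enabled(Some("1")));
    println!("ok");
}
```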
🧹 Nitpick comments (1)
native/spark-expr/src/error.rs (1)
169-170: Add a dedicated unit test for `FileNotFound` JSON/mapping parity.
The new variant is wired in multiple match sites; a focused test will prevent accidental drift between `errorType`, `params.message`, `errorClass`, and `exception_class`.
✅ Suggested test
```diff
 #[cfg(test)]
 mod tests {
     use super::*;
@@
+    #[test]
+    fn test_file_not_found_json_and_mappings() {
+        let error = SparkError::FileNotFound {
+            message: "Object at location file:/tmp/missing.parquet not found".to_string(),
+        };
+        let json = error.to_json();
+        let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
+
+        assert_eq!(parsed["errorType"], "FileNotFound");
+        assert_eq!(parsed["errorClass"], "_LEGACY_ERROR_TEMP_2055");
+        assert_eq!(
+            parsed["params"]["message"],
+            "Object at location file:/tmp/missing.parquet not found"
+        );
+        assert_eq!(error.error_class(), Some("_LEGACY_ERROR_TEMP_2055"));
+        assert_eq!(error.exception_class(), "org/apache/spark/SparkException");
+    }
```
Also applies to: 242-243, 428-432, 499-501, 574-576
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@native/spark-expr/src/error.rs` around lines 169 - 170, Add a focused unit test that verifies JSON/mapping parity for the FileNotFound variant: construct the FileNotFound { message: String } variant, serialize it to the JSON representation used by the code paths (asserting errorType, params.message, errorClass, and exception_class fields are present and have the expected values), then deserialize or inspect the JSON to ensure values round-trip and match exactly; place the test next to other error-mapping tests and reference the FileNotFound variant name and the JSON keys errorType, params.message, errorClass, and exception_class so future changes to match arms will fail the test if they drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@native/core/src/errors.rs`:
- Around line 416-423: The current not-found detection logic in the error
handling branch is too strict (it requires both "not found" and "No such file or
directory") and will miss backend-specific messages; refactor by adding a small
helper function (e.g. is_not_found_error) that centralizes broader matching
(check for any of several substrings like "not found", "No such file or
directory", "NotFound", "404", backend-specific phrases) and use it in the
existing branches that construct SparkError::FileNotFound and call
throw_spark_error_as_json; update both places noted (around the
SparkError::FileNotFound creation and the similar block at 439-446) to call the
helper instead of the dual-substring check so all object-store backends are
covered.
In
`@spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala`:
- Around line 244-255: The FileNotFound case in ShimSparkErrorConverter.scala
currently blindly wraps the native message, causing duplicates like "File File
<path> does not exist does not exist"; change the extraction logic in the case
"FileNotFound" block so that you first check if msg already matches the
Hadoop-style pattern "File (.+) does not exist" and, if so, extract only the
path group, otherwise fall back to the existing "Object at location (.+?) not
found" capture or use msg as path; then pass new FileNotFoundException(s"File
$path does not exist") into QueryExecutionErrors.readCurrentFileNotFoundError to
ensure you never produce "File File ..." duplicates (refer to variables msg,
path and the call to QueryExecutionErrors.readCurrentFileNotFoundError).
---
Outside diff comments:
In `@dev/diffs/3.5.8.diff`:
- Around line 74-79: The code builds extensionClassNames by combining
extensionConfClassNames with loadCometExtension(sparkContext) but then
mistakenly iterates over extensionConfClassNames, so Comet extensions are never
applied; change the foreach to iterate over extensionClassNames (the combined
Seq) where extensionConfClassNames.foreach { ... } currently appears so that
loadCometExtension(sparkContext) results are included when registering
extensions in the SparkSessionExtensions creation path.
- Around line 87-94: The isCometEnabled method currently calls v.toBoolean which
throws IllegalArgumentException on inputs like "yes" or "1"; change the
conversion to a safe parse (e.g. use java.lang.Boolean.parseBoolean(v) which
returns false for non-true strings) or implement an explicit whitelist
(normalize v.toLowerCase and accept "true","1","yes" as true) so invalid values
don't throw; update the implementation in isCometEnabled accordingly while
preserving the existing null-handling (null => enabled).
---
Nitpick comments:
In `@native/spark-expr/src/error.rs`:
- Around line 169-170: Add a focused unit test that verifies JSON/mapping parity
for the FileNotFound variant: construct the FileNotFound { message: String }
variant, serialize it to the JSON representation used by the code paths
(asserting errorType, params.message, errorClass, and exception_class fields are
present and have the expected values), then deserialize or inspect the JSON to
ensure values round-trip and match exactly; place the test next to other
error-mapping tests and reference the FileNotFound variant name and the JSON
keys errorType, params.message, errorClass, and exception_class so future
changes to match arms will fail the test if they drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ff71cdd0-5290-46f3-8b19-23409e9dc8ab
📒 Files selected for processing (7)
dev/diffs/3.5.8.diff
native/core/src/errors.rs
native/spark-expr/src/error.rs
spark/src/main/scala/org/apache/comet/CometExecIterator.scala
spark/src/main/spark-3.4/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala
spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala
spark/src/main/spark-4.0/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala
```rust
// Check for file-not-found errors from object store
let error_msg = e.to_string();
if error_msg.contains("not found")
    && error_msg.contains("No such file or directory")
{
    let spark_error = SparkError::FileNotFound { message: error_msg };
    throw_spark_error_as_json(env, &spark_error)
} else {
```
File-not-found detection is too narrow for multi-backend object stores.
Requiring both substrings ("not found" and "No such file or directory") is fragile and can miss valid not-found errors (then falling back to generic exception flow). Centralize and broaden matching in one helper.
💡 Proposed fix
```diff
+fn is_file_not_found_message(msg: &str) -> bool {
+    let m = msg.to_ascii_lowercase();
+    (m.contains("object at location") && m.contains("not found"))
+        || m.contains("no such file or directory")
+        || m.contains("no such key")
+        || m.contains("status code: 404")
+}
+
 fn throw_exception(env: &mut JNIEnv, error: &CometError, backtrace: Option<String>) {
@@
-    if error_msg.contains("not found")
-        && error_msg.contains("No such file or directory")
-    {
+    if is_file_not_found_message(&error_msg) {
         let spark_error = SparkError::FileNotFound { message: error_msg };
         throw_spark_error_as_json(env, &spark_error)
@@
-    if error_msg.contains("not found")
-        && error_msg.contains("No such file or directory")
-    {
+    if is_file_not_found_message(&error_msg) {
         let spark_error = SparkError::FileNotFound { message: error_msg };
         throw_spark_error_as_json(env, &spark_error)
```
Also applies to: 439-446
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@native/core/src/errors.rs` around lines 416 - 423, The current not-found
detection logic in the error handling branch is too strict (it requires both
"not found" and "No such file or directory") and will miss backend-specific
messages; refactor by adding a small helper function (e.g. is_not_found_error)
that centralizes broader matching (check for any of several substrings like "not
found", "No such file or directory", "NotFound", "404", backend-specific
phrases) and use it in the existing branches that construct
SparkError::FileNotFound and call throw_spark_error_as_json; update both places
noted (around the SparkError::FileNotFound creation and the similar block at
439-446) to call the helper instead of the dual-substring check so all
object-store backends are covered.
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The detection of the error type is fragile and could match a totally unrelated error. It looks like a best-effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved over time as users provide examples of errors that fail to be detected.
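If a broadened helper along the lines proposed in this thread is adopted, a table-driven check keeps the heuristic honest. The helper body below mirrors the proposal; the sample messages are assumptions, not captured DataFusion output:

```rust
// Broadened, case-insensitive file-not-found detection (sketch).
fn is_file_not_found_message(msg: &str) -> bool {
    let m = msg.to_ascii_lowercase();
    (m.contains("object at location") && m.contains("not found"))
        || m.contains("no such file or directory")
        || m.contains("no such key")
        || m.contains("status code: 404")
}

fn main() {
    // Messages that should be classified as file-not-found.
    let not_found = [
        "Object at location file:/tmp/x.parquet not found",
        "IO error: No such file or directory (os error 2)",
        "S3 GetObject failed: status code: 404",
    ];
    // Unrelated errors that must fall through to the generic path.
    let other = [
        "Arrow error: schema mismatch",
        "External error: column `x` cannot be resolved",
    ];
    assert!(not_found.iter().all(|m| is_file_not_found_message(m)));
    assert!(other.iter().all(|m| !is_file_not_found_message(m)));
    println!("ok");
}
```

As users report messages that escape detection, new markers can be added to the helper and to the negative/positive tables in one place.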
```scala
case "FileNotFound" =>
  val msg = params("message").toString
  // Extract file path from native error message and format like Hadoop's
  // FileNotFoundException: "File <path> does not exist"
  val path = "Object at location (.+?) not found".r
    .findFirstMatchIn(msg)
    .map(_.group(1))
    .getOrElse(msg)
  Some(
    QueryExecutionErrors.readCurrentFileNotFoundError(
      new FileNotFoundException(s"File $path does not exist")))
```
Avoid double-wrapping already formatted file-not-found messages.
If line 245 already contains `File <path> does not exist`, the fallback at line 251 produces `File File <path> does not exist does not exist`. Normalize the message before wrapping.
💡 Proposed fix
```diff
           case "FileNotFound" =>
             val msg = params("message").toString
-            // Extract file path from native error message and format like Hadoop's
-            // FileNotFoundException: "File <path> does not exist"
-            val path = "Object at location (.+?) not found".r
-              .findFirstMatchIn(msg)
-              .map(_.group(1))
-              .getOrElse(msg)
+            // Normalize to Hadoop-style: "File <path> does not exist"
+            val normalizedMsg =
+              if (msg.startsWith("File ") && msg.endsWith(" does not exist")) {
+                msg
+              } else {
+                val path = "Object at location (.+?) not found".r
+                  .findFirstMatchIn(msg)
+                  .map(_.group(1))
+                  .getOrElse(msg)
+                s"File $path does not exist"
+              }
             Some(
               QueryExecutionErrors.readCurrentFileNotFoundError(
-                new FileNotFoundException(s"File $path does not exist")))
+                new FileNotFoundException(normalizedMsg)))
```
📝 Committable suggestion
+ new FileNotFoundException(normalizedMsg)))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```scala
case "FileNotFound" =>
  val msg = params("message").toString
  // Normalize to Hadoop-style: "File <path> does not exist"
  val normalizedMsg =
    if (msg.startsWith("File ") && msg.endsWith(" does not exist")) {
      msg
    } else {
      val path = "Object at location (.+?) not found".r
        .findFirstMatchIn(msg)
        .map(_.group(1))
        .getOrElse(msg)
      s"File $path does not exist"
    }
  Some(
    QueryExecutionErrors.readCurrentFileNotFoundError(
      new FileNotFoundException(normalizedMsg)))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@spark/src/main/spark-3.5/org/apache/spark/sql/comet/shims/ShimSparkErrorConverter.scala`
around lines 244 - 255, The FileNotFound case in ShimSparkErrorConverter.scala
currently blindly wraps the native message, causing duplicates like "File File
<path> does not exist does not exist"; change the extraction logic in the case
"FileNotFound" block so that you first check if msg already matches the
Hadoop-style pattern "File (.+) does not exist" and, if so, extract only the
path group, otherwise fall back to the existing "Object at location (.+?) not
found" capture or use msg as path; then pass new FileNotFoundException(s"File
$path does not exist") into QueryExecutionErrors.readCurrentFileNotFoundError to
ensure you never produce "File File ..." duplicates (refer to variables msg,
path and the call to QueryExecutionErrors.readCurrentFileNotFoundError).
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The detection of the error type is fragile and could match a totally unrelated error. It looks like a best-effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved over time as users provide examples of errors that fail to be detected.
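The key property of the normalization proposed in this thread is idempotence: re-wrapping an already-wrapped message must be a no-op, so `File File ... does not exist does not exist` cannot occur. That invariant can be checked in isolation; this Rust sketch uses hypothetical names, since the real shim code is Scala:

```rust
// Idempotent normalization to Hadoop-style "File <path> does not exist".
fn normalize_not_found(msg: &str) -> String {
    // Already normalized: pass through unchanged (guards against double-wrapping).
    if msg.starts_with("File ") && msg.ends_with(" does not exist") {
        return msg.to_string();
    }
    const PREFIX: &str = "Object at location ";
    const SUFFIX: &str = " not found";
    // Extract the path if the native shape matches, else fall back to the
    // whole message, mirroring the shim's getOrElse(msg).
    let path = msg
        .find(PREFIX)
        .and_then(|p| {
            let start = p + PREFIX.len();
            msg[start..].find(SUFFIX).map(|e| &msg[start..start + e])
        })
        .unwrap_or(msg);
    format!("File {path} does not exist")
}

fn main() {
    let once = normalize_not_found("Object at location /tmp/x.parquet not found");
    assert_eq!(once, "File /tmp/x.parquet does not exist");
    // Idempotent: normalizing again is a no-op.
    assert_eq!(normalize_not_found(&once), once);
    println!("ok");
}
```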
Code Review
Overall this is a well-structured fix. Moving file-not-found detection from the Scala layer into the native side gives Spark the exact exception type its tests expect.

native/core/src/errors.rs — detection heuristic
Fragility — the substring-based detection can match unrelated errors whose messages happen to contain the markers.
Duplicated logic — the same detection-and-fallback block appears in two match arms.
Coupling — the full raw message is stored in the error and later re-parsed by the shims, so a native message-format change breaks path extraction.

native/spark-expr/src/error.rs

ShimSparkErrorConverter.scala — all three shim versions
Regex compiled per call — the extraction pattern is rebuilt on every invocation; hoist it into a constant.

Code duplication — the Spark 3.4 and 3.5 shims are nearly identical.

CometExecIterator.scala
Removal of the old pattern-matching path for file-not-found errors is consistent with the new JSON route.

Test diff (dev/diffs/3.5.8.diff)
Removing the assume() skip is appropriate now that the correct SparkFileNotFoundException type is produced.

Summary
Main actionable suggestions:
value:useful; category:bug; feedback: The Claude AI reviewer is correct! The detection of the error type is fragile and could match a totally unrelated error. It looks like a best-effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved over time as users provide examples of errors that fail to be detected.
value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! The regex is compiled on each usage. It would be better to compile it once, as a class member constant, and just execute it when needed. This would be much faster.
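The compile-once point generalizes beyond hoisting Scala's `"...".r` into a `private val`. In Rust, for instance, `std::sync::OnceLock` moves one-time construction out of the hot path; the marker list below is illustrative, not the shim's actual pattern:

```rust
use std::sync::OnceLock;

// Build the matcher's data once per process instead of on every call,
// analogous to a Scala class-member constant.
fn not_found_markers() -> &'static [&'static str] {
    static MARKERS: OnceLock<Vec<&'static str>> = OnceLock::new();
    MARKERS.get_or_init(|| vec!["not found", "no such file or directory"])
}

fn looks_like_not_found(msg: &str) -> bool {
    let lower = msg.to_ascii_lowercase();
    not_found_markers().iter().any(|m| lower.contains(m))
}

fn main() {
    assert!(looks_like_not_found("Object at location /x not found"));
    assert!(!looks_like_not_found("type mismatch"));
    println!("ok");
}
```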


3699: To review by AI