-
Notifications
You must be signed in to change notification settings - Fork 0
3699: fix: route file-not-found errors through SparkError JSON path #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
b97571e
90ecaa2
64b63c5
b0d2b70
030105f
81cd6d8
e630c71
03328cd
37d10e3
27469e0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -413,7 +413,37 @@ fn throw_exception(env: &mut JNIEnv, error: &CometError, backtrace: Option<Strin | |
| // Fall back to plain SparkError (no context) | ||
| throw_spark_error_as_json(env, spark_error) | ||
| } else { | ||
| // Not a SparkError, use generic exception | ||
| // Check for file-not-found errors from object store | ||
| let error_msg = e.to_string(); | ||
| if error_msg.contains("not found") | ||
| && error_msg.contains("No such file or directory") | ||
| { | ||
| let spark_error = SparkError::FileNotFound { message: error_msg }; | ||
| throw_spark_error_as_json(env, &spark_error) | ||
| } else { | ||
|
Comment on lines
+416
to
+423
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. File-not-found detection is too narrow for multi-backend object stores. Requiring both substrings ( 💡 Proposed fix+fn is_file_not_found_message(msg: &str) -> bool {
+ let m = msg.to_ascii_lowercase();
+ (m.contains("object at location") && m.contains("not found"))
+ || m.contains("no such file or directory")
+ || m.contains("no such key")
+ || m.contains("status code: 404")
+}
+
fn throw_exception(env: &mut JNIEnv, error: &CometError, backtrace: Option<String>) {
@@
- if error_msg.contains("not found")
- && error_msg.contains("No such file or directory")
- {
+ if is_file_not_found_message(&error_msg) {
let spark_error = SparkError::FileNotFound { message: error_msg };
throw_spark_error_as_json(env, &spark_error)
@@
- if error_msg.contains("not found")
- && error_msg.contains("No such file or directory")
- {
+ if is_file_not_found_message(&error_msg) {
let spark_error = SparkError::FileNotFound { message: error_msg };
throw_spark_error_as_json(env, &spark_error)Also applies to: 439-446 🤖 Prompt for AI Agents
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The detection of the error type is fragile and it could match a totally unrelated error. It looks like a best effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved with the time when the users provide examples of errors which fail be to detected. |
||
| // Not a SparkError, use generic exception | ||
| let exception = error.to_exception(); | ||
| match backtrace { | ||
| Some(backtrace_string) => env.throw_new( | ||
| exception.class, | ||
| to_stacktrace_string(exception.msg, backtrace_string).unwrap(), | ||
| ), | ||
| _ => env.throw_new(exception.class, exception.msg), | ||
| } | ||
| } | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Duplicated file-not-found detection and fallback logicLow Severity The file-not-found detection logic ( Additional Locations (1)
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The detection of the error type is fragile and it could match a totally unrelated error. It looks like a best effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved with the time when the users provide examples of errors which fail be to detected. |
||
| } | ||
| } | ||
| // Handle direct SparkError - serialize to JSON | ||
| CometError::Spark(spark_error) => throw_spark_error_as_json(env, spark_error), | ||
| _ => { | ||
| // Check for file-not-found errors that may arrive through other wrapping paths | ||
| let error_msg = error.to_string(); | ||
| if error_msg.contains("not found") | ||
| && error_msg.contains("No such file or directory") | ||
| { | ||
| let spark_error = SparkError::FileNotFound { message: error_msg }; | ||
| throw_spark_error_as_json(env, &spark_error) | ||
| } else { | ||
| let exception = error.to_exception(); | ||
| match backtrace { | ||
| Some(backtrace_string) => env.throw_new( | ||
|
|
@@ -424,18 +454,6 @@ fn throw_exception(env: &mut JNIEnv, error: &CometError, backtrace: Option<Strin | |
| } | ||
| } | ||
| } | ||
| // Handle direct SparkError - serialize to JSON | ||
| CometError::Spark(spark_error) => throw_spark_error_as_json(env, spark_error), | ||
| _ => { | ||
| let exception = error.to_exception(); | ||
| match backtrace { | ||
| Some(backtrace_string) => env.throw_new( | ||
| exception.class, | ||
| to_stacktrace_string(exception.msg, backtrace_string).unwrap(), | ||
| ), | ||
| _ => env.throw_new(exception.class, exception.msg), | ||
| } | ||
| } | ||
| } | ||
| .expect("Thrown exception") | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -19,6 +19,8 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| package org.apache.spark.sql.comet.shims | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import java.io.FileNotFoundException | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import org.apache.spark.{QueryContext, SparkException} | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import org.apache.spark.sql.catalyst.trees.SQLQueryContext | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import org.apache.spark.sql.errors.QueryExecutionErrors | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -239,6 +241,18 @@ trait ShimSparkErrorConverter { | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| QueryExecutionErrors | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .intervalArithmeticOverflowError("Interval arithmetic overflow", "", sqlCtx(context))) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| case "FileNotFound" => | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| val msg = params("message").toString | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // Extract file path from native error message and format like Hadoop's | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // FileNotFoundException: "File <path> does not exist" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| val path = "Object at location (.+?) not found".r | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .findFirstMatchIn(msg) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .map(_.group(1)) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| .getOrElse(msg) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the native message doesn’t match the regex, Severity: low Other Locations
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback: The Augment AI reviewer is correct! The detection of the error type is fragile and it could match a totally unrelated error. It looks like a best effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved with the time when the users provide examples of errors which fail be to detected. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Some( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| QueryExecutionErrors.readCurrentFileNotFoundError( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| new FileNotFoundException(s"File $path does not exist"))) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+244
to
+255
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Avoid double-wrapping already formatted file-not-found messages. If Line 245 already contains 💡 Proposed fix case "FileNotFound" =>
val msg = params("message").toString
- // Extract file path from native error message and format like Hadoop's
- // FileNotFoundException: "File <path> does not exist"
- val path = "Object at location (.+?) not found".r
- .findFirstMatchIn(msg)
- .map(_.group(1))
- .getOrElse(msg)
+ // Normalize to Hadoop-style: "File <path> does not exist"
+ val normalizedMsg =
+ if (msg.startsWith("File ") && msg.endsWith(" does not exist")) {
+ msg
+ } else {
+ val path = "Object at location (.+?) not found".r
+ .findFirstMatchIn(msg)
+ .map(_.group(1))
+ .getOrElse(msg)
+ s"File $path does not exist"
+ }
Some(
QueryExecutionErrors.readCurrentFileNotFoundError(
- new FileNotFoundException(s"File $path does not exist")))
+ new FileNotFoundException(normalizedMsg)))📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The detection of the error type is fragile and it could match a totally unrelated error. It looks like a best effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved with the time when the users provide examples of errors which fail be to detected. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| case _ => | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| None | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| } | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This
FileNotFounddetection is substring-based and doesn’t ensure the message matches theObject at location ... not foundshape that the JVM shim later parses, so a format change could yield aFileNotFoundwith an unextractable/garbled path. Consider tightening the signal or carrying a structured path so the shim doesn’t depend on message parsing.Severity: medium
Other Locations
native/core/src/errors.rs:441🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The detection of the error type is fragile and it could match a totally unrelated error. It looks like a best effort detection that in the worst case would lead to a wrongly formatted "full" error. It will be improved with the time when the users provide examples of errors which fail be to detected.