-
Notifications
You must be signed in to change notification settings - Fork 8
fix: propagate errors when datamanager fails to resolve paths for a refg #172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
The PR updates error handling in refg_registry_t::as_refg to ensure resolution failures in the datamanager surface instead of being silently swallowed.
- Replace empty catch with
GK_RETHROWto propagate path-resolution errors. - Enhance the thrown error message with context about the failed annotation lookup.
|
Closing temporarily after discussion with @ovesh. Will close the issue via a different approach. |
| "Please upload it using `genome_kit.gk_data.upload_file`." | ||
| ) | ||
| self.filename = filename | ||
| super().__init__(self.message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary, but it alters the exception message printing slightly.
With call to super().__init__
genome_kit.data_manager.GKDataFileNotFoundError: hg3814123213.cfg not found!
Without
genome_kit.data_manager.GKDataFileNotFoundError: ('hg3814123213.cfg', 'hg3814123213.cfg not found!')
I'd say with is slightly more in-line with python exception printing, but it doesn't ultimately matter IMO. I can remove if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the question is why don't we just throw the IOError?
ex:
Line 100 in c073879
| GK_CHECK(result == 0, file, "Error seeking to position {} in file ({}).", offset, strerror(errno)); |
if we do want to be more fine-grained, it'd be better to update the mapping and use the existing mechanisms:
Lines 98 to 108 in c073879
| GKPY_CATCH_CASE(gk::assertion_error, PyExc_AssertionError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::file_error, PyExc_OSError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::type_error, PyExc_TypeError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::value_error, PyExc_ValueError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::index_error, PyExc_IndexError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::key_error, PyExc_KeyError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::memory_error, PyExc_MemoryError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::not_implemented_error, PyExc_NotImplementedError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::unreachable_code_error, PyExc_RuntimeError, return_stmt) \ | |
| GKPY_CATCH_CASE(gk::runtime_error, PyExc_RuntimeError, return_stmt) \ | |
| GKPY_CATCH_CASE(std::exception, PyExc_RuntimeError, return_stmt) \ |
Actually I'm having trouble understanding what's wrong with just raising FileNotFoundError and doing a rethrow for whatever we get
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this response meant for this thread? I'm not seeing the connection to calling super().__init__ for the exception. If you meant it in the sense of 'why are we defining a new exception for the data manager to throw, rather than just have it throw FileNotFound?', then I suppose that could be done. I chose to define a new exception type to differentiate it from FileNotFound in that GKDataFileNotFound signifies the data manager determined the requested file does not exist. The thinking here is if the data manager raises FileNotFound due to a different error somewhere, the c++ extension module can then tell an error occurred (and not just that a file doesn't exist). I guess it's unlikely that that might happen, but I felt it was more explicit to define a type.
Let me know if I misinterpreted your response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to figure out in what case would you call the datamanager and you need to differentiate a filenotfound via the datamanager vs a filenotfound at levels lower than the datamanager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe can you show the before and the fixed version output in the PR description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, there is no case where a FileNotFound is raised lower than the data manager. However, since we leave open the possibility of custom data managers, it is possible some other future (3rd-party) implementation might throw a FileNotFound as a result of some internal error.
It would be completely possible to change the code to instead just work with the built-in FileNotFound exception, and as a result some of the code could be slightly simplified (mainly removing the few lines that store the pointer to the GKDataFileNotFoundException type). However, we would lose that slight granularity in determining whether a raised FileNotFound exception is a result of a (possible) data manager error, or a true file not found (data manager determined the file does not exist).
The consequence for losing that granularity is when a FileNotFound would be thrown, we are falling back to the old behaviour for this specific exception type (behaviour has been updated in PR description).
Do you feel one approach is preferable here?
| // Check for error | ||
| if (!new_path_obj) { | ||
| GK_THROW(value, "{}", traceback_format_exc_and_clear()); | ||
| if (_gk_data_file_not_found_error && PyErr_ExceptionMatches(_gk_data_file_not_found_error)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a tests for both S3 and GCS: genome that doesn't exist, missing auth (specify a non-existing bucket name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can probably replace this with a GK_RETHROW but not 100% sure anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't GKRETHROW here because the exception here originates from within python, which wont be caught by the rethrow catch blocks (since they're catching c++ exceptions).
| if len(eps) > 0: | ||
| logging.warning("Failed to load the data manager plugin. " | ||
| "Falling back to the default data manager.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously if a custom data manager threw an error when attempting to load, GK would silently fallback to the default DM. Instead we warn the user.
| if len_eps > 1: | ||
| logging.info("Multiple data manager plugins found. " | ||
| f"Using the first one: {list_eps[0].name}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of multiple data managers, only the first one is used. This behaviour would occur silently. Now we inform the user.
|
Please update the PR description with terminal output of the old and new behavior. |
ovesh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking comment, preemptively approving.
Fixes #171
If the datamanager failed when trying to resolve the path, the stack trace would get swallowed and the code would attempt to load the non-existent
.cfgin order to parse it. This would then fail since the file doesn't exist, and print the misleading error message.Created a new Error class for cases when data manager tries to find a file and the file does not exist. C++ backend uses this error type to determine which error message to print (and stack trace in the case of data manager failure)
Old behaviour:
Invalid genome
Valid genome, but error occurs within data manager when trying to get the file
This example shows what might happen if you have no GCS credentials for the GCSDataManagerNew behaviour:
Invalid genome
Valid genome, but error occurs within data manager when trying to get the file
This example shows what might happen if you have no GCS credentials for the GCSDataManager