Feat/improving object download #1
…arrow tables, pandas dataframes, or numpy arrays
Just to clarify, the file count in this PR is largely due to using find & replace to change … In fact, all of the changes outside of …
BenLewis-Seequent
left a comment
Looks good to me.
Moved to SeequentEvo#113
Description
Big changes are coming to `evo-objects`!
The idea behind this PR is to lay some foundations for more expressive interactions with Geoscience Objects, in this case focusing specifically on consuming Geoscience Object data. The bullet-point changes are:
- `ObjectReference` type for structured URL references
- `DownloadedObject.from_reference()` constructor
- `DownloadedObject.search()` method for JMESPath queries
- `DownloadedObject.download_table()`, `DownloadedObject.download_dataframe()`, and `DownloadedObject.download_array()` methods for downloading parquet data
- Deprecation of `KnownTableFormat.load_table()` in favor of the `ParquetLoader` utility class
ObjectReference Type
A new `ObjectReference` type has been introduced as a (de)structured URL reference to geoscience objects. It is implemented as a subclass of `str`, so it stays fully backward compatible: it can be used anywhere an object URL string is expected without breaking existing code. The `ObjectMetadata.url` property, which previously returned a plain string, now returns an `ObjectReference`. A static method constructor has also been added for building an `ObjectReference` from its component parts, which simplifies object URL construction.
Example:
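A minimal sketch of the `str`-subclass idea, using a hypothetical `UrlReference` class rather than the SDK's `ObjectReference`. The URL layout (`/orgs/.../workspaces/.../objects/...`), the attribute names, and the `new()` constructor name are assumptions made for illustration; the real type may differ.

```python
from urllib.parse import urlsplit


class UrlReference(str):
    """Hypothetical stand-in: a str subclass that also exposes parsed URL parts."""

    def __new__(cls, value: str) -> "UrlReference":
        parts = urlsplit(value)
        segments = [s for s in parts.path.split("/") if s]
        # Assumed layout: /orgs/<org>/workspaces/<workspace>/objects/<object>
        if len(segments) != 6 or segments[0::2] != ["orgs", "workspaces", "objects"]:
            raise ValueError(f"Unrecognised reference: {value!r}")
        self = super().__new__(cls, value)
        self.hub_url = f"{parts.scheme}://{parts.netloc}"
        self.org_id, self.workspace_id, self.object_id = segments[1::2]
        return self

    @staticmethod
    def new(hub_url: str, org_id: str, workspace_id: str, object_id: str) -> "UrlReference":
        """Build a reference from its component parts."""
        base = hub_url.rstrip("/")
        return UrlReference(
            f"{base}/orgs/{org_id}/workspaces/{workspace_id}/objects/{object_id}"
        )


ref = UrlReference.new("https://hub.example", "org-1", "ws-1", "obj-1")
print(ref.upper())    # still behaves like a plain string
print(ref.object_id)  # and also exposes the parsed parts
```

Because the instance is still a `str`, code that concatenates, compares, or logs the URL keeps working unchanged, which is the compatibility property the description relies on.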
DownloadedObject.from_reference() Constructor
A new static method constructor, `DownloadedObject.from_reference()`, has been added to enable simpler interactions when the object URL is already known. It reduces the steps needed to download and interact with object data. The existing `ObjectAPIClient` has been refactored internally to use this new implementation for downloading geoscience objects, so behavior stays consistent across the SDK and existing code keeps working.
Example:
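The delegation pattern described above can be sketched with stand-in classes. Everything here is hypothetical: `Downloaded`, `Client`, `download_by_id`, the `fetch` callable, and the `/objects/<id>` URL shape are illustrative names, not the SDK's `DownloadedObject` or `ObjectAPIClient` signatures.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical transport: takes an object URL and returns its JSON document.
Fetch = Callable[[str], dict]


@dataclass(frozen=True)
class Downloaded:
    url: str
    document: dict

    @staticmethod
    def from_reference(fetch: Fetch, reference: str) -> "Downloaded":
        """Build directly from a known URL; no client object is required."""
        return Downloaded(url=reference, document=fetch(reference))


class Client:
    """Existing entry point: keeps its own signature but delegates internally."""

    def __init__(self, fetch: Fetch, base_url: str) -> None:
        self._fetch = fetch
        self._base_url = base_url.rstrip("/")

    def download_by_id(self, object_id: str) -> Downloaded:
        reference = f"{self._base_url}/objects/{object_id}"
        return Downloaded.from_reference(self._fetch, reference)


# In-memory stand-in for the remote service.
store = {"https://hub.example/objects/abc": {"name": "pointset"}}
direct = Downloaded.from_reference(store.__getitem__, "https://hub.example/objects/abc")
via_client = Client(store.__getitem__, "https://hub.example").download_by_id("abc")
print(direct == via_client)
```

Routing both paths through one constructor means there is a single place where download behavior is defined.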
DownloadedObject.search() Method
The new `DownloadedObject.search()` method queries Geoscience Object JSON data using JMESPath expressions. Developers can extract specific data from complex object structures without manually traversing the JSON hierarchy, which makes data access more intuitive and less error-prone.
Example:
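To illustrate what a JMESPath query saves, the sketch below compares one expression with the equivalent hand-written traversal. The document shape and field names are made up for illustration and are not a real Geoscience Object schema; the third-party `jmespath` package is used only if it happens to be installed.

```python
# Toy document; field names are illustrative, not a real Geoscience Object schema.
doc = {
    "name": "drillholes",
    "attributes": [
        {"name": "grade", "values": {"data": "a1", "length": 3}},
        {"name": "depth", "values": {"data": "b2", "length": 3}},
    ],
}

# JMESPath expression selecting the data reference of the attribute named "grade".
EXPRESSION = "attributes[?name == 'grade'].values.data"


def manual_lookup(document: dict) -> list:
    """Hand-written traversal that the expression above replaces."""
    return [
        attr["values"]["data"]
        for attr in document.get("attributes", [])
        if attr.get("name") == "grade"
    ]


try:
    import jmespath  # third-party package; optional for this sketch
except ImportError:
    jmespath = None

# Use the query engine when available, otherwise fall back to the manual version.
result = jmespath.search(EXPRESSION, doc) if jmespath else manual_lookup(doc)
print(result)
```

The expression stays one line as the filter grows more complex, while the manual version needs a new loop or guard for each extra level of nesting.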
Data Download Methods
Three new methods have been added to `DownloadedObject` for downloading parquet data in different formats: `download_table()` returns a `pyarrow.Table`, `download_dataframe()` returns a `pandas.DataFrame`, and `download_array()` returns a `numpy.ndarray`. These methods are optionally enabled through dependency checks, with the `utils` extra dependency providing all required packages. Similar to the existing `DataClient`, these methods accept a dictionary resembling the `TableInfo` format.
These methods offer several improvements over the existing `DataClient`:
- … `TableInfo`-like JSON object
- … `TableInfo` must be available within the current object's JSON data
- … `DownloadedObject` already contains the necessary details and API connector
This approach represents the preferred method for accessing parquet data, and the existing `DataClient` implementation will be gradually phased out through deprecation warnings before eventual removal.
Example:
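One way methods can be "optionally enabled through dependency checks" is a call-time guard like the sketch below. How the SDK actually performs its checks is not shown in this description, so the `requires` decorator and both decorated functions are hypothetical.

```python
import functools
import importlib.util
import json


def requires(*modules: str):
    """Decorator: raise a clear error at call time if an optional package is missing."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None for top-level packages that are not installed.
            missing = [m for m in modules if importlib.util.find_spec(m) is None]
            if missing:
                raise RuntimeError(
                    f"{func.__name__}() needs optional packages: {', '.join(missing)}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires("json")  # stdlib module, so this call always succeeds
def load_rows(text: str) -> list:
    return json.loads(text)


@requires("package_that_is_not_installed")  # placeholder name to show the failure path
def load_frame(text: str):
    raise AssertionError("unreachable when the dependency is missing")


print(load_rows("[1, 2]"))
try:
    load_frame("[1, 2]")
except RuntimeError as exc:
    print(exc)
```

Checking at call time rather than import time lets the core package import cleanly without the extra, and the error message can point users at the missing packages.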
A new `ParquetDownloader` utility class has been introduced for downloading parquet data from `evo.common.io.Download` instances, with a `ParquetLoader` for schema validation and loading data into required formats. These lower-level utilities serve as the foundation for the `DownloadedObject` data download methods, but may also be useful in other contexts where direct parquet data loading is needed.
Deprecation of KnownTableFormat.load_table()
The existing `KnownTableFormat.load_table()` method has been marked as deprecated in favor of the new `ParquetLoader` implementation. This change encourages developers to transition to the more robust and flexible `ParquetLoader` for loading parquet data. The deprecation is communicated through warnings, giving developers time to adapt their code before the method is potentially removed in a future release.
Other non-specific changes
- … `DataType`, `Schema`, `Table`, and `DataFrame` protocols from `evo.objects.utils` in favour of the actual types from `pyarrow` and `pandas`.
- … `http://unittest.localhost/` -> `https://unittest.localhost/`)
Checklist