@wordsworthc
Owner

Description

Big changes are coming to evo-objects!

The idea behind this PR is to lay some foundations for more expressive interactions with Geoscience Objects, in this case focusing specifically on consuming Geoscience Object data. The bullet-point changes are:

  • Add ObjectReference type for structured URL references
  • Add DownloadedObject.from_reference() constructor
  • Add DownloadedObject.search() method for JMESPath queries
  • Add DownloadedObject.download_table(), DownloadedObject.download_dataframe(), and DownloadedObject.download_array() methods for downloading parquet data
  • Deprecate KnownTableFormat.load_table() in favor of the ParquetLoader utility class

ObjectReference Type

A new ObjectReference type has been introduced as a structured URL reference to Geoscience Objects that can be built from, and broken down into, its component parts. It is implemented as a subclass of str, so it remains backward compatible: it can be used anywhere an object URL string is expected without breaking existing code. The ObjectMetadata.url property, which previously returned a plain string, now returns an ObjectReference. A static method constructor, ObjectReference.new(), has also been added to create an ObjectReference from component parts, which simplifies object URL construction.

Example:

obj_ref = ObjectReference.new(
    environment=environment,  # an existing Environment instance
    object_path="path/to/object.json",  # or object_id=UUID("<object-id>")
    version_id="<version-id>",  # optional
)
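
The str-subclass design described above can be illustrated with a small standalone sketch. The class name SketchReference, its properties, and the "version" query parameter are all hypothetical and are not the SDK's ObjectReference or its URL layout; the sketch only shows how a str subclass can expose parsed components while still behaving as a plain string.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


class SketchReference(str):
    """Hypothetical str subclass exposing URL parts (not the SDK's ObjectReference)."""

    def __new__(cls, url: str) -> "SketchReference":
        parts = urlsplit(url)
        # Reject anything that is not an absolute HTTPS URL.
        if parts.scheme != "https" or not parts.netloc:
            raise ValueError(f"not an absolute HTTPS URL: {url!r}")
        return super().__new__(cls, url)

    @property
    def base_url(self) -> str:
        parts = urlsplit(self)
        return f"{parts.scheme}://{parts.netloc}/"

    @property
    def path(self) -> str:
        return urlsplit(self).path.lstrip("/")

    @property
    def version_id(self) -> Optional[str]:
        # "version" is an assumed query parameter name, used for illustration only.
        values = parse_qs(urlsplit(self).query).get("version")
        return values[0] if values else None


ref = SketchReference("https://unittest.localhost/path/to/object.json?version=123")
```

Because the sketch subclasses str, ref compares equal to the original URL string and can be passed to any function that expects one.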

DownloadedObject.from_reference() Constructor

A new static method constructor DownloadedObject.from_reference() has been added to enable simpler interactions when the Object URL is already known. This streamlines the process of working with geoscience objects by reducing the steps needed to download and interact with object data. The existing ObjectAPIClient has been refactored internally to use this new implementation for downloading geoscience objects, ensuring consistent behavior across the SDK while maintaining full backward compatibility with existing code.

Example:

downloaded_object = await DownloadedObject.from_reference(
    connector=connector,  # an existing APIConnector instance
    object_reference="<geoscience-object-url>",  # or an ObjectReference instance
    cache=cache,  # an existing ICache instance, optional
)

DownloadedObject.search() Method

The new DownloadedObject.search() method provides powerful querying capabilities for Geoscience Object JSON data using JMESPath expressions. This allows developers to efficiently extract specific data from complex object structures without manually traversing the JSON hierarchy, making data access more intuitive and less error-prone.

Example:

# Use a JMESPath expression to query the object data
result = downloaded_object.search("locations.coordinates")

# JMESPath expressions can also be used to filter and transform data
filtered_result = downloaded_object.search("locations.attributes[?attribute_type=='scalar']")
transformed_result = downloaded_object.search("locations.attributes[].{key: key || name, name: name}")
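
For comparison, here is a rough sketch of the manual traversal that these three expressions stand in for, written in plain Python against made-up sample data. The sample dictionary is hypothetical and only mirrors the field names used in the queries above; the exact return type of search() is not assumed here.

```python
# Hypothetical sample data shaped like the fields queried in the examples above.
object_json = {
    "locations": {
        "coordinates": {"data": "<data-id>", "length": 2, "width": 3, "data_type": "float64"},
        "attributes": [
            {"name": "grade", "key": "a1", "attribute_type": "scalar"},
            {"name": "rock", "attribute_type": "category"},
        ],
    }
}

# Roughly what "locations.coordinates" selects.
coordinates = object_json.get("locations", {}).get("coordinates")

attributes = object_json.get("locations", {}).get("attributes", [])

# Roughly what "locations.attributes[?attribute_type=='scalar']" selects.
scalars = [a for a in attributes if a.get("attribute_type") == "scalar"]

# Roughly what "locations.attributes[].{key: key || name, name: name}" produces:
# fall back to the name when an attribute has no key.
renamed = [{"key": a.get("key") or a.get("name"), "name": a.get("name")} for a in attributes]
```

One caveat on the last line: Python's or treats 0 as falsy, whereas JMESPath's || does not, so the two only agree for values such as strings and nulls.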

Data Download Methods

Three new methods have been added to DownloadedObject for downloading parquet data in different formats: download_table() returns a pyarrow.Table, download_dataframe() returns a pandas.DataFrame, and download_array() returns a numpy.ndarray. Each method is enabled only when a dependency check finds its required packages installed; the utils extra provides all of them. Like the existing DataClient, these methods accept a dictionary in a TableInfo-like format.

These methods offer several improvements over the existing DataClient:

  • JMESPath Support: Instead of requiring dictionary input, you can provide a JMESPath expression that resolves to a TableInfo-like JSON object
  • Self-contained Data Access: The data ID referenced by TableInfo must be available within the current object's JSON data
  • Simplified Interface: No additional identifiers are needed since DownloadedObject already contains the necessary details and API connector

This approach represents the preferred method for accessing parquet data, and the existing DataClient implementation will be gradually phased out through deprecation warnings before eventual removal.

Example:

# Using a dictionary
table_info = {
  "data": "<data-id>",
  "length": 1234,
  "width": 3,
  "data_type": "float64"
}
table = await downloaded_object.download_table(table_info)
df = await downloaded_object.download_dataframe(table_info)
array = await downloaded_object.download_array(table_info)

# Using a JMESPath expression
table = await downloaded_object.download_table("locations.coordinates")
df = await downloaded_object.download_dataframe("locations.coordinates")
array = await downloaded_object.download_array("locations.coordinates")
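
The idea of accepting either a dictionary or an expression can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: resolve_table_info is a hypothetical helper, it handles only plain dotted paths rather than full JMESPath, and the set of required keys is assumed from the dictionary example above.

```python
from collections.abc import Mapping
from typing import Any, Union

# Assumed from the dictionary example above; the real TableInfo schema may differ.
REQUIRED_KEYS = ("data", "length", "width", "data_type")


def resolve_table_info(
    object_json: Mapping[str, Any],
    table_info: Union[str, Mapping[str, Any]],
) -> Mapping[str, Any]:
    """Hypothetical helper: accept a dict or a dotted path, return a TableInfo-like mapping."""
    if isinstance(table_info, str):
        # Simplification: walk a plain dotted path instead of evaluating JMESPath.
        node: Any = object_json
        for part in table_info.split("."):
            if not isinstance(node, Mapping) or part not in node:
                raise ValueError(f"{table_info!r} did not resolve to a value")
            node = node[part]
        table_info = node
    if not isinstance(table_info, Mapping):
        raise ValueError("expected a TableInfo-like mapping")
    missing = [key for key in REQUIRED_KEYS if key not in table_info]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return table_info


sample_json = {
    "locations": {
        "coordinates": {"data": "<data-id>", "length": 1234, "width": 3, "data_type": "float64"}
    }
}
by_path = resolve_table_info(sample_json, "locations.coordinates")
by_dict = resolve_table_info(sample_json, sample_json["locations"]["coordinates"])
```

Both calls return the same mapping, which is the property that lets the download methods treat the two input styles interchangeably.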

A new ParquetDownloader utility class has been introduced for downloading parquet data from evo.common.io.Download instances, with a ParquetLoader for schema validation and loading data into required formats. These lower-level utilities serve as the foundation for the DownloadedObject data download methods but may also be useful in other contexts where direct parquet data loading is needed, providing flexibility for advanced use cases.

Deprecation of KnownTableFormat.load_table()

The existing KnownTableFormat.load_table() method has been marked as deprecated in favor of the new ParquetLoader implementation. This change encourages developers to transition to the more robust and flexible ParquetLoader for loading parquet data, aligning with the overall improvements in data handling within the SDK. The deprecation is communicated through warnings, allowing developers time to adapt their code before the method is potentially removed in future releases.
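
For readers unfamiliar with the pattern, deprecation through warnings usually looks something like the sketch below. The function names are hypothetical stand-ins, not the SDK's code; only the standard library warnings module is used.

```python
import warnings


def new_loader(path: str) -> str:
    """Hypothetical stand-in for the replacement implementation."""
    return f"loaded {path}"


def load_table(path: str) -> str:
    """Hypothetical deprecated entry point: warn, then delegate to the replacement."""
    warnings.warn(
        "load_table() is deprecated; use new_loader() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_loader(path)
```

Existing callers keep working and receive the same result, while the warning points them at the replacement.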

Other non-specific changes

  • Remove DataType, Schema, Table, and DataFrame protocols from evo.objects.utils in favor of the actual types from pyarrow and pandas
  • Refactor API response parsing for improved composition and reuse
  • Always use HTTPS URLs in test data (http://unittest.localhost/ -> https://unittest.localhost/)

Checklist

  • I have read the contributing guide and the code of conduct

…arrow tables, pandas dataframes, or numpy arrays
@wordsworthc
Owner Author

Just to clarify, the file count in this PR is largely due to using find & replace to change http://unittest.localhost/ to https://unittest.localhost/. Most of the .json files touched in this PR are for that change alone and can be safely skimmed over. The only exception is packages/evo-objects/tests/data/get_object_detailed.json, which was added for TestDownloadedObject.

In fact, all of the changes outside of evo-objects are for this reason alone.

@wordsworthc wordsworthc changed the base branch from feat/jmespath-support to main October 8, 2025 02:34

@BenLewis-Seequent BenLewis-Seequent left a comment

Looks good to me.

@wordsworthc
Owner Author

Moved to SeequentEvo#113

@wordsworthc wordsworthc closed this Oct 8, 2025

3 participants