Some Xarray conversion support and an example. #65
Conversation
dt2str takes a Python datetime object and converts it to an ISO 8601 string that is compatible with the Oceans 3.0 API. get_onc_token reads a token from the password portion of a .netrc entry. Use of this function requires that users add that information to a .netrc file themselves.
json2xarray takes a getScalarData json response and converts it to an xarray.Dataset. This function requires that the user request metadata = 'full' and outputFormat = 'array'. nan_onc_flags replaces values in an xarray.Dataset variable with NaN if their QAQC flag is among the specified flags. remove_onc_flag_vars drops variables that represent ONC-generated qaqcFlags.
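The flag-masking idea behind nan_onc_flags can be illustrated in plain Python. This is a simplified sketch on lists, not the actual implementation (which operates on an xarray.Dataset); the function name and the flag values used here are illustrative only.

```python
import math

# Illustrative sketch only: replace values whose QAQC flag is in `bad_flags`
# with NaN, mirroring what nan_onc_flags does on an xarray.Dataset variable.
def mask_flagged(values, qaqc_flags, bad_flags=(3, 4)):
    return [
        math.nan if flag in bad_flags else value
        for value, flag in zip(values, qaqc_flags)
    ]

clean = mask_flagged([10.1, 10.2, 99.9], [1, 1, 4])
# clean -> [10.1, 10.2, nan]
```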
Hi Ian, thanks for your work! I will take a look. For now, don't worry about the checks; they failed because they don't have permission to access the token in this repo.
For the two new notebooks, do you want them to be included in https://oceannetworkscanada.github.io/api-python-client/Tutorial/index.html? Including them means they will act as tests and be run every time there is a new PR. You might need to replace the get_onc_token() part with "YOUR_TOKEN", like in https://oceannetworkscanada.github.io/api-python-client/Tutorial/onc_Library_Tutorial.html. For the xarray part, I am not familiar with that area. @aschlesin is there any DAQ member who wants to review this?
netrc_path: os.PathLike | None
    The path to the .netrc file.
    If left as the default value of None, the netrc module looks for a
    .netrc file in the user directory. If None, the netrc module looks for a
Looks like the second sentence is redundant.
FYI I have a plan to add env support in this library in the future, something like here.
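For reference, an env-based lookup might look like the following sketch. The variable name ONC_TOKEN and the function name are assumptions for illustration, not part of the library.

```python
import os

# Hypothetical sketch: read the token from an environment variable.
def get_onc_token_from_env(var: str = "ONC_TOKEN") -> str:
    token = os.environ.get(var)
    if not token:
        raise KeyError(f"Environment variable {var!r} is not set")
    return token

os.environ["ONC_TOKEN"] = "example-token"  # for demonstration only
token = get_onc_token_from_env()
```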
Thanks. I added this function since I store most of my stuff in .netrc anyway. I just did this to reduce the risk of distributing code with personal tokens. If there is a better, standard, or preferred way to store tokens then I can move to that.
I think the netrc way is a nice feature to have, but I would suggest renaming it to get_onc_token_by_netrc to indicate that it uses netrc. Otherwise, people might think of it as the officially recommended way to get a token, while in fact there are other ways to get an ONC token (for example, the env one). Also, machine could be a parameter with a default value of 'data.oceannetworks.ca' in case people want to use their own identifiers.
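The suggested signature might look like the following sketch, built on the stdlib netrc module. This is a minimal illustration of the rename and the machine parameter, not the PR's actual code.

```python
import netrc

# Sketch of the proposed rename with a configurable `machine` parameter.
def get_onc_token_by_netrc(netrc_path=None, machine="data.oceannetworks.ca"):
    auth = netrc.netrc(netrc_path).authenticators(machine)
    if auth is None:
        raise LookupError(f"No entry for machine {machine!r} in .netrc")
    login, account, password = auth  # the ONC token is stored as the password
    return password

# Demonstration with a temporary .netrc-style file:
import os
import tempfile
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write("machine data.oceannetworks.ca login me password MY_TOKEN\n")
    path = f.name
token = get_onc_token_by_netrc(path)
os.remove(path)
```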
>>> dtstr = dt2str(datetime.now()) # doctest: +SKIP
"""

dt = pd.to_datetime(dt)
Adding dependencies to a library is kind of a big decision. Since you are adding util methods to convert json to xarray, it makes sense to include xarray, as well as numpy for NaN operations. But from my understanding, pandas is only used for converting datetime (here and line 126 of xarray.py). Do you think this is better: dt.replace(tzinfo=timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")?
Also, the example could be improved by adding the output, like
>>> dt2str(datetime(2025, 7, 30, 0, 0, 0, 0))
'2025-07-30T00:00:00.000Z'
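As a concrete sketch, the pandas-free version of dt2str could read as follows, assuming inputs are native datetime objects interpreted as UTC.

```python
from datetime import datetime, timezone

# Sketch: format a naive datetime as an ISO 8601 UTC string with a 'Z' suffix,
# using only the standard library.
def dt2str(dt: datetime) -> str:
    return (
        dt.replace(tzinfo=timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )

result = dt2str(datetime(2025, 7, 30))
# result -> '2025-07-30T00:00:00.000Z'
```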
I think I originally added pd.to_datetime because pandas does a decent job of converting different representations of datetime (datetime objects, Timestamps, and numpy datetime64). I did this to avoid the AttributeError: 'numpy.datetime64' object has no attribute 'strftime' when trying to convert numpy datetime objects to strings.
I can add an 'if isinstance' clause that converts the input if it isn't explicitly a native datetime object, so we can remove the pandas dependency, if that is the route you want to go.
However, I think a convenience function for converting from json to a pandas DataFrame would also be worthwhile, in which case pandas would need to be a dependency.
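The 'if isinstance' idea might look like this sketch. The string-parsing fallback is one possible way to handle numpy.datetime64 (which stringifies to an ISO 8601 form) without importing numpy or pandas; the function name is illustrative.

```python
from datetime import datetime

# Sketch: coerce non-datetime inputs to a native datetime before formatting.
# numpy.datetime64 objects render as ISO 8601 strings via str(), which
# datetime.fromisoformat can parse.
def to_native_datetime(dt):
    if isinstance(dt, datetime):
        return dt
    return datetime.fromisoformat(str(dt))

native = to_native_datetime("2025-07-30T00:00:00")
```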
Alternatively, since these functions aren't a part of the core ONC package, the packages required to use them could be added as optional dependencies in pyproject.toml. That would require moving them to a different util module so current functionality isn't blocked. Users could then install the packages manually or use pip's optional installation syntax, e.g. pip install onc[extra]
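In pyproject.toml, that optional group might look like the following sketch; the group name extra and the package list are assumptions for illustration.

```toml
# Hypothetical optional-dependency group; installed via `pip install onc[extra]`.
[project.optional-dependencies]
extra = [
    "xarray",
    "numpy",
    "pandas",
]
```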
I tried to stay neutral (unopinionated) about things outside the core of the client library, which is just a wrapper for making API calls, so that the library stays small and easy to maintain (we have another MATLAB library that does similar things; we tried to keep the features synced, but it is very hard). If users want to use pandas, they can install the library themselves and call pd.DataFrame(json_result) in their own code.
In your case, generating the xarray does take some additional effort to build, so it makes sense to add it to the util. Adding it to extras complicates things; I think it is better and easier to just refactor the code to drop the pandas dependency.
I think it would be better to just have the example links mapped rather than run as tests. I can add a .md or .rst to the Sphinx source. Xarray is heavily used in the science community; it provides some great functionality for working with N-dimensional datasets. echopype, hypercoast, and many of the OOI data examples all use xarray.
I agree with Ian; I think keeping them where he has them in the directory structure under /examples is best for users to find. The examples used in https://oceannetworkscanada.github.io/api-python-client/ that are stored under https://github.com/OceanNetworksCanada/api-python-client/tree/main/doc/source/Code_Examples are the ones that will always be dependent on code changes.
Easy enough. I'll work on making those changes.
@kan-fu If you are trying to maintain consistency between different programming languages for the API wrapper, would it be more appropriate to create a toolbox wrapper around the onc package, rather than having these additions within the onc package itself?
I am not quite sure about the toolbox wrapper, but in our case the difficulty is that I only have limited knowledge of MATLAB, so I cannot help maintain the MATLAB library when there are bug fixes or new features added to the Python client library. Also, some features and bugs are language specific (for example, the xarray support), which requires someone who knows both languages to decide whether it is necessary to update the other library. Other than the core functionalities (for example, features to query a new API endpoint in our backend, like #51), I think the two libraries are on their own paths in terms of maintenance.
I think I understand. I have some code for running additional quality assessment, merging, and performing more advanced corrections for some data products, largely based in Xarray, which I think would benefit the community and make the data a little more accessible. Please correct me if I am wrong, but it sounds like additional changes like the above are not well suited to the api-python-client and would be better placed in their own repository, possibly as a set of community-driven notebooks. If ONC would prefer, I could create a separate repository and pass ownership so that someone from ONC can maintain it as new users add their own examples. It could be called something snazzy like ONC Data Explorations or ONC Community Toolbox. If that is the preferred route, we can close this PR and I can move the additions and your input to a new repo.
I think your way of putting the xarray helper methods in the util functions complements the core purpose of the client library, and I myself welcome any changes that improve its quality. My only comment for this PR is to not add an unnecessary dependency (at least, it is not necessary for this PR), which could be solved by using the datetime method instead. I can merge the PR once the comments are addressed. Right now the util is not included in our API reference documentation page generated by sphinx-autoapi; it is kind of tricky to get right, and I will probably include it later myself. Actually, I was about to ask if you would like to be invited as a collaborator so that you don't need to fork the repo if you want to add more features to the library. Right now the fork/PR way will fail all the checks due to the permission issue.
feat: allows users to convert getScalarData json responses to a time-dimensioned xarray Dataset.
docs: Adds a usage example for locationCodes TWSB and SOGCS.
This commit provides some utility functions for converting getScalarData json responses to Xarray datasets, which could then be merged with other datasets or exported as netCDF files by users.
Example Use Case: https://github.com/IanTBlack/api-python-client/blob/main/examples/bcferry/twsb_sovi_thermosalinograph.ipynb