Some Xarray conversion support and an example. #65
Conversation
dt2str takes a Python datetime object and converts it to an ISO 8601 string that is compatible with the Oceans 3.0 API. get_onc_token reads a token from the password portion of a .netrc entry. Use of this function requires that users add that information to a .netrc file themselves.
json2xarray takes a getScalarData json response and converts it to an xarray.Dataset. This function requires that the user request metadata = 'full' and outputFormat = 'array'. nan_onc_flags replaces values in an xarray.Dataset variable with NaN if their QAQC flag is among the specified flags. remove_onc_flag_vars drops variables that represent ONC-generated qaqcFlags.
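The flag-masking idea behind nan_onc_flags can be illustrated in plain Python. This is a simplified sketch on lists, not the actual implementation (which operates on an xarray.Dataset); the function name and the flag values used here are illustrative only.

```python
import math

# Illustrative sketch only: replace values whose QAQC flag is in `bad_flags`
# with NaN, mirroring what nan_onc_flags does on an xarray.Dataset variable.
def mask_flagged(values, qaqc_flags, bad_flags=(3, 4)):
    return [
        math.nan if flag in bad_flags else value
        for value, flag in zip(values, qaqc_flags)
    ]

clean = mask_flagged([10.1, 10.2, 99.9], [1, 1, 4])
# clean -> [10.1, 10.2, nan]
```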
Hi Ian, thanks for your work! I will take a look. For now, don't worry about the checks; they failed because they don't have permission to access the token in this repo.
For the two new notebooks, do you want them to be included in https://oceannetworkscanada.github.io/api-python-client/Tutorial/index.html? Including them means they will act as tests and be run every time there is a new PR. You might need to replace the get_onc_token() part with "YOUR_TOKEN", like in https://oceannetworkscanada.github.io/api-python-client/Tutorial/onc_Library_Tutorial.html. For the xarray part, I am not familiar with that area. @aschlesin is there any DAQ member who wants to review this?
netrc_path: os.PathLike | None
    The path to the .netrc file.
    If left as the default value of None, the netrc module looks for a
    .netrc file in the user directory. If None, the netrc module looks for a
Looks like the second sentence is redundant.
FYI I have a plan to add env support in this library in the future, something like here.
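For reference, an env-based lookup might look like the following sketch. The variable name ONC_TOKEN and the function name are assumptions for illustration, not part of the library.

```python
import os

# Hypothetical sketch: read the token from an environment variable.
def get_onc_token_from_env(var: str = "ONC_TOKEN") -> str:
    token = os.environ.get(var)
    if not token:
        raise KeyError(f"Environment variable {var!r} is not set")
    return token

os.environ["ONC_TOKEN"] = "example-token"  # for demonstration only
token = get_onc_token_from_env()
```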
Thanks. I added this function since I store most of my stuff in .netrc anyway. I just did this to reduce the risk of distributing code with personal tokens. If there is a better, standard, or preferred way to store tokens then I can move to that.
I think the netrc way is a nice feature to have, but I would suggest renaming it to get_onc_token_by_netrc to indicate that it uses netrc. Otherwise, people might think of it as the officially recommended way to get a token, while in fact there are other ways to get an ONC token (for example, the env one). Also, machine could be a parameter with a default value of 'data.oceannetworks.ca' in case people want to use their own identifiers.
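The suggested signature might look like the following sketch, built on the stdlib netrc module. This is a minimal illustration of the rename and the machine parameter, not the PR's actual code.

```python
import netrc

# Sketch of the proposed rename with a configurable `machine` parameter.
def get_onc_token_by_netrc(netrc_path=None, machine="data.oceannetworks.ca"):
    auth = netrc.netrc(netrc_path).authenticators(machine)
    if auth is None:
        raise LookupError(f"No entry for machine {machine!r} in .netrc")
    login, account, password = auth  # the ONC token is stored as the password
    return password

# Demonstration with a temporary .netrc-style file:
import os
import tempfile
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as f:
    f.write("machine data.oceannetworks.ca login me password MY_TOKEN\n")
    path = f.name
token = get_onc_token_by_netrc(path)
os.remove(path)
```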
>>> dtstr = dt2str(datetime.now()) # doctest: +SKIP
"""

dt = pd.to_datetime(dt)
Adding dependencies to a library is kind of a big decision. Since you are adding util methods to convert json to xarray, it makes sense to include xarray, as well as numpy for NaN operations. But from my understanding, pandas is only used for converting datetime (here and line 126 of xarray.py). Do you think this is better: dt.replace(tzinfo=timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")?
Also, the example could be improved by adding the output, like
>>> dt2str(datetime(2025, 7, 30, 0, 0, 0, 0))
'2025-07-30T00:00:00.000Z'
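As a concrete sketch, the pandas-free version of dt2str could read as follows, assuming inputs are native datetime objects interpreted as UTC.

```python
from datetime import datetime, timezone

# Sketch: format a naive datetime as an ISO 8601 UTC string with a 'Z' suffix,
# using only the standard library.
def dt2str(dt: datetime) -> str:
    return (
        dt.replace(tzinfo=timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )

result = dt2str(datetime(2025, 7, 30))
# result -> '2025-07-30T00:00:00.000Z'
```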
I think I originally added pd.to_datetime because pandas does a decent job of converting different representations of datetime (datetime objects, Timestamps, and numpy datetime64). I did this to avoid the AttributeError: 'numpy.datetime64' object has no attribute 'strftime' when trying to convert numpy datetime objects to strings.
I can add an 'if isinstance' clause that converts the input if it isn't explicitly a native datetime object, so we can remove the pandas dependency, if that is the route you want to go.
However, I think a convenience function for converting from json to a pandas DataFrame would also be worthwhile, in which case pandas would need to be a dependency.
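The 'if isinstance' idea might look like this sketch. The string-parsing fallback is one possible way to handle numpy.datetime64 (which stringifies to an ISO 8601 form) without importing numpy or pandas; the function name is illustrative.

```python
from datetime import datetime

# Sketch: coerce non-datetime inputs to a native datetime before formatting.
# numpy.datetime64 objects render as ISO 8601 strings via str(), which
# datetime.fromisoformat can parse.
def to_native_datetime(dt):
    if isinstance(dt, datetime):
        return dt
    return datetime.fromisoformat(str(dt))

native = to_native_datetime("2025-07-30T00:00:00")
```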
Alternatively, since these functions aren't a part of the core ONC package, the packages required to use them could be added as optional dependencies in pyproject.toml. That would require moving them to a different util module so current functionality isn't blocked. Users could then install the packages manually or use pip's optional installation syntax, e.g. pip install onc[extra]
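In pyproject.toml, that optional group might look like the following sketch; the group name extra and the package list are assumptions for illustration.

```toml
# Hypothetical optional-dependency group; installed via `pip install onc[extra]`.
[project.optional-dependencies]
extra = [
    "xarray",
    "numpy",
    "pandas",
]
```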
I tried to stay neutral (unopinionated) about things outside the core of the client library, which is just a wrapper for making API calls, so that the library stays small and easy to maintain (we have another MATLAB library that does similar things; we tried to keep the features synced, but it is very hard). If users want to use pandas, they can install the library themselves and call pd.DataFrame(json_result) in their own code.
In your case, generating the xarray does take some additional effort to build, so it makes sense to add it to the util. Adding it to extras complicates things; I think it is better and easier to just refactor the code to drop the pandas dependency.
I think it would be better to just have the example links mapped rather than run as tests. I can add a .md or .rst to the Sphinx source. Xarray is heavily used in the science community; it provides some great functionality for working with N-dimensional datasets. echopype, hypercoast, and many of the OOI data examples all use xarray.
I agree with Ian; I think keeping them where he has them in the directory structure under /examples is best for users to find. The examples used in https://oceannetworkscanada.github.io/api-python-client/ that are stored under https://github.com/OceanNetworksCanada/api-python-client/tree/main/doc/source/Code_Examples are the ones that will always be dependent on code changes.
Easy enough. I'll work on making those changes.
@kan-fu If you are trying to maintain consistency between different programming languages for the API wrapper, would it be more appropriate to create a toolbox wrapper around the onc package, rather than having these additions within the onc package itself?
I am not quite sure about the toolbox wrapper, but in our case the difficulty is that I only have limited knowledge of MATLAB, so I cannot help maintain the MATLAB library when there are bug fixes or new features added to the Python client library. Also, some features and bugs are language specific (for example, the xarray support), which requires someone who knows both languages to decide whether it is necessary to update the other library. Other than the core functionalities (for example, features to query a new API endpoint in our backend, like #51), I think the two libraries are on their own paths in terms of maintenance.
I think I understand. I have some code for running additional quality assessment, merging, and performing more advanced corrections for some data products, largely based in Xarray, which I think would benefit the community and make the data a little more accessible. Please correct me if I am wrong, but it sounds like additional changes like the above are not well suited to the api-python-client and would be better placed in their own repository, possibly as a set of community-driven notebooks. If ONC would prefer, I could create a separate repository and pass ownership so that someone from ONC can maintain it as new users add their own examples. It could be called something snazzy like ONC Data Explorations or ONC Community Toolbox. If that is the preferred route, we can close this PR and I can move the additions and your input to a new repo.
I think your way of putting the xarray helper methods in the util functions complements the core purpose of the client library, and I myself welcome any changes that improve its quality. My only comment for this PR is to not add an unnecessary dependency (at least, it is not necessary for this PR), which could be solved by using the datetime method instead. I can merge the PR once the comments are addressed. Right now the util is not included in our API reference documentation page generated by sphinx-autoapi; it is kind of tricky to get right, and I will probably include it later myself. Actually, I was about to ask if you would like to be invited as a collaborator so that you don't need to fork the repo if you want to add more features to the library. Right now the fork/PR way will fail all the checks due to the permission issue.
feat: allows users to convert getScalarData json responses to a time-dimensioned xarray Dataset.
docs: Adds a usage example for locationCodes TWSB and SOGCS.
This commit provides some utility functions for converting getScalarData json responses to Xarray datasets, which could then be merged with other datasets or exported as netCDF files by users.
Example Use Case: https://github.com/IanTBlack/api-python-client/blob/main/examples/bcferry/twsb_sovi_thermosalinograph.ipynb