Conversation

@IanTBlack (Collaborator) commented Sep 5, 2025

feat: allows users to convert getScalarData json responses to a time-dimensioned xarray Dataset.

docs: Adds usage examples for locationCodes TWSB and SOGCS.

This commit provides some utility functions for converting getScalarData json responses to Xarray datasets, which could then be merged with other datasets or exported as netCDF files by users.

Example Use Case: https://github.com/IanTBlack/api-python-client/blob/main/examples/bcferry/twsb_sovi_thermosalinograph.ipynb

dt2str takes a python datetime object and converts it to an ISO8601 string that is compatible with the Oceans 3.0 API.

get_onc_token will grab a token that is placed in the password portion of a .netrc entry. Use of this function requires that users add that information to a .netrc file themselves.
json2xarray will take a getScalarData json response and convert it to an xarray.Dataset. This function requires that the user set metadata = 'full' and outputFormat = 'array'.

nan_onc_flags will set data in an xarray.Dataset variable to NaN if its qaqcFlag value is among the specified flags.

remove_onc_flag_vars will drop variables that represent ONC-generated qaqcFlags.
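As an illustrative sketch of how the two flag helpers might behave (the function bodies and the qaqcFlag variable-naming convention here are assumptions for the example, not the PR's actual code):

```python
import numpy as np
import xarray as xr


def nan_onc_flags(ds: xr.Dataset, var: str, flag_var: str, flags=(3, 4)) -> xr.Dataset:
    """Set values of `var` to NaN wherever the matching qaqcFlag is in `flags`."""
    ds[var] = ds[var].where(~ds[flag_var].isin(list(flags)))
    return ds


def remove_onc_flag_vars(ds: xr.Dataset) -> xr.Dataset:
    """Drop variables that hold ONC-generated qaqcFlags."""
    return ds.drop_vars([v for v in ds.data_vars if "qaqcFlag" in v])


# Toy dataset: one data variable plus its flag variable on a time dimension.
ds = xr.Dataset(
    {
        "temperature": ("time", [10.1, 11.2, 99.9]),
        "temperature_qaqcFlag": ("time", [1, 1, 4]),
    }
)
ds = nan_onc_flags(ds, "temperature", "temperature_qaqcFlag")
ds = remove_onc_flag_vars(ds)
print(list(ds.data_vars))                  # ['temperature']
print(np.isnan(ds["temperature"].values))  # [False False  True]
```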
@kan-fu (Collaborator) commented Sep 8, 2025

Hi Ian, thanks for your work! I will take a look. For now, don't worry about the checks; they failed because they don't have permission to access the token in this repo.

@kan-fu (Collaborator) commented Sep 8, 2025

For the two new notebooks, do you want them to be included in https://oceannetworkscanada.github.io/api-python-client/Tutorial/index.html? Including them means they will act as a test, and be run every time there is a new PR. You might need to replace the get_onc_token() part with "YOUR_TOKEN", like the one in https://oceannetworkscanada.github.io/api-python-client/Tutorial/onc_Library_Tutorial.html.
Or we can put a link on that page that points to the location of the two notebooks.

For the xarray part, I am not familiar with that area. @aschlesin is there any DAQ member who wants to review this?

netrc_path: os.PathLike | None
The path to the .netrc file.
If left as the default value of None, the netrc module looks for a
.netrc file in the user directory. If None, the netrc module looks for a
Collaborator

Looks like the second sentence is redundant.
FYI I have a plan to add env support in this library in the future, something like here.

Collaborator Author

Thanks. I added this function since I store most of my stuff in .netrc anyway. I just did this to reduce the risk of distributing code with personal tokens. If there is a better, standard, or preferred way to store tokens then I can move to that.

Collaborator

I think the netrc way is a nice feature to have, but I would suggest renaming it to get_onc_token_by_netrc to indicate that it uses netrc. Otherwise people might think of it as the officially recommended way to get a token, while in fact there are other ways to get an ONC token (for example the env one). Also, machine could be a parameter with the default value of 'data.oceannetworks.ca' in case people want to use their own identifiers.
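Under that suggestion, the renamed helper might look roughly like this (a sketch; the exact signature and error handling are illustrative, not the PR's code):

```python
import netrc
import os


def get_onc_token_by_netrc(
    machine: str = "data.oceannetworks.ca",
    netrc_path: "str | os.PathLike | None" = None,
) -> str:
    """Read an ONC token stored in the password field of a .netrc entry.

    If netrc_path is None, the netrc module falls back to ~/.netrc.
    """
    auth = netrc.netrc(netrc_path).authenticators(machine)
    if auth is None:
        raise LookupError(f"No .netrc entry found for machine '{machine}'")
    login, account, token = auth
    return token  # the token lives in the password field of the entry
```

A user who added `machine data.oceannetworks.ca login me password <TOKEN>` to their .netrc could then call `get_onc_token_by_netrc()` with no arguments.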

>>> dtstr = dt2str(datetime.now()) # doctest: +SKIP
"""

dt = pd.to_datetime(dt)
Collaborator

Adding dependencies to a library is kind of a big decision. Since you are adding util methods to convert json to xarray, it makes sense to include xarray, as well as numpy for NaN operations. But from my understanding, pandas is only used for converting datetimes (here and line 126 of xarray.py). Do you think this is better: dt.replace(tzinfo=timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")?
Also the example could be improved by adding the output, like

>>> dt2str(datetime(2025,7,30,0,0,0,0))
'2025-07-30T00:00:00.000Z'

Collaborator Author

I think I originally added the pd.to_datetime because pandas does a decent job of converting different representations of datetime (datetime objects, Timestamps, and numpy datetime64). I did this to avoid AttributeError: 'numpy.datetime64' object has no attribute 'strftime' when trying to convert numpy datetime objects to strings.

I can add an 'if isinstance' clause that converts the input if it isn't explicitly a native datetime object, so we can remove the pandas dependency, if that is the route you want to go.
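That 'if isinstance' fallback could look roughly like this (a sketch, not the PR's code; the datetime64 round-trip through epoch milliseconds is just one possible approach):

```python
from datetime import datetime, timezone

import numpy as np


def dt2str(dt) -> str:
    """Convert datetime-like input to an Oceans 3.0-compatible ISO8601 string."""
    if isinstance(dt, np.datetime64):
        # numpy datetime64 has no strftime/isoformat; round-trip through
        # epoch milliseconds to get a native, timezone-aware datetime.
        ms = dt.astype("datetime64[ms]").astype("int64")
        dt = datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)
    return (
        dt.replace(tzinfo=timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )


print(dt2str(np.datetime64("2025-07-30T00:00:00")))  # 2025-07-30T00:00:00.000Z
print(dt2str(datetime(2025, 7, 30)))                 # 2025-07-30T00:00:00.000Z
```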

However, I think a convenience function for converting from json to a pandas DataFrame would also be worthwhile, in which case pandas would need to be a dependency.

Collaborator Author

Alternatively, since these functions aren't part of the core ONC package, the packages required to use them could be added as optional dependencies in pyproject.toml. That would require moving them to a different util module so current functionality isn't blocked. Users could then install the packages manually or use pip's optional installation syntax, e.g. pip install onc[extra].
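For reference, that optional-dependency route would look something like this in pyproject.toml (the extra name matches the pip example above; the package list is illustrative):

```toml
[project.optional-dependencies]
extra = ["xarray", "numpy"]
```

With this in place, `pip install onc` skips the xarray helpers' dependencies, while `pip install onc[extra]` pulls them in.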

Collaborator

I tried to be neutral (unopinionated) about things that are outside the core of the client library, which is just a wrapper for making API calls, so that the client library stays small and easy to maintain (we have another MATLAB library that does similar things; we tried to keep the features synced, but it is very hard). If users want to use pandas, they can install the library themselves and call pd.DataFrame(json_result) in their own code.

In your case, the xarray Dataset does take some additional effort to build, so it makes sense to add it to the util module. Adding it as an extra complicates things; I think it is better and easier to just refactor the code to drop the pandas dependency.

@IanTBlack (Collaborator Author)

> For the two new notebooks, do you want them to be included in https://oceannetworkscanada.github.io/api-python-client/Tutorial/index.html? Including them means they will act as a test, and be run every time there is a new PR. You might need to replace the get_onc_token() part with "YOUR_TOKEN", like the one in https://oceannetworkscanada.github.io/api-python-client/Tutorial/onc_Library_Tutorial.html. Or we can put a link on that page that points to the location of two notebooks.
>
> For the xarray part, I am not familiar with that area. @aschlesin is there any DAQ member who wants to review this?

I think it would be better to just have the example links mapped rather than have the notebooks run as tests. I can add a .md or .rst to the sphinx source.

Xarray is heavily used in the science community. It provides some great functionality for working with N-dimensional datasets.

echopype, hypercoast, and many of the OOI data examples all use xarray.

@aschlesin (Contributor)

> For the xarray part, I am not familiar with that area. @aschlesin is there any DAQ member who wants to review this?

I agree with Ian; I think keeping them where he has them in the directory structure under /examples is best for users to find. The examples used in https://oceannetworkscanada.github.io/api-python-client/ that are stored under https://github.com/OceanNetworksCanada/api-python-client/tree/main/doc/source/Code_Examples are the ones that will always be dependent on code changes.
The user examples (notebooks) can stay outside of future version tests. We will add some more examples for other specific data products/datasets in the future, such as ADCP. Many thanks, Ian, for providing these notebooks.

@IanTBlack (Collaborator Author) commented Sep 8, 2025 via email

@IanTBlack (Collaborator Author)

@kan-fu If you are trying to maintain consistency between different programming languages for the API wrapper, would it be more appropriate to create a toolbox wrapper around the onc package, rather than to have these additions within the onc package itself?

@kan-fu (Collaborator) commented Sep 9, 2025

> @kan-fu If you are trying to maintain consistency between different programming languages for the API wrapper, would it be more appropriate to create a toolbox wrapper around the onc package, rather than to have these additions within the onc package itself?

I am not quite sure about the toolbox wrapper, but in our case the difficulty is that I only have limited knowledge of MATLAB, so I cannot help with maintaining the MATLAB library when there are bug fixes or new features in the Python client library. Also, some features and bugs are language-specific (for example the xarray support), which requires someone who knows both languages to decide whether it is necessary to update the other library.

Other than the core functionality (for example, features to query a new API endpoint in our backend, like #51), I think the two libraries are on their own paths in terms of maintenance.

@IanTBlack (Collaborator Author)

> @kan-fu If you are trying to maintain consistency between different programming languages for the API wrapper, would it be more appropriate to create a toolbox wrapper around the onc package, rather than to have these additions within the onc package itself?
>
> I am not quite sure about the toolbox wrapper, but in our case the difficulty is that I only have limited knowledge on MATLAB, so I cannot help with maintaining the MATLAB library if there are bug fixes or new features added in the Python client library. Also some features and bugs are language specific (for example the xarray support), which requires someone who knows both languages to decide whether it is necessary to update the other library.
>
> Other than the core functionalities (for example features to query a new API end point in our backend like #51 ), I think the two libraries are on their own path in terms of maintenance.

I think I understand. I have some code for running additional quality assessment, merging, and performing more advanced corrections for some data products, largely based in Xarray, which I think would benefit the community and make the data a little more accessible.

Please correct me if I am wrong, but it sounds like additions like the above are not well suited to the api-python-client and would be better suited to their own repository, possibly as a set of community-driven notebooks. If ONC would prefer, I could create a separate repository and pass ownership so that someone from ONC can maintain it as new users add their own examples. It could be called something snazzy like ONC Data Explorations or ONC Community Toolbox.

If that is the preferred route, we can close this PR and I can move the additions and your input to a new repo.

@kan-fu (Collaborator) commented Sep 9, 2025

> @kan-fu If you are trying to maintain consistency between different programming languages for the API wrapper, would it be more appropriate to create a toolbox wrapper around the onc package, rather than to have these additions within the onc package itself?
>
> I am not quite sure about the toolbox wrapper, but in our case the difficulty is that I only have limited knowledge on MATLAB, so I cannot help with maintaining the MATLAB library if there are bug fixes or new features added in the Python client library. Also some features and bugs are language specific (for example the xarray support), which requires someone who knows both languages to decide whether it is necessary to update the other library.
> Other than the core functionalities (for example features to query a new API end point in our backend like #51 ), I think the two libraries are on their own path in terms of maintenance.
>
> I think I understand. I have some code for running additional quality assessment, merging, and performing more advanced corrections for some data products, largely based in Xarray, which I think would benefit the community and make the data a little more accessible.
>
> Please correct me if I am wrong, but it sounds like additional changes like the above are not well suited for the api-python-client and would be better suited in their own repository, possibly as a set of community driven notebooks. If ONC would prefer, I could create a separate repository and pass ownership so that someone from ONC can maintain it as new users add their own examples. It could be called something snazzy like ONC Data Explorations or ONC Community Toolbox.
>
> If that is the preferred route, we can close this PR and I can move the additions and your input to a new repo.

I think your way of putting the xarray helper methods in the util functions complements the core purpose of the client library, and I myself welcome any changes that improve its quality. My only comment for this PR is about not adding an unnecessary dependency (at least one that is not necessary for this PR), which could be resolved by using the datetime method instead. I can merge the PR once the comments are addressed. Right now the util module is not included in our API reference documentation page generated by sphinx-autoapi; it is kind of tricky to get right, and I will probably include it later myself.

Actually, I was about to ask if you would like to be invited as a collaborator so that you don't need to fork the repo when you want to add more features to the library. Right now the fork/PR route fails all the checks due to the permission issue.

@IanTBlack IanTBlack closed this Sep 22, 2025