[DRAFT] Temporal climate enhancements #129
base: develop
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -1,5 +1,24 @@ | ||||||
| # New Project Guide | ||||||
|
|
||||||
| ## A Quick Hands-On Approach | ||||||
|
|
||||||
| This guide is suitable for scientists or anyone else who wants to start quickly, establish a first model, and make a first attempt. The sections below cover the nuances and alternatives for each step in more detail. | ||||||
|
|
||||||
| 1. Use [https://pyearthtools.readthedocs.io/en/latest/notebooks/tutorial/FourCastMini_Demo.html](https://pyearthtools.readthedocs.io/en/latest/notebooks/tutorial/FourCastMini_Demo.html) as a template for what to do. | ||||||
| 1. Determine the parameters you want to model, such as `temperature` or `wind`. When these become part of the neural network, they will be called *channels*. | ||||||
| 2. Determine the data source they come from, such as ERA5 or another model or reanalysis source. | ||||||
|
Collaborator
Suggested change
|
||||||
| 3. Develop a `pipeline` which includes data normalisation | ||||||
|
Collaborator
Suggested change
|
||||||
| 4. Using a bundled model, configure that model to the size required. This may only require the adjustment of `img_size`, `in_channels` and `out_channels` to match the size of your data. The grid dimension must be a multiple of four for this model, so you may need to crop or regrid your data to match. In future, a standard approach without this limitation will be added. | ||||||
| 5. Run some number of training steps (using the `.fit` method) and visualise the outputs. Visualising predictions from the trained model every 3000 steps or so provides useful insight into the training process and helps you see when the model might be fully trained. *There is no definite answer to how much training will be required. If your model isn't showing any progress at all after a couple of epochs, there may be a problem. Some models will start to show progress after 3000 steps.* | ||||||
|
Collaborator
Suggested change
|
||||||
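Step 4 above notes that the grid dimension must be a multiple of four for the bundled model. A minimal sketch of the cropping option follows, assuming the simplest approach of trimming trailing rows and columns. The helper name `crop_slice` is hypothetical and not part of PyEarthTools.

```python
def crop_slice(size: int, multiple: int = 4) -> slice:
    """Return a slice trimming `size` to the largest length divisible by `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    usable = size - (size % multiple)
    if usable == 0:
        raise ValueError(f"dimension of length {size} is shorter than {multiple}")
    return slice(0, usable)


# Example: a 0.25 degree global ERA5 grid has 721 latitudes and 1440 longitudes.
lat_slice = crop_slice(721)   # drops the final latitude row, keeping 720
lon_slice = crop_slice(1440)  # already a multiple of four, nothing is dropped

print(lat_slice, lon_slice)
```

With xarray, the slices could then be applied through `isel`, for example `data.isel(latitude=lat_slice, longitude=lon_slice)`. The dimension names here are assumptions and depend on your dataset.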
|
|
||||||
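Step 3 above asks for a pipeline that includes data normalisation. The sketch below shows one common choice, z-score scaling per channel, in plain Python so the arithmetic is visible. It is an illustration of the idea only, not the PyEarthTools pipeline API, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_stats(values):
    """Compute the mean and population standard deviation of one channel."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)
    if std == 0:
        raise ValueError("channel is constant, cannot normalise")
    return mean, std


def normalise(values, mean, std):
    """Scale values to zero mean and unit variance."""
    return [(v - mean) / std for v in values]


def denormalise(values, mean, std):
    """Undo the scaling, for example on model predictions."""
    return [v * std + mean for v in values]


# Hypothetical temperature samples in kelvin.
temps = [280.0, 285.0, 290.0]
mean, std = fit_stats(temps)
scaled = normalise(temps, mean, std)
restored = denormalise(scaled, mean, std)
print(scaled, restored)
```

The statistics are normally computed once from the training data and reused unchanged for validation data and for converting predictions back to physical units.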
| This approach should be a usable starting point for any gridded inputs and outputs. The example is based on global modelling, but could reasonably be applied to nowcasting, observational data, limited area modelling, or anything you can represent in an xarray on a grid. You could even add a grid containing data from a weather station at each grid point and see what happens. | ||||||
|
|
||||||
| Getting a neural network to perform well and make optimal predictions is very hard, with many nuances. Getting started should be reasonably simple. | ||||||
|
|
||||||
| The sections below go into more detail on how to treat source data, how to develop the most suitable pipeline for your project, how to use alternative neural network architectures, how to manage the training process, and how to perform a more thorough evaluation of the outputs. | ||||||
|
|
||||||
| ## Methodological Information | ||||||
|
|
||||||
| This guide offers a simple, repeatable process for undertaking a machine learning project. Experts in machine learning will recognise this as a standard approach, but of course it can be adapted as required in the project. Completing a project (whether using PyEarthTools or not) comprises the following steps: | ||||||
|
|
||||||
| 1. Identify the sources of data that you wish to work with | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,6 +37,8 @@ | |
| from pathlib import Path | ||
| from typing import Any, Callable, Iterable, Literal, Optional | ||
|
|
||
| import pandas as pd | ||
| import cftime | ||
| import xarray as xr | ||
|
|
||
| import pyearthtools.data | ||
|
|
@@ -482,15 +484,26 @@ def retrieve( | |
| if time_dim not in data.dims and time_dim in data.coords: | ||
| data = data.expand_dims(time_dim) | ||
|
|
||
| time_query = str(Petdt(querytime)) | ||
| if isinstance(data.coords[time_dim].values[0], cftime.datetime): | ||
| time_query = cftime.datetime(querytime.year, | ||
|
Collaborator
[minor] I'm pretty bad at naming, however this is more of a quality of life/functional change. It likely is something that needs to be explored throughout the codebase, rather than a suggested change in this PR - so apologies for singling this one out (especially since it might be a WIP). I think it may be easier for search and general ease of dev if the manipulated variable and the original variable share a reasonably similar substring. E.g.
|
||
| querytime.month, | ||
| querytime.day, | ||
| calendar='noleap', | ||
| has_year_zero=True) | ||
| self._round = True | ||
| round = True | ||
| # time_query = pd.to_datetime(time_query) | ||
|
|
||
| if select and time_dim in data: | ||
| try: | ||
| data = data.sel( | ||
| **{time_dim: str(Petdt(querytime))}, | ||
| **{time_dim: time_query}, | ||
| method="nearest" if round else None, | ||
| ) | ||
| except KeyError: | ||
| warnings.warn( | ||
| f"Could not find time in dataset to select on. {querytime!r}", | ||
| f"Could not find time in dataset to select on. {time_query!r}", | ||
| IndexWarning, | ||
| ) | ||
|
|
||
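One point the hunk above leaves open: the query is built with `calendar='noleap'`, and a no-leap calendar has no 29 February, so as far as I understand cftime would reject such a date with a `ValueError`. A minimal sketch of one way to guard against that is below. It assumes the query time exposes `year`, `month` and `day` like a standard `datetime`, and the helper name `noleap_fields` is hypothetical.

```python
import datetime


def noleap_fields(querytime: datetime.datetime) -> tuple:
    """Return (year, month, day), mapping 29 February onto 28 February."""
    day = querytime.day
    if querytime.month == 2 and day == 29:
        day = 28  # the no-leap calendar has no leap day
    return querytime.year, querytime.month, day


leap_day = noleap_fields(datetime.datetime(2020, 2, 29))
ordinary = noleap_fields(datetime.datetime(2021, 3, 15))
print(leap_day, ordinary)
```

The fields could then be passed on as in the diff, for example `cftime.datetime(*leap_day, calendar="noleap", has_year_zero=True)`. Whether clamping to 28 February or moving to 1 March is the better choice depends on the dataset, so treat this as one option rather than the intended behaviour.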
|
|
@@ -535,14 +548,14 @@ def series( | |
| Loaded series of data | ||
| """ | ||
|
|
||
| interval = self._get_interval(interval) | ||
| _interval = self._get_interval(interval) | ||
| tolerance = kwargs.pop("tolerance", getattr(self, "data_interval", None)) | ||
|
|
||
| return index_routines.series( | ||
| self, | ||
| start, | ||
| end, | ||
| interval, | ||
| _interval, | ||
| transforms=transforms, | ||
| tolerance=tolerance, | ||
| **kwargs, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| # PyEarthTools Models | ||
|
|
||
| This is the models sub-package which forms a part of the [PyEarthTools package](https://github.com/ACCESS-Community-Hub/PyEarthTools). | ||||||
|
|
||
| Documentation for the PyEarthTools package is available [here](https://pyearthtools.readthedocs.io/en/latest/). |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| [build-system] | ||
| requires = ["hatchling"] | ||
| build-backend = "hatchling.build" | ||
|
|
||
|
|
||
| [project] | ||
| name = "pyearthtools-models" | ||
|
Collaborator
Author
This isn't relevant to this PR and was an accidental include |
||
| description = "Blueprint model implementations which can be used in PyEarthTools" | ||
| requires-python = ">=3.9" | ||
| keywords = ["pyearthtools"] | ||
| maintainers = [ | ||
| {name = "Tennessee Leeuwenburg", email = "tennessee.leeuwenburg@bom.gov.au"} | ||
| ] | ||
| classifiers = [ | ||
| "Programming Language :: Python :: 3", | ||
| "License :: OSI Approved :: Apache Software License", | ||
| "Operating System :: OS Independent", | ||
| ] | ||
| dependencies = [ | ||
| "xarray[complete]", | ||
| "geopandas", | ||
| "shapely", | ||
| "tqdm", | ||
| "pyyaml", | ||
| ] | ||
| dynamic = ["version", "readme"] | ||
|
|
||
| [tool.setuptools.dynamic] | ||
| readme = {file = ["README.md"], content-type = "text/markdown"} | ||
|
|
||
| all = [ | ||
| "pyearthtools-models", | ||
| ] | ||
|
|
||
| [project.urls] | ||
| homepage = "https://pyearthtools.readthedocs.io/" | ||
| documentation = "https://pyearthtools.readthedocs.io/" | ||
| repository = "https://github.com/ACCESS-Community-Hub/PyEarthTools" | ||
|
|
||
| [tool.isort] | ||
| profile = "black" | ||
|
|
||
| [tool.black] | ||
| line-length = 120 | ||
|
|
||
| [tool.ruff] | ||
| line-length = 120 | ||
|
|
||
| [tool.mypy] | ||
| warn_return_any = true | ||
| warn_unused_configs = true | ||
|
|
||
| [tool.hatch.version] | ||
| path = "src/pyearthtools/models/__init__.py" | ||
|
|
||
| [tool.hatch.build.targets.wheel] | ||
| packages = ["src/pyearthtools/"] | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -163,8 +163,13 @@ def __init__( | |
|
|
||
| warnings.warn(UNDER_DEV_MSG) | ||
|
|
||
| base_transforms = TransformCollection() | ||
| base_transforms += pyearthtools.data.transforms.variables.variable_trim(variables) | ||
| base_transforms += pyearthtools.data.transforms.coordinates.Drop(coordinates='height') | ||
|
|
||
| super().__init__( | ||
| transforms=TransformCollection(), | ||
| transforms=base_transforms, | ||
| data_interval= (1, "month") | ||
| ) | ||
|
|
||
| self.record_initialisation() | ||
|
|
@@ -208,6 +213,9 @@ def quick_walk(self): | |
|
|
||
| self.walk_cache = walk_cache | ||
|
|
||
|
|
||
|
|
||
|
|
||
| def filesystem(self, query_dictionary={}): | ||
| """ | ||
| Given the supplied query, return all filenames which contain the data necessary to extract the data | ||
|
|
@@ -248,7 +256,7 @@ def match_path(self, path, query): | |
| if path["model"] not in self.models: | ||
| match = False | ||
|
|
||
| if path["interval"] not in self.interval: | ||
| if path["interval"] not in self.interval[0]: | ||
|
Collaborator
should there be a |
||
| match = False | ||
|
|
||
| if path["scenario"] not in self.scenarios: | ||
|
|
||
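The change to `self.interval[0]` in this hunk alters what the `in` test means, and the effect depends on what `self.interval` holds, which the diff does not show. The sketch below uses two hypothetical shapes for that attribute to illustrate the difference between element matching, substring matching, and a failing test.

```python
# Hypothetical shapes for self.interval; neither is confirmed by the diff.
names = ("mon", "day")   # a tuple of interval names
pair = (1, "month")      # a (count, unit) pair, like data_interval in this PR

exact_match = "mon" in names        # tuple membership: whole-element comparison
partial_in_tuple = "mo" in names    # no element equals "mo"
partial_in_str = "mo" in names[0]   # string membership: substring comparison

try:
    "mon" in pair[0]                # an int does not support the `in` test
    int_supported = True
except TypeError:
    int_supported = False

print(exact_match, partial_in_tuple, partial_in_str, int_supported)
```

If `self.interval` is a tuple of names, indexing with `[0]` turns an exact match into a substring match against the first name only. If it is a `(count, unit)` pair, `[0]` is a number and the test raises.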