
Conversation

@tennlee (Collaborator) commented Jun 7, 2025

This is work in progress to advance data pipelines which help to translate between weather timescales and climate timescales. It seems climate modelling typically works at monthly duration, using a 360-day calendar, which is new for me. ERA5 is a re-analysis product available at hourly resolution (and at high spatial resolution). The challenge is to work out how to relate these data sets and capture that in a pipeline. The existing tutorial on working with climate data approaches this but provides an incomplete solution with respect to presenting samples to an ML pipeline.

The work thus far enhances some of the pipeline code needed to do this. However, we need to add an aggregation process to the temporal retrieval which is used when accessing the ERA5 data.
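To make the aggregation step concrete, here is a minimal sketch (not part of this PR) of resampling hourly ERA5-like data to monthly means with xarray and re-indexing onto a 360-day cftime calendar; the file path and variable names are placeholders:

```python
import cftime
import xarray as xr

# Placeholder hourly ERA5-like dataset on a standard calendar.
era5_hourly = xr.open_dataset("era5_2t_hourly.nc")

# Aggregate hourly values to monthly means so they can sit alongside
# monthly climate-model output.
era5_monthly = era5_hourly.resample(time="1MS").mean()

# If the climate data uses a 360-day calendar, one option is to rebuild the
# monthly index with cftime Datetime360Day values so both datasets align.
era5_monthly = era5_monthly.assign_coords(
    time=[cftime.Datetime360Day(t.year, t.month, 1) for t in era5_monthly.indexes["time"]]
)
```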

I am putting the work in progress up as a pull request and will pose the question to some of the other collaborators as to how best to progress from here.



[project]
name = "pyearthtools-models"
Collaborator Author

This isn't relevant to this PR and was an accidental include

@tennlee (Collaborator Author) commented Jun 7, 2025

@jennan @millerjoel I'd be interested in your take on this one, it's not clear exactly what to do next. I might take another run at it tomorrow. Do you have some CMIP5 data on hand to set up the same data archive and look at the problem together?

@nikeethr nikeethr marked this pull request as draft June 10, 2025 02:25

time_query = str(Petdt(querytime))
if isinstance(data.coords[time_dim].values[0], cftime.datetime):
time_query = cftime.datetime(querytime.year,
@nikeethr (Collaborator) commented Jan 6, 2026

[minor] I'm pretty bad at naming, but this is more of a quality-of-life/functional change. It is likely something that needs to be explored throughout the codebase rather than a suggested change in this PR - so apologies for singling this one out (especially since it might be a WIP).

I think it may be easier for search and general ease of dev if the manipulated variable and the original variable share a reasonably similar substring, e.g. pet_querytime, cf_querytime or str_querytime.

querytime itself can be changed to time_query - no issue with that, just as long as the names stay sufficiently similar across both versions of the variable.
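A minimal sketch of the suggested convention, for illustration only (the names below are hypothetical and not part of this PR):

```python
import datetime
import cftime

querytime = datetime.datetime(1990, 6, 1)

# Derived forms keep "querytime" as a substring, so a search for the original
# variable name also finds every manipulated representation of it.
str_querytime = querytime.isoformat()
cf_querytime = cftime.datetime(querytime.year, querytime.month, querytime.day)
```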


# We need to make interp_like ignore the time dimension
# TODO - work out if we want some options here to specify which dims to preserve
if 'time' in self.reference_dataset.coords:
Collaborator

Just dropping a note here that there are some faster interp implementations I have in mind (with the benefit of also being more accurate). Happy to discuss.
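For context, here is a minimal sketch of the idea in the snippet above: drop the time coordinate from the reference dataset so that interp_like only matches the spatial grid. The datasets and variable names below are placeholders, not the PR's actual implementation:

```python
import numpy as np
import xarray as xr

# Placeholder reference and input datasets with different grids.
reference = xr.Dataset(
    {"t2m": (("time", "lat", "lon"), np.zeros((2, 10, 20)))},
    coords={"time": [0, 1], "lat": np.linspace(-45, 45, 10), "lon": np.linspace(0, 95, 20)},
)
data = xr.Dataset(
    {"t2m": (("time", "lat", "lon"), np.zeros((5, 20, 40)))},
    coords={"time": range(5), "lat": np.linspace(-45, 45, 20), "lon": np.linspace(0, 95, 40)},
)

# Drop the time coordinate from the reference so interp_like only interpolates
# over lat/lon and leaves the input's own time dimension untouched.
spatial_reference = reference.isel(time=0, drop=True)
regridded = data.interp_like(spatial_reference)
```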


## A Quick Hands-On Approach

This guide is suitable for scientists or anyone else who wants to start trying things quickly to establish their first model and make a first attempt. More detail is provided below with more detail on the nuances and alternatives for each step.
@nikeethr (Collaborator) commented Jan 6, 2026

Suggested change
This guide is suitable for scientists or anyone else who wants to start trying things quickly to establish their first model and make a first attempt. More detail is provided below with more detail on the nuances and alternatives for each step.
This guide is suitable for scientists or anyone else who wants to quickly establish their first model, hands-on. The following describes the nuances and alternatives for each step, in detail.


1. Use [https://pyearthtools.readthedocs.io/en/latest/notebooks/tutorial/FourCastMini_Demo.html](https://pyearthtools.readthedocs.io/en/latest/notebooks/tutorial/FourCastMini_Demo.html) as a template for what to do.
1. Determine the parameters you want to model, such as `temperature` or `wind`. When these become part of the neural network, they will be called *channels*.
2. Determine the data source they come from, such as ERA5 or another model or re-analysis source
Collaborator

Suggested change
2. Determine the data source they come from, such as ERA5 or another model or re-analysis source
2. Determine the data source they come from, such as ERA5 or another model or re-analysis source.

1. Use [https://pyearthtools.readthedocs.io/en/latest/notebooks/tutorial/FourCastMini_Demo.html](https://pyearthtools.readthedocs.io/en/latest/notebooks/tutorial/FourCastMini_Demo.html) as a template for what to do.
1. Determine the parameters you want to model, such as `temperature` or `wind`. When these become part of the neural network, they will be called *channels*.
2. Determine the data source they come from, such as ERA5 or another model or re-analysis source
3. Develop a `pipeline` which includes data normalisation
Collaborator

Suggested change
3. Develop a `pipeline` which includes data normalisation
3. Develop a `pipeline` which includes data normalisation.

2. Determine the data source they come from, such as ERA5 or another model or re-analysis source
3. Develop a `pipeline` which includes data normalisation
4. Using a bundled model, configure that model to the size required. This may only require the adjustment of `img_size`, `in_channels` and `out_channels` to match the size of your data. The grid dimension must be a multiple of four for this model, so you may need to crop or regrid your data to match (a small sketch of this appears below). In future, a standard approach without this limitation will be added.
5. Run some number of training steps (using the `.fit` method) and visualise the outputs. Visualising predictions from the trained model every 3000 steps or so provides useful insight into the training process as well as helping see when the model might be fully trained. *There is no definite answer to how much training will be required. If your model isn't showing any progress at all after a couple of epochs, there may be a problem. Some models will start to show progress after 3000 steps.*
Collaborator

Suggested change
5. Run some number of training steps (using the `.fit` method) and visualise the outputs. Visualising predictions from the trained model every 3000 steps or so provides useful insight into the training process as well as helping see when the model might be fully trained. *There is no definite answer to how much training will be required. If your model isn't showing any progress at all after a couple of epochs, there may be a problem. Some models will start to show progress after 3000 steps.*
5. Run some number of training steps (using the `.fit` method) and visualise the outputs. Visualising predictions from the trained model every 3000 steps or so is useful in providing insight into the training process, and for understanding when the model might be fully trained. *There is no definite answer to how much training will be required. If your model isn't showing any progress at all after a couple of epochs, there may be a problem. Some models will start to show progress after 3000 steps.*
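As a small illustration of the grid-size constraint mentioned in step 4, here is a hedged sketch (not taken from pyearthtools) of cropping a field so each grid dimension is a multiple of four, and deriving the values that would then be passed as `img_size`, `in_channels` and `out_channels`:

```python
import numpy as np

# Placeholder field, e.g. a 1-degree global grid of a single variable.
field = np.random.rand(181, 360)

# Crop each grid dimension down to the nearest multiple of four.
ny = field.shape[0] - field.shape[0] % 4   # 180
nx = field.shape[1] - field.shape[1] % 4   # 360
cropped = field[:ny, :nx]

img_size = cropped.shape                   # value to pass as `img_size`
in_channels = out_channels = 1             # one variable in, one variable out
print(img_size, in_channels, out_channels)
```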

match = False

if path["interval"] not in self.interval:
if path["interval"] not in self.interval[0]:
Collaborator

Should there be a `len(...) == 1` check on `self.interval`, to be consistent with other places this sort of comparison is done?
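A minimal sketch of what such a guard might look like, written as a standalone helper; `intervals` and `path_interval` are placeholders mirroring the snippet above, not the PR's actual code:

```python
def interval_matches(path_interval, intervals):
    """Illustrative only: check an interval against the configured interval(s).

    When exactly one interval is configured, compare against its contents;
    otherwise compare against the collection itself.
    """
    if len(intervals) == 1:
        return path_interval in intervals[0]
    return path_interval in intervals


# Placeholder usage.
print(interval_matches("1h", [("1h", "6h")]))  # single configured interval -> True
print(interval_matches("1h", ["1h", "6h"]))    # multiple intervals -> True
```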



class FutureFaker:
class ResultCache:
@nikeethr (Collaborator) commented Jan 6, 2026

I understand this isn't part of this PR, but is there any reason not to use functools and its cache options?

I'm not sure whether this is thread-safe, manages duplicates, or even guarantees that the reference counters are appropriately tracked. (Not that functools does all of this, but there is likely an equivalent in some other package that does, if it doesn't.)

For serial implementations and testing/dev it is probably okay, but since this is coming under parallel.py I'm raising this comment to gauge the intent.
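For reference, a minimal sketch of the kind of functools-based caching being suggested; this is illustrative only, not a drop-in replacement for ResultCache:

```python
import functools
import threading

@functools.lru_cache(maxsize=128)
def expensive_lookup(key: str) -> str:
    # Placeholder for an expensive computation or I/O call.
    return key.upper()

# lru_cache serialises its own bookkeeping, so concurrent callers see a
# consistent cache, although the wrapped function may still execute more than
# once for the same key if calls race before the first result is stored.
threads = [threading.Thread(target=expensive_lookup, args=(k,)) for k in ["a", "b", "a"]]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(expensive_lookup.cache_info())
```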
