
Revised training core #81

Merged: Drenderer merged 49 commits into develop from feature/generalized_training on Jan 28, 2026.

Conversation

@Drenderer (Owner) commented Dec 1, 2025

Key improvements:

  • Split the training ingredients into state (model and optimizer state) and static (optimizer, batcher, loss, etc.)
  • Defined a new core training_loop function that takes only state, static, and callbacks as arguments; the loop updates the state (see the sketch after this list).
  • Removed CallbackArgs in favor of a TrainingView, which provides read-only access to static and read-write access to the state. This greatly simplifies the callbacks.
  • Defined an ABC for losses, which allows for custom gradient computations and implements model unwrapping by default (avoiding a redefinition of the loss in the fit function).
  • Refactored the datahandler.
  • Removed the HistoryCallback in favor of a more general logging framework built from a MetricLogger callback and a dict-like History object. This significantly improves the separation of concerns and lets the user easily track custom metrics.
  • Refactored klax.fit to recreate the old behaviour with the new components.
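
A minimal self-contained sketch of the state/static split and the TrainingView pattern described above; every name and signature here is an illustrative assumption, not the actual klax API:

```python
from dataclasses import dataclass
from typing import Any, Callable

import jax
import jax.numpy as jnp
import optax


@dataclass
class TrainState:
    """Per-step, read-write ingredients."""
    params: Any
    opt_state: Any
    step: int = 0


@dataclass(frozen=True)
class TrainStatic:
    """Ingredients fixed for the whole run."""
    optimizer: optax.GradientTransformation
    loss: Callable
    batches: tuple


@dataclass(frozen=True)
class TrainingView:
    """Read-only access to static, read-write access to state."""
    state: TrainState
    static: TrainStatic


def training_loop(state, static, callbacks=()):
    for batch in static.batches:
        grads = jax.grad(static.loss)(state.params, batch)
        updates, state.opt_state = static.optimizer.update(grads, state.opt_state)
        state.params = optax.apply_updates(state.params, updates)
        state.step += 1
        for callback in callbacks:
            callback(TrainingView(state, static))
    return state


# Toy usage: fit y = 2x with a squared loss.
def loss(params, batch):
    x, y = batch
    return jnp.mean((params * x - y) ** 2)


data = (jnp.array([1.0, 2.0]), jnp.array([2.0, 4.0]))
optimizer = optax.sgd(0.1)
params = jnp.array(0.0)
state = TrainState(params, optimizer.init(params))
static = TrainStatic(optimizer, loss, (data,) * 100)
state = training_loop(state, static)
```

Under this reading, klax.fit recreates the old single-call behaviour by assembling these pieces and delegating to the loop.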

@Drenderer Drenderer marked this pull request as draft December 1, 2025 05:52
@jaosch (Collaborator) left a comment

Very cool changes. The main challenge will be to implement a cache that is easy to reuse for any other callback.

…p, improved docs and removed the data from metric_defs, which now only require the model as input to compute the metrics.
@Drenderer (Owner, Author) commented Dec 11, 2025

> Very cool changes. The main challenge will be to implement a cache that is easy to reuse for any other callback.

Thanks! :)
But I disagree about the caching: the user should just provide a cached function as the metric_def.
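
For illustration, a minimal sketch of such a user-side cached metric_def, assuming metric_defs are plain callables that take only the model (the `cached` helper and the `param_norm` metric are hypothetical):

```python
import functools

import equinox as eqx
import jax
import jax.numpy as jnp


def cached(metric_fn):
    """Memoize a metric on the identity of the model pytree, so that
    several callbacks within one step reuse the computed value."""
    last = {"key": None, "value": None}

    @functools.wraps(metric_fn)
    def wrapper(model):
        # id-based keying assumes each step produces a fresh model object.
        if id(model) != last["key"]:
            last["key"] = id(model)
            last["value"] = metric_fn(model)
        return last["value"]

    return wrapper


@cached
def param_norm(model):  # hypothetical metric_def: model in, scalar out
    params = eqx.filter(model, eqx.is_inexact_array)
    return sum(jnp.sum(p**2) for p in jax.tree.leaves(params))
```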

@Drenderer (Owner, Author) commented

With the history callback now storing the metric definitions, it is no longer possible to pickle it. This makes saving difficult. It would be possible to simply not save the metric_defs, but then, after loading, the user would need to supply the defs again to continue training. 😐
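
A hedged sketch of the "do not save the metric_defs" option (History is dict-like per the description above; the helper names and the rebuild step are hypothetical):

```python
import pickle


def save_history(path, history):
    # Persist only the logged values; the metric definitions are plain
    # functions and are dropped because they cannot be pickled.
    with open(path, "wb") as f:
        pickle.dump(dict(history), f)


def load_history(path, history):
    # `history` is rebuilt by the user with the metric_defs supplied
    # again; the saved values are merged back in.
    with open(path, "rb") as f:
        history.update(pickle.load(f))  # assumes dict-like update
    return history
```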

@jaosch (Collaborator) commented Dec 15, 2025

> With the history callback now storing the metric definitions, it is no longer possible to pickle it. This makes saving difficult. It would be possible to simply not save the metric_defs, but then, after loading, the user would need to supply the defs again to continue training. 😐

This is a similar problem to model serialization with hyperparameters (see https://docs.kidger.site/equinox/examples/serialisation/), but I am not yet sure what the solution could be.
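
For reference, the pattern from the linked Equinox docs serialises the array leaves separately and rebuilds the skeleton from the hyperparameters on load; roughly (names hypothetical):

```python
import json

import equinox as eqx


def save(path, hyperparams, model):
    with open(path, "wb") as f:
        f.write((json.dumps(hyperparams) + "\n").encode())
        eqx.tree_serialise_leaves(f, model)


def load(path, make_model):
    with open(path, "rb") as f:
        hyperparams = json.loads(f.readline().decode())
        skeleton = make_model(**hyperparams)  # rebuild the pytree structure
        return eqx.tree_deserialise_leaves(f, skeleton)
```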

jaosch and others added 11 commits December 15, 2025 16:53
The recent changes have been mostly reverted because jax.tree.reduce
does not give the desired output with `operator.eq`.
Datahandler tests now pass.
This enables compatibility with any
`optax.GradientTransformationExtraArgs` without any additional
boilerplate for the user.
The `fit` function now calls the `TrainState.create` function for lack
of an `__init__`.

The `run_training_loop` function now uses a partitioned version of the
loss to enable the use of arbitrary optax optimizers again.
`TrainStatic` was made compatible with `optax.GradientTransformation` by
casting to `optax.GradientTransformationExtraArgs`. This required
implementing a constructor, because field converters are not yet
available in Python (see PEP 712).
All tests in test_training.py pass now.
…ainer. Implemented and integrated a MetricLogger. Streamlined training functionality.
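
A sketch of the cast mentioned in the commits above, assuming `TrainStatic` is an equinox `Module` (the field name is illustrative). `optax.with_extra_args_support` performs the upgrade that a PEP 712 field converter would otherwise express declaratively:

```python
import equinox as eqx
import optax


class TrainStatic(eqx.Module):
    optimizer: optax.GradientTransformationExtraArgs

    def __init__(self, optimizer: optax.GradientTransformation):
        # Field converters (PEP 712) are not available yet, so the
        # constructor casts plain transformations by hand.
        self.optimizer = optax.with_extra_args_support(optimizer)


static = TrainStatic(optax.adam(1e-3))  # plain transformation, upgraded
assert isinstance(static.optimizer, optax.GradientTransformationExtraArgs)
```
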
@Drenderer Drenderer marked this pull request as ready for review January 27, 2026 13:38
@Drenderer Drenderer merged commit e103a67 into develop Jan 28, 2026
1 check passed
@Drenderer Drenderer deleted the feature/generalized_training branch January 28, 2026 20:46