Objectives and features #62

Merged
TheEimer merged 13 commits into main from objectives-and-features
Oct 22, 2025

Conversation

@TheEimer
Collaborator

Some random objectives and features I added over the last months. Includes:

  • discounted eval reward (see Discounted return objective #35)
  • training reward (discounted/mean/std)
  • weight statistic features
  • loss features
  • prediction features

There are some issues, though, and I'm not really sure how to deal with them.

  1. The discounted objectives have a fixed gamma right now because it seems pretty difficult to add an argument for that.
  2. The Weight objective in particular has a shape issue that could actually become a problem: PPO is much, much smaller in terms of different networks than e.g. SAC, so we can either pad with zeros (as in the prediction features) or not adapt the shape (as it is now). This isn't exactly user-friendly either way, unfortunately.
  3. There's a lot of code duplication since everything is static. Not sure if that's inefficiency on my part, I'll gladly refactor if you have ideas how to improve that.
  4. The objectives themselves work, but their functionality is not currently tested (as is the case for the other objectives/features). I know it would be good to add, just haven't done so yet.

@LabChameleon LabChameleon self-requested a review October 17, 2025 13:42
@LabChameleon
Collaborator

The discounted objectives have a fixed gamma right now because it seems pretty difficult to add an argument for that

We could do a solution similar to some kernels in SciPy, where the arguments are attached to the name. If our objectives have only very few arguments in most cases (which I believe to be the case), this should work out fine. E.g. we could have the following options:

  1. objectives: ["discounted_reward_mean"]: the default behaviour with discount 0.99
  2. objectives: ["discounted_reward_mean_0.999"]: uses 0.999 for the discount.

This should be easy to implement. We only need an if clause in the objective creation and the logic to parse the objective name and call the objective class with the discount argument.

I would prefer this solution over another config entry, e.g. objective_arguments, which would be empty most of the time anyway.
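
A rough sketch of how the suffix scheme could be parsed (function and constant names are illustrative, not ARLBench's actual code; the 0.99 default is taken from the example above):

```python
# Hypothetical sketch: split an optional trailing discount factor off the
# objective name, e.g. "discounted_reward_mean_0.999".
DEFAULT_GAMMA = 0.99

def parse_objective(name: str) -> tuple[str, float]:
    """Return (base objective name, discount), using the default if no suffix."""
    parts = name.rsplit("_", 1)
    if len(parts) == 2:
        try:
            return parts[0], float(parts[1])
        except ValueError:
            pass  # last part is not a number, so there is no suffix
    return name, DEFAULT_GAMMA

print(parse_objective("discounted_reward_mean"))        # ('discounted_reward_mean', 0.99)
print(parse_objective("discounted_reward_mean_0.999"))  # ('discounted_reward_mean', 0.999)
```

The base name then selects the objective class and the float is passed as its discount argument.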

@LabChameleon
Collaborator

LabChameleon commented Oct 17, 2025

The Weight objective in particular has a shape issue that could actually become a problem: PPO is much, much smaller in terms of different networks than e.g. SAC, so we can either pad with zeros (as in the prediction features) or not adapt the shape (as it is now). This isn't exactly user-friendly either way, unfortunately.

This is a good point. However, I think this is a more general problem. Since we are planning to add NAS with #34, the network architecture will change during optimisation and therefore also the observation space of the RL Environment.

One solution I could think of is to add an info dictionary alongside the reward and observation. Then use this info dictionary for things where we cannot guarantee a fixed shape, e.g. weights. Not really optimal, but I don't know how we can ensure a fixed observation shape otherwise, especially with NAS. I don't like the zero-padding solution. I think it is rather confusing for users, prone to bugs, and might also not transfer to new architectures added to ARLBench.

@TheEimer
Collaborator Author

The discounted objectives have a fixed gamma right now because it seems pretty difficult to add an argument for that

We could do a solution similar to some kernels in SciPy, where the arguments are attached to the name. If our objectives have only very few arguments in most cases (which I believe to be the case), this should work out fine. E.g. we could have the following options:

  1. objectives: ["discounted_reward_mean"]: the default behaviour with discount 0.99
  2. objectives: ["discounted_reward_mean_0.999"]: uses 0.999 for the discount.

This should be easy to implement. We only need an if clause in the objective creation and the logic to parse the objective name and call the objective class with the discount argument.

I would prefer this solution over another config entry, e.g. objective_arguments, which would be empty most of the time anyway.

I added this now, it's only a few more lines of code and should also work for future objectives/arguments.

@TheEimer
Collaborator Author

The Weight objective in particular has a shape issue that could actually become a problem: PPO is much, much smaller in terms of different networks than e.g. SAC, so we can either pad with zeros (as in the prediction features) or not adapt the shape (as it is now). This isn't exactly user-friendly either way, unfortunately.

This is a good point. However, I think this is a more general problem. Since we are planning to add NAS with #34, the network architecture will change during optimisation and therefore also the observation space of the RL Environment.

One solution I could think of is to add an info dictionary alongside the reward and observation. Then use this info dictionary for things where we cannot guarantee a fixed shape, e.g. weights. Not really optimal, but I don't know how we can ensure a fixed observation shape otherwise, especially with NAS. I don't like the zero-padding solution. I think it is rather confusing for users, prone to bugs, and might also not transfer to new architectures added to ARLBench.

I resolved this differently now. Since algorithm-specific shapes will be different in a lot of features, I made the algorithm an argument for the state space shape. For the weight info, this now means the base shape is different between algorithms, but we pad e.g. if DQN doesn't have a target network (which makes sense imo because we'd want to have consistent shapes between runs anyway). We can also put that into an info dict, but in this case I personally would want it in one place.
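
The padding idea can be sketched roughly like this (the slot names, stat set, and function are hypothetical, not ARLBench's actual code):

```python
import numpy as np

# Hypothetical sketch: each algorithm reports statistics (mean, std) per
# network slot; a slot the algorithm lacks (e.g. DQN without a target
# network) is zero-padded so the feature shape stays constant across runs.
NETWORK_SLOTS = ["policy", "target"]  # assumed fixed slot layout

def weight_stat_features(networks: dict[str, np.ndarray]) -> np.ndarray:
    feats = []
    for slot in NETWORK_SLOTS:
        if slot in networks:
            w = networks[slot]
            feats.extend([w.mean(), w.std()])
        else:
            feats.extend([0.0, 0.0])  # pad missing networks with zeros
    return np.array(feats)

# An algorithm without a target network still yields a length-4 vector.
feats = weight_stat_features({"policy": np.ones(8)})
print(feats.shape)  # (4,)
```

This keeps shapes consistent between runs of the same algorithm, which is the property the comment above argues for.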

A bigger problem that I've noticed while looking at the objectives, though: some of them require things like gradients, weights or losses, which are only tracked when they're checkpointed. I'd like to decouple that. Just because I want weight features doesn't mean I want weight checkpoints (especially relevant for gradients, I think, those can get really large).

@TheEimer
Collaborator Author

Good news: I found out what was at least partially responsible for the install timeout. The doc template specifies no Sphinx versions at all, so it looked through all available versions of Sphinx and its sublibraries. Locally it was fine for me anyway, but now it installs again on GitHub :D
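
A hedged sketch of the kind of pinning that avoids this resolver backtracking (the exact file and version bounds here are illustrative, not the ones in the template):

```
# docs/requirements.txt (illustrative)
sphinx>=7.0,<8.0
sphinx-rtd-theme>=2.0,<3.0
```

Bounding the major version is usually enough to stop the resolver from walking the whole release history.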

@TheEimer
Collaborator Author

Found my mistake: I didn't add the new features to the "track_trajectory" and "track_metrics" checks. They all work now without adding checkpoint options. I also added the new ones to the docs. So the open questions from above are:

  • can we reduce code duplication somehow?
  • should we test the function of the objectives/features?

@TheEimer
Collaborator Author

I actually found a possible solution for the code duplication issue with the objectives. It involves defining a default argument and using that for fetching an aggregation function. So it works like the "_gamma_0.9" postfix, but we omit "gamma" in this case (only possible for a single argument). That's nice since now we can have one "Reward" objective that handles "reward_mean", "reward_std", "reward_median", etc. - could even handle discounted, but I guess it makes sense to keep that separate. So we then basically don't have to deal with reward objectives ever again ;D

Cons: it assumes the default arg comes first and that other arguments are floats (a pre-existing issue, but I think that should be alright). The parsing code is also slightly hacky (though not so bad and only ~20 lines). The bigger issue is that for things like std, the optimize flag is reversed, and that seems harder to fix (I manually hacked it and decided that we should reverse the optimize flag for std and var aggregation). What do you think? Keep or roll back?
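
A rough sketch of how one generic Reward objective could dispatch on the suffix (names and the registry are hypothetical; the sign flip mirrors the reversed optimize flag mentioned above):

```python
import numpy as np

# Hypothetical sketch: the name suffix selects an aggregation function,
# and std/var flip the optimisation direction (we usually want to
# minimise spread while maximising reward).
AGGREGATIONS = {
    "mean": (np.mean, +1),
    "median": (np.median, +1),
    "std": (np.std, -1),  # reversed optimize flag
    "var": (np.var, -1),  # reversed optimize flag
}

def make_reward_objective(name: str):
    suffix = name.rsplit("_", 1)[-1]  # e.g. "reward_std" -> "std"
    agg, sign = AGGREGATIONS[suffix]
    return lambda rewards: sign * agg(rewards)

obj = make_reward_objective("reward_mean")
print(obj([1.0, 2.0, 3.0]))  # 2.0
```

One Reward class plus this lookup then covers reward_mean, reward_std, reward_median, etc. without per-variant code.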

cfg_objectives = list(set(self._config["objectives"]))
for o in cfg_objectives:
-    if o not in OBJECTIVES:
+    if o not in OBJECTIVES and ("_".join(o.split("_")[:1]) in OBJECTIVES or "_".join(o.split("_")[:2]) in OBJECTIVES or "_".join(o.split("_")[:3]) in OBJECTIVES):
Collaborator

I think we should do this more generally than checking for the three different cases explicitly.
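
For example, one could try every prefix length instead of spelling out the one-, two-, and three-token cases (a hypothetical sketch, not necessarily what the actual fix does):

```python
# Hypothetical registry; real entries live in ARLBench's OBJECTIVES dict.
OBJECTIVES = {"reward_mean": ..., "discounted_reward_mean": ...}

def match_objective(name: str):
    """Return the longest registered prefix of `name`, or None."""
    parts = name.split("_")
    # Longest prefix first, so "discounted_reward_mean_0.999" matches
    # "discounted_reward_mean" rather than a shorter key.
    for i in range(len(parts), 0, -1):
        candidate = "_".join(parts[:i])
        if candidate in OBJECTIVES:
            return candidate
    return None

print(match_objective("discounted_reward_mean_0.999"))  # discounted_reward_mean
```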

Collaborator

see c7ad707

@LabChameleon
Collaborator

found a possible solution for the code duplication issue with the objectives. It involves defining a default argument and using that for fetching an aggregation function. So it works like the "_gamma_0.9" postfix, but we omit "gamma" in this case (only possible for a single argument). That's nice since now we can have one "Reward" objective that handles "reward_mean", "reward_std", "reward_median", etc. - could even handle discounted, but I guess it makes sense to keep that separate. So we then basically don't have to deal with reward objectives ever again ;D

I like this solution!

@LabChameleon
Collaborator

I made the algorithm an argument for the state space shape. For the weight info, this now means the base shape is different between algorithms, but we pad e.g. if DQN doesn't have a target network (which makes sense imo because we'd want to have consistent shapes between runs anyway).

But this might still be problematic in cases like NAS, wouldn't it? Padding requires us to know the largest architecture that will be used during an optimisation run. But I would be ok with this solution for now. When we do the NAS integration, we might need to add some code to infer the largest possible network architecture. This should be possible.

@LabChameleon
Collaborator

A bigger problem that I've noticed while looking at the objectives, though: some of them require things like gradients, weights or losses, which are only tracked when they're checkpointed. I'd like to decouple that. Just because I want weight features doesn't mean I want weight checkpoints (especially relevant for gradients, I think, those can get really large).

I agree with this!

@TheEimer
Collaborator Author

I made the algorithm an argument for the state space shape. For the weight info, this now means the base shape is different between algorithms, but we pad e.g. if DQN doesn't have a target network (which makes sense imo because we'd want to have consistent shapes between runs anyway).

But this might still be problematic in cases like NAS, wouldn't it? Padding requires us to know the largest architecture that will be used during an optimisation run. But I would be ok with this solution for now. When we do the NAS integration, we might need to add some code to infer the largest possible network architecture. This should be possible.

Right now this doesn't return the weights, only statistics about the weights and biases within each network. The full weights are available via tracking, but I don't think we usually want to work with the full weight set, right?

@TheEimer
Collaborator Author

A bigger problem that I've noticed while looking at the objectives, though: some of them require things like gradients, weights or losses, which are only tracked when they're checkpointed. I'd like to decouple that. Just because I want weight features doesn't mean I want weight checkpoints (especially relevant for gradients, I think, those can get really large).

I agree with this!

I think this is solved now, actually. Or rather, it was solved already: the track_metrics and track_trajectories flags are extended in init by certain objectives and state features; I just added the new ones.

@LabChameleon
Collaborator

I made the algorithm an argument for the state space shape. For the weight info, this now means the base shape is different between algorithms, but we pad e.g. if DQN doesn't have a target network (which makes sense imo because we'd want to have consistent shapes between runs anyway).

But this might still be problematic in cases like NAS, wouldn't it? Padding requires us to know the largest architecture that will be used during an optimisation run. But I would be ok with this solution for now. When we do the NAS integration, we might need to add some code to infer the largest possible network architecture. This should be possible.

Right now this doesn't return the weights, only statistics about the weights and biases within each network. The full weights are available via tracking, but I don't think we usually want to work with the full weight set, right?

Yes, I think so as well. But the statistics of the weights are computed layerwise, if I am not mistaken. So the network architecture (e.g. depth of the network) is then still important for the observation shape. But I like the current solution! I think this is a problem that we should deal with when adding NAS. Then we can make the new design choice a cohesive solution.
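
The layerwise dependence can be illustrated with a small sketch (stat set and function name are hypothetical): the feature vector grows with network depth, which is exactly why NAS would change the observation shape.

```python
import numpy as np

# Hypothetical sketch: per-layer (mean, std) statistics flattened into one
# vector, so the output length is 2 * number_of_layers.
def layerwise_stats(layers: list[np.ndarray]) -> np.ndarray:
    return np.array([[w.mean(), w.std()] for w in layers]).ravel()

shallow = layerwise_stats([np.ones((4, 4))])
deep = layerwise_stats([np.ones((4, 4)), np.ones((4, 4)), np.ones((4, 4))])
print(shallow.shape, deep.shape)  # (2,) (6,)
```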

@LabChameleon LabChameleon self-requested a review October 22, 2025 12:35
Collaborator

@LabChameleon LabChameleon left a comment

Looks good to me now!

@TheEimer TheEimer merged commit d768e29 into main Oct 22, 2025
2 checks passed
@TheEimer TheEimer deleted the objectives-and-features branch October 22, 2025 14:54
github-actions bot pushed a commit that referenced this pull request Oct 22, 2025