Objectives and features #62

Merged
TheEimer merged 13 commits into main from objectives-and-features
Oct 22, 2025

Conversation

@TheEimer
Collaborator

Some random objectives and features I added over the last months. Includes:

  • discounted eval reward (see Discounted return objective #35)
  • training reward (discounted/mean/std)
  • weight statistic features
  • loss features
  • prediction features

There are some issues, though, and I'm not really sure how to deal with them.

  1. The discounted objectives have a fixed gamma right now because it seems pretty difficult to add an argument for that.
  2. The Weight objective in particular has a shape issue that could actually become a problem: PPO is much, much smaller in terms of different networks than e.g. SAC, so we can either pad with zeros (as in the prediction features) or not adapt the shape (as it is now). This isn't exactly user-friendly either way, unfortunately.
  3. There's a lot of code duplication since everything is static. Not sure if that's inefficiency on my part, I'll gladly refactor if you have ideas how to improve that.
  4. The objectives themselves work, but their functionality is not currently tested (as is the case for the other objectives/features). I know it would be good to add, just haven't done so yet.

@LabChameleon LabChameleon self-requested a review October 17, 2025 13:42
@LabChameleon
Collaborator

The discounted objectives have a fixed gamma right now because it seems pretty difficult to add an argument for that

We could do a solution similar to some kernels in SciPy, where the arguments are attached to the name. If our objectives have only very few arguments in most cases (which I believe to be the case), this should work out fine. E.g. we could have the following options:

  1. objectives: ["discounted_reward_mean"]: the default behaviour with discount 0.99
  2. objectives: ["discounted_reward_mean_0.999"]: uses 0.999 for the discount.

This should be easy to implement. We only need an if clause in the objective creation and the logic to parse the objective name and call the objective class with the discount argument.

I would prefer this solution over another config entry, e.g. objective_arguments, which would be empty most of the time anyway.
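
A rough sketch of how the suffix scheme could be parsed (function and constant names are illustrative, not ARLBench's actual code; the 0.99 default is taken from the example above):

```python
# Hypothetical sketch: split an optional trailing discount factor off the
# objective name, e.g. "discounted_reward_mean_0.999".
DEFAULT_GAMMA = 0.99

def parse_objective(name: str) -> tuple[str, float]:
    """Return (base objective name, discount), using the default if no suffix."""
    parts = name.rsplit("_", 1)
    if len(parts) == 2:
        try:
            return parts[0], float(parts[1])
        except ValueError:
            pass  # last part is not a number, so there is no suffix
    return name, DEFAULT_GAMMA

print(parse_objective("discounted_reward_mean"))        # ('discounted_reward_mean', 0.99)
print(parse_objective("discounted_reward_mean_0.999"))  # ('discounted_reward_mean', 0.999)
```

The base name then selects the objective class and the float is passed as its discount argument.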

@LabChameleon
Collaborator

LabChameleon commented Oct 17, 2025

The Weight objective in particular has a shape issue that could actually become a problem: PPO is much, much smaller in terms of different networks than e.g. SAC, so we can either pad with zeros (as in the prediction features) or not adapt the shape (as it is now). This isn't exactly user-friendly either way, unfortunately.

This is a good point. However, I think this is a more general problem. Since we are planning to add NAS with #34, the network architecture will change during optimisation and therefore also the observation space of the RL Environment.

One solution I could think of is to add an info dictionary alongside the reward and observation. Then use this info dictionary for things where we cannot guarantee a fixed shape, e.g. weights. Not really optimal, but I don't know how we can ensure a fixed observation shape otherwise, especially with NAS. I don't like the zero-padding solution. I think it is rather confusing for users, prone to bugs, and might also not transfer to new architectures added to ARLBench.

@TheEimer
Collaborator Author

The discounted objectives have a fixed gamma right now because it seems pretty difficult to add an argument for that

We could do a solution similar to some kernels in SciPy, where the arguments are attached to the name. If our objectives have only very few arguments in most cases (which I believe to be the case), this should work out fine. E.g. we could have the following options:

  1. objectives: ["discounted_reward_mean"]: the default behaviour with discount 0.99
  2. objectives: ["discounted_reward_mean_0.999"]: uses 0.999 for the discount.

This should be easy to implement. We only need an if clause in the objective creation and the logic to parse the objective name and call the objective class with the discount argument.

I would prefer this solution over another config entry, e.g. objective_arguments, which would be empty most of the time anyway.

I added this now, it's only a few more lines of code and should also work for future objectives/arguments.

@TheEimer
Collaborator Author

The Weight objective in particular has a shape issue that could actually become a problem: PPO is much, much smaller in terms of different networks than e.g. SAC, so we can either pad with zeros (as in the prediction features) or not adapt the shape (as it is now). This isn't exactly user-friendly either way, unfortunately.

This is a good point. However, I think this is a more general problem. Since we are planning to add NAS with #34, the network architecture will change during optimisation and therefore also the observation space of the RL Environment.

One solution I could think of is to add an info dictionary alongside the reward and observation. Then use this info dictionary for things where we cannot guarantee a fixed shape, e.g. weights. Not really optimal, but I don't know how we can ensure a fixed observation shape otherwise, especially with NAS. I don't like the zero-padding solution. I think it is rather confusing for users, prone to bugs, and might also not transfer to new architectures added to ARLBench.

I resolved this differently now. Since algorithm-specific shapes will be different in a lot of features, I made the algorithm an argument for the state space shape. For the weight info, this now means the base shape is different between algorithms, but we pad e.g. if DQN doesn't have a target network (which makes sense imo because we'd want to have consistent shapes between runs anyway). We can also put that into an info dict, but in this case I personally would want it in one place.
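
The padding idea can be sketched roughly like this (the slot names, stat set, and function are hypothetical, not ARLBench's actual code):

```python
import numpy as np

# Hypothetical sketch: each algorithm reports statistics (mean, std) per
# network slot; a slot the algorithm lacks (e.g. DQN without a target
# network) is zero-padded so the feature shape stays constant across runs.
NETWORK_SLOTS = ["policy", "target"]  # assumed fixed slot layout

def weight_stat_features(networks: dict[str, np.ndarray]) -> np.ndarray:
    feats = []
    for slot in NETWORK_SLOTS:
        if slot in networks:
            w = networks[slot]
            feats.extend([w.mean(), w.std()])
        else:
            feats.extend([0.0, 0.0])  # pad missing networks with zeros
    return np.array(feats)

# An algorithm without a target network still yields a length-4 vector.
feats = weight_stat_features({"policy": np.ones(8)})
print(feats.shape)  # (4,)
```

This keeps shapes consistent between runs of the same algorithm, which is the property the comment above argues for.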

A bigger problem that I've noticed while looking at the objectives, though: some of them require things like gradients, weights or losses, which are only tracked when they're checkpointed. I'd like to decouple that. Just because I want weight features doesn't mean I want weight checkpoints (especially relevant for gradients, I think, those can get really large).

@TheEimer
Collaborator Author

Good news: I found out what was at least partially responsible for the install timeout. The doc template specifies no Sphinx versions at all, so it looked through all available versions of Sphinx and its sublibraries. Locally it was fine for me anyway, but now it installs again on GitHub :D
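
A hedged sketch of the kind of pinning that avoids this resolver backtracking (the exact file and version bounds here are illustrative, not the ones in the template):

```
# docs/requirements.txt (illustrative)
sphinx>=7.0,<8.0
sphinx-rtd-theme>=2.0,<3.0
```

Bounding the major version is usually enough to stop the resolver from walking the whole release history.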

@TheEimer
Collaborator Author

Found my mistake: I didn't add the new features to the "track_trajectory" and "track_metrics" checks. They all work now without adding checkpoint options. I also added the new ones to the docs. So the open questions from above are:

  • can we reduce code duplication somehow?
  • should we test the function of the objectives/features?

@TheEimer
Collaborator Author

I actually found a possible solution for the code duplication issue with the objectives. It involves defining a default argument and using that for fetching an aggregation function. So it works like the "_gamma_0.9" postfix, but we omit "gamma" in this case (only possible for a single argument). That's nice since now we can have one "Reward" objective that handles "reward_mean", "reward_std", "reward_median", etc. - could even handle discounted, but I guess it makes sense to keep that separate. So we then basically don't have to deal with reward objectives ever again ;D

Cons: it assumes the default arg comes first and that other arguments are floats (a pre-existing issue, but I think that should be alright). The parsing code is also slightly hacky (though not so bad and only ~20 lines). The bigger issue is that for things like std, the optimize flag is reversed, and that seems harder to fix (I manually hacked it and decided that we should reverse the optimize flag for std and var aggregation). What do you think? Keep or roll back?
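
A rough sketch of how one generic Reward objective could dispatch on the suffix (names and the registry are hypothetical; the sign flip mirrors the reversed optimize flag mentioned above):

```python
import numpy as np

# Hypothetical sketch: the name suffix selects an aggregation function,
# and std/var flip the optimisation direction (we usually want to
# minimise spread while maximising reward).
AGGREGATIONS = {
    "mean": (np.mean, +1),
    "median": (np.median, +1),
    "std": (np.std, -1),  # reversed optimize flag
    "var": (np.var, -1),  # reversed optimize flag
}

def make_reward_objective(name: str):
    suffix = name.rsplit("_", 1)[-1]  # e.g. "reward_std" -> "std"
    agg, sign = AGGREGATIONS[suffix]
    return lambda rewards: sign * agg(rewards)

obj = make_reward_objective("reward_mean")
print(obj([1.0, 2.0, 3.0]))  # 2.0
```

One Reward class plus this lookup then covers reward_mean, reward_std, reward_median, etc. without per-variant code.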

cfg_objectives = list(set(self._config["objectives"]))
for o in cfg_objectives:
-    if o not in OBJECTIVES:
+    if o not in OBJECTIVES and ("_".join(o.split("_")[:1]) in OBJECTIVES or "_".join(o.split("_")[:2]) in OBJECTIVES or "_".join(o.split("_")[:3]) in OBJECTIVES):
Collaborator

I think we should do this more generally than checking for the three different cases explicitly.
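
For example, one could try every prefix length instead of spelling out the one-, two-, and three-token cases (a hypothetical sketch, not necessarily what the actual fix does):

```python
# Hypothetical registry; real entries live in ARLBench's OBJECTIVES dict.
OBJECTIVES = {"reward_mean": ..., "discounted_reward_mean": ...}

def match_objective(name: str):
    """Return the longest registered prefix of `name`, or None."""
    parts = name.split("_")
    # Longest prefix first, so "discounted_reward_mean_0.999" matches
    # "discounted_reward_mean" rather than a shorter key.
    for i in range(len(parts), 0, -1):
        candidate = "_".join(parts[:i])
        if candidate in OBJECTIVES:
            return candidate
    return None

print(match_objective("discounted_reward_mean_0.999"))  # discounted_reward_mean
```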

Collaborator

see c7ad707

@LabChameleon
Collaborator

found a possible solution for the code duplication issue with the objectives. It involves defining a default argument and using that for fetching an aggregation function. So it works like the "_gamma_0.9" postfix, but we omit "gamma" in this case (only possible for a single argument). That's nice since now we can have one "Reward" objective that handles "reward_mean", "reward_std", "reward_median", etc. - could even handle discounted, but I guess it makes sense to keep that separate. So we then basically don't have to deal with reward objectives ever again ;D

I like this solution!

@LabChameleon
Collaborator

I made the algorithm an argument for the state space shape. For the weight info, this now means the base shape is different between algorithms, but we pad e.g. if DQN doesn't have a target network (which makes sense imo because we'd want to have consistent shapes between runs anyway).

But this might still be problematic in cases like NAS, wouldn't it? Padding requires us to know the largest architecture that will be used during an optimisation run. But I would be ok with this solution for now. When we do the NAS integration, we might need to add some code to infer the largest possible network architecture. This should be possible.

@LabChameleon
Collaborator

A bigger problem that I've noticed while looking at the objectives, though: some of them require things like gradients, weights or losses, which are only tracked when they're checkpointed. I'd like to decouple that. Just because I want weight features doesn't mean I want weight checkpoints (especially relevant for gradients, I think, those can get really large).

I agree with this!

@TheEimer
Collaborator Author

I made the algorithm an argument for the state space shape. For the weight info, this now means the base shape is different between algorithms, but we pad e.g. if DQN doesn't have a target network (which makes sense imo because we'd want to have consistent shapes between runs anyway).

But this might still be problematic in cases like NAS, wouldn't it? Padding requires us to know the largest architecture that will be used during an optimisation run. But I would be ok with this solution for now. When we do the NAS integration, we might need to add some code to infer the largest possible network architecture. This should be possible.

Right now this doesn't return the weights, only statistics about the weights and biases within each network. The full weights are available via tracking, but I don't think we usually want to work with the full weight set, right?

@TheEimer
Collaborator Author

A bigger problem that I've noticed while looking at the objectives, though: some of them require things like gradients, weights or losses, which are only tracked when they're checkpointed. I'd like to decouple that. Just because I want weight features doesn't mean I want weight checkpoints (especially relevant for gradients, I think, those can get really large).

I agree with this!

I think this is solved now, actually. Or rather, it was solved already: the track_metrics and track_trajectories flags are extended in init by certain objectives and state features; I just added the new ones.

@LabChameleon
Collaborator

I made the algorithm an argument for the state space shape. For the weight info, this now means the base shape is different between algorithms, but we pad e.g. if DQN doesn't have a target network (which makes sense imo because we'd want to have consistent shapes between runs anyway).

But this might still be problematic in cases like NAS, wouldn't it? Padding requires us to know the largest architecture that will be used during an optimisation run. But I would be ok with this solution for now. When we do the NAS integration, we might need to add some code to infer the largest possible network architecture. This should be possible.

Right now this doesn't return the weights, only statistics about the weights and biases within each network. The full weights are available via tracking, but I don't think we usually want to work with the full weight set, right?

Yes, I think so as well. But the statistics of the weights are computed layerwise, if I am not mistaken. So the network architecture (e.g. depth of the network) is then still important for the observation shape. But I like the current solution! I think this is a problem that we should deal with when adding NAS. Then we can make the new design choice a cohesive solution.
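
The layerwise dependence can be illustrated with a small sketch (stat set and function name are hypothetical): the feature vector grows with network depth, which is exactly why NAS would change the observation shape.

```python
import numpy as np

# Hypothetical sketch: per-layer (mean, std) statistics flattened into one
# vector, so the output length is 2 * number_of_layers.
def layerwise_stats(layers: list[np.ndarray]) -> np.ndarray:
    return np.array([[w.mean(), w.std()] for w in layers]).ravel()

shallow = layerwise_stats([np.ones((4, 4))])
deep = layerwise_stats([np.ones((4, 4)), np.ones((4, 4)), np.ones((4, 4))])
print(shallow.shape, deep.shape)  # (2,) (6,)
```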

@LabChameleon LabChameleon self-requested a review October 22, 2025 12:35
Collaborator

@LabChameleon LabChameleon left a comment

Looks good to me now!

@TheEimer TheEimer merged commit d768e29 into main Oct 22, 2025
2 checks passed
@TheEimer TheEimer deleted the objectives-and-features branch October 22, 2025 14:54
github-actions bot pushed a commit that referenced this pull request Oct 22, 2025