Conversation
We could use a solution similar to some kernels in SciPy, where the arguments are attached to the name. If our objectives take only very few arguments in most cases (which I believe to be true), this should work out fine. E.g. we could have the following options:
This should be easy to implement. We only need an if clause in the objective creation, plus the logic to handle the objective name and call the objective class with the discount argument. I would prefer this solution over adding another config entry.
This is a good point. However, I think this is a more general problem. Since we are planning to add NAS with #34, the network architecture will change during optimisation, and therefore so will the observation space of the RL environment. One solution I could think of is to add an additional
I added this now; it's only a few more lines of code and should also work for future objectives/arguments.
I resolved this differently now. Since algorithm-specific shapes will differ for a lot of features, I made the algorithm an argument for the state space shape. For the weight info, this means the base shape now differs between algorithms, but we pad, e.g. if DQN doesn't have a target network (which makes sense imo, because we'd want consistent shapes between runs anyway). We could also put that into an info dict, but in this case I personally would want it in one place. A bigger problem I've noticed while looking at the objectives, though: some of them require things like gradients, weights or losses, which are only tracked when they're checkpointed. I'd like to decouple that: just because I want weight features doesn't mean I want weight checkpoints (especially relevant for gradients, I think, since those can get really large).
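To illustrate the padding idea (a hypothetical sketch; the function name, feature values, and base size are made up for illustration, not the actual ARLBench implementation): per-algorithm weight-statistic vectors are zero-padded to one shared base shape, so e.g. DQN without a target network still produces the same observation shape as algorithms that do have one.

```python
import numpy as np

def pad_weight_features(stats: np.ndarray, base_size: int) -> np.ndarray:
    """Zero-pad a per-algorithm feature vector to the shared base size."""
    padded = np.zeros(base_size, dtype=stats.dtype)
    padded[: stats.shape[0]] = stats
    return padded

# DQN without a target network yields fewer statistics than, say, SAC:
dqn_stats = np.array([0.1, -0.2, 0.05])  # e.g. mean, std, median of weights
base_size = 6                            # largest vector across algorithms
obs = pad_weight_features(dqn_stats, base_size)
assert obs.shape == (6,)
```

The upside is consistent observation shapes across algorithms and runs; the cost is a few always-zero entries for the smaller algorithms.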
Good news: I found out what was at least partially responsible for the install timeout: the doc template doesn't pin any sphinx versions at all, so pip looked through all available versions of sphinx + sub-libraries. Locally it was fine for me anyway, but now it installs again on GitHub :D
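For context, the kind of pin that avoids this backtracking would look something like the following (a sketch only; the actual file name and version bounds in this repo may differ):

```
# docs/requirements.txt (hypothetical path) -- pin sphinx so pip doesn't
# resolve through every available release and its sub-libraries
sphinx>=7.0,<8.0
```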
Found my mistake: I didn't add the new features to the "track_trajectory" and "track_metrics" checks. They all work now without adding checkpoint options. I also added the new ones to the docs. So the open questions from above are:
I actually found a possible solution for the code duplication issue with the objectives. It involves defining a default argument and using it to fetch an aggregation function. So it works like the "_gamma_0.9" postfix, but we omit the "gamma" in this case (only possible for a single argument). That's nice since we can now have one "Reward" objective that handles "reward_mean", "reward_std", "reward_median", etc. - it could even handle discounted rewards, but I guess it makes sense to keep that separate. So we then basically never have to deal with reward objectives again ;D Cons: it assumes the default argument comes first and that the other arguments are floats (a pre-existing issue, but I think that should be alright). The parsing code is also slightly hacky (though not so bad, and only ~20 lines). The bigger issue is that for things like std, the optimize flag is reversed, and that seems harder to fix (I hacked it manually and decided that we should reverse the optimize flag for the std and var aggregations). What do you think? Keep or roll back?
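For concreteness, here is a rough sketch of that parsing scheme (a hypothetical reconstruction, not the actual code; the AGGREGATIONS mapping, function names, and the set of reversed aggregations are made up for illustration):

```python
import numpy as np

AGGREGATIONS = {"mean": np.mean, "std": np.std, "median": np.median, "var": np.var}
REVERSED = {"std", "var"}  # lower is better here, so the optimize flag flips

def parse_objective(name: str):
    parts = name.split("_")
    base, agg = parts[0], parts[1]  # default argument assumed to come first
    # remaining "_<key>_<value>" postfixes; values assumed to be floats
    extra = {k: float(v) for k, v in zip(parts[2::2], parts[3::2])}
    maximize = agg not in REVERSED
    return base, AGGREGATIONS[agg], extra, maximize

base, agg_fn, kwargs, maximize = parse_objective("reward_mean_gamma_0.9")
assert (base, kwargs, maximize) == ("reward", {"gamma": 0.9}, True)
assert parse_objective("reward_std")[3] is False  # std: minimize instead
```

This shows both the upside (one "Reward" objective covers all aggregations) and the two caveats mentioned above: the default argument must come first, and the optimize direction has to be special-cased for std/var.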
arlbench/autorl/autorl_env.py
```diff
 cfg_objectives = list(set(self._config["objectives"]))
 for o in cfg_objectives:
-    if o not in OBJECTIVES:
+    if o not in OBJECTIVES and ("_".join(o.split("_")[:1]) in OBJECTIVES or "_".join(o.split("_")[:2]) in OBJECTIVES or "_".join(o.split("_")[:3]) in OBJECTIVES):
```
I think we should do this more generally than checking for the three different cases explicitly.
I like this solution!
But this might still be problematic in cases like NAS, wouldn't it? Padding requires us to know the largest architecture that will be used during an optimisation run. But I would be ok with this solution for now. When we do the NAS integration, we might need to add some code to infer the largest possible network architecture. That should be possible.
I agree with this!
Right now this doesn't return the weights themselves, only statistics about the weights and biases within each network. The full weights are available via tracking, but I don't think we usually want to work with the full weight set, right?
I think this is solved now, actually. Or rather, it was solved already: the track_metrics and track_trajectories flags are extended in init by certain objectives and state features, and I just added the new ones.
Yes, I think so as well. But the statistics of the weights are computed layer-wise, if I am not mistaken, so the network architecture (e.g. the depth of the network) still matters for the observation shape. But I like the current solution! I think this is a problem we should deal with when adding NAS; then we can make the new design choice a cohesive solution.
LabChameleon left a comment:
Looks good to me now!
Some random objectives and features I added over the last months. Includes:
There are some issues, though, and I'm not really sure how to deal with them.