Description
The project currently does not support the new gymnasium 1.0 API.
Among a few minor changes, the reset behavior of environments has changed.
Previously, the step on which an episode terminated returned next_observation as the observation from after the automatic reset. To work around this, that observation currently gets overwritten with info["final_observation"].
In the new API, the terminating step correctly returns the final observation, and the next env.step returns the "invalid" transition from before the reset to after the reset. This step cannot be used for learning, as it crosses the reset boundary.
For more information, see the Gymnasium release notes or the short writeup in the CleanRL repo.
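The new autoreset contract can be sketched with a toy stand-in environment (FakeAutoresetEnv and all names below are hypothetical, not part of gymnasium or this project): after a step with terminated=True, the next env.step performs the reset and returns the boundary transition, which must be skipped when collecting data.

```python
class FakeAutoresetEnv:
    """Toy env mimicking gymnasium >= 1.0 autoreset: episodes last 3 steps,
    and the observation counts steps since the last reset."""

    def __init__(self):
        self.t = 0
        self.needs_reset = False

    def step(self, action):
        if self.needs_reset:
            # "Invalid" transition: crosses the reset boundary.
            self.t = 0
            self.needs_reset = False
            return self.t, 0.0, False, False, {}
        self.t += 1
        terminated = self.t >= 3
        self.needs_reset = terminated
        return self.t, 1.0, terminated, False, {}

env = FakeAutoresetEnv()
obs = 0  # pretend reset() returned observation 0
autoreset = False
transitions = []
for _ in range(8):
    next_obs, reward, terminated, truncated, info = env.step(0)
    if not autoreset:
        # Only store transitions that stay within one episode.
        transitions.append((obs, next_obs, reward, terminated))
    # The step *after* a terminated/truncated step is the reset step.
    autoreset = terminated or truncated
    obs = next_obs
```

Of the 8 steps above, the two reset steps are discarded, so only the 6 in-episode transitions reach the buffer.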
The MightyAgent class currently stores the replay buffer as a list of lists of e.g. returns. The second-level list signifies the different environments.
Since the environments may cross reset boundaries at different points in time, we cannot simply discard the boundary transitions within this structure.
We also cannot keep these transitions, as they would degrade learning performance.
A solution will probably require rewriting a fair amount of the buffer code, but I am not familiar enough with this repository to gauge the effort properly.
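One possible direction is a per-environment autoreset flag that masks boundary transitions before they enter the second-level lists. This is a minimal sketch under assumed names (`buffer`, `add_step`, and the field layout are illustrative, not MightyAgent's actual API):

```python
# Hypothetical list-of-lists buffer: buffer[field][env_idx] holds one
# environment's values, mirroring the structure described above.
num_envs = 2
buffer = {"obs": [[] for _ in range(num_envs)],
          "reward": [[] for _ in range(num_envs)]}
autoreset = [False] * num_envs  # True right after env i terminated/truncated

def add_step(obs_batch, reward_batch, done_batch):
    """Append only transitions that do not cross a reset boundary."""
    for i in range(num_envs):
        if not autoreset[i]:
            buffer["obs"][i].append(obs_batch[i])
            buffer["reward"][i].append(reward_batch[i])
        # The step *after* a done is that env's invalid boundary transition.
        autoreset[i] = done_batch[i]

# Two envs terminating at different times:
add_step([10, 20], [1.0, 1.0], [True, False])   # both stored
add_step([11, 21], [1.0, 1.0], [False, False])  # env 0 skipped (boundary)
add_step([12, 22], [1.0, 1.0], [False, True])   # both stored
add_step([13, 23], [1.0, 1.0], [False, False])  # env 1 skipped (boundary)
```

Because each environment carries its own flag, the boundary transition is dropped independently per env, so the second-level lists may end up with different contents without any global synchronization.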
There is a backwards-compatible API via the autoreset mode "Same Step", but it would break some wrappers that are currently used.
I have some code lying around that lets the project run, but it currently does not discard the "faulty" trajectories.
- Adapt the code to the new reset API
- (?) Restructure replay buffer
- Change some wrapper imports
- If compatibility with older versions of Gymnasium is desirable:
  - Keep old and new code by conditionally running based on Gymnasium's version
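The version-conditional branch in the last item could look roughly like this. The helper parses a version string directly so the sketch runs without gymnasium installed; in the project one would pass `gymnasium.__version__` instead. The function name and branch bodies are assumptions, not existing code:

```python
def uses_new_reset_api(version: str) -> bool:
    """True for Gymnasium >= 1.0, where the new autoreset semantics apply."""
    major = int(version.split(".")[0])
    return major >= 1

# In the project this would be: uses_new_reset_api(gymnasium.__version__)
if uses_new_reset_api("1.0.0"):
    pass  # new-API path: skip the transition after a terminated/truncated step
else:
    pass  # old-API path: overwrite next_obs with info["final_observation"]
```

Keeping the check in one helper means every call site branches on the same predicate rather than re-parsing the version string.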