Support for Gymnasium 1.0 #108

@jmtoepperwien

The project currently does not support the new Gymnasium 1.0 API.

Alongside a few minor changes, the reset behavior of environments has changed.
Previously, the step that ended an episode returned the first observation after the reset as its next_observation. To work around this, the project currently overwrites that observation with info["final_observation"].
In the new API, the step that ends an episode correctly returns the final observation, and the next call to env.step returns an "invalid" transition that goes from the last observation before the reset to the first observation after it. This transition cannot be used for learning, as it crosses a reset boundary.
For more information, see the Gymnasium release notes or the short write-up in the CleanRL repo.
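To make the reset boundary concrete, here is a minimal sketch in plain Python (no Gymnasium dependency) of how the invalid steps could be flagged. It assumes the new behavior described above: if an environment reports terminated or truncated at step t, its step t + 1 is the reset step. The function name `valid_step_mask` and the input layout are hypothetical, not taken from this repository.

```python
def valid_step_mask(dones_per_step):
    """Flag which transitions are usable for learning.

    dones_per_step[t][i] is True if env i reported terminated or truncated
    at step t. Assumption: the step right after such a step is the reset
    step for that env and crosses a reset boundary.
    """
    num_envs = len(dones_per_step[0]) if dones_per_step else 0
    prev_done = [False] * num_envs
    mask = []
    for dones in dones_per_step:
        # A step is valid only if the same env did not finish on the step before.
        mask.append([not done for done in prev_done])
        prev_done = list(dones)
    return mask


# Two environments, four steps: env 0 finishes at step 1, env 1 at step 3.
example_dones = [
    [False, False],
    [True, False],
    [False, False],
    [False, True],
]
example_mask = valid_step_mask(example_dones)
```

In this example only step 2 of environment 0 is marked invalid. The step after environment 1 finishes lies outside the recorded window, so it would have to be masked once it is collected.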

The MightyAgent class currently stores the replay buffer as a list of lists of e.g. returns, where the second-level list indexes the different environments.
Since the environments may cross reset boundaries at different points in time, we cannot simply throw away the invalid transition with this structure.
We also cannot keep these transitions, as they will lead to performance degradation.
A solution will probably require rewriting a fair amount of the buffer code, but I am not familiar enough with this repository to properly gauge that.
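One possible direction, shown only as a rough sketch, is to store transitions per environment instead of per step, so that a single environment's reset step can be skipped without touching the others. The class `BoundaryAwareBuffer` and its method names are hypothetical and do not reflect the actual MightyAgent buffer. It assumes the reset step itself reports neither terminated nor truncated.

```python
class BoundaryAwareBuffer:
    """Flat transition store that skips steps crossing a reset boundary."""

    def __init__(self, num_envs):
        # Each entry: (env_id, obs, action, reward, next_obs, terminated)
        self.transitions = []
        # Whether each env ended an episode on its previous step.
        self._prev_done = [False] * num_envs

    def add_batch(self, obs, actions, rewards, next_obs, terminated, truncated):
        """Add one vectorized step; drop the entry of any env that just reset."""
        for i in range(len(self._prev_done)):
            if not self._prev_done[i]:
                self.transitions.append(
                    (i, obs[i], actions[i], rewards[i], next_obs[i], terminated[i])
                )
            # Remember whether this env finished, so its next step is skipped.
            self._prev_done[i] = bool(terminated[i] or truncated[i])


# Two environments, three steps; env 0 terminates on the first step.
buffer = BoundaryAwareBuffer(num_envs=2)
buffer.add_batch([0, 10], [0, 0], [1.0, 1.0], [1, 11], [True, False], [False, False])
buffer.add_batch([1, 11], [0, 0], [0.0, 1.0], [100, 12], [False, False], [False, False])
buffer.add_batch([100, 12], [0, 0], [1.0, 1.0], [101, 13], [False, False], [False, False])
```

Because each stored tuple carries its environment index, dropping one environment's reset step does not shift or misalign the entries of the other environments.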

There is a backwards-compatible option, the autoreset mode "Same Step", but using it will break some of the wrappers that are currently used.

I have some code lying around that lets the project run, but it currently does not discard the "faulty" transitions.

  • Adapt code to the new reset API
    • (?) Restructure replay buffer
  • Change some wrapper imports
  • If compatibility with older versions of Gymnasium is desirable:
    • Keep old and new code by conditionally running based on Gymnasium's version
