GridWorldSingle

Implementation of Gridworld domains based on OpenAI Gym 0.15.4.

GridWorldSingle

An experimental 5x5 Gridworld domain with actions "up", "down", "left", and "right." The start state is chosen uniformly at random from all states at the border of the grid. The goal state is in the center of the grid.

GridWorldOrient

The Gridworld domains used in the work of Norm Ferns in which the agent's state is a combination of its orientation and its location in the grid. The agent can choose from the actions "forward" and "rotate" at each time point. "rotate" changes the agent's orientation clockwise and "forward" deterministically moves the agent one step forward if possible. A reward of 1 is obtained for reaching the goal grid location in the center of the grid and a reward of 0 for all other state-action combinations. The start state is chosen uniformly at random from all non-terminal states and performing 100 actions or reaching a goal state end an episode. The width of the underlying grid can be controlled.

The parameter "augmented" can be set to True to create a domain in which the abstract states of the coarsest Markov state abstraction are different from the ones of a Q*-irrelevance abstraction. Specifically, all non-terminal states are copied and an additional terminal state is created. For each state copy, the transition and reward probabilities for the optimal action are equivalent to the ones of the corresponding original state. Taking a non-optimal action a in the copy of a state s, however, leads to a reward of Q*(s, a) and to a transition to the new terminal state. Thus, the state copies have the same Q* values as the original states but different transition probabilities. Note that a Q*-irrelevance abstraction would map each copied state to the same activation as the corresponding original state, whereas the coarsest Markov state abstraction would map an original state and its copy to different activations.

The parameter "modified_rewards" can be set to True to change the reward function. More precisely, an immediate reward of -1 is obtained for moving forward in 4 states rather than the original reward of 0.

The parameter "modified_transitions" can be set to True to change some of the transition probabilities.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
Images		Images
gym-GridWorldSingle		gym-GridWorldSingle
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GridWorldSingle

About

Uh oh!

Languages

nelealbers/gridworld

Folders and files

Latest commit

History

Repository files navigation

GridWorldSingle

About

Resources

Uh oh!

Stars

Watchers

Forks

Languages