Skip to content
This repository was archived by the owner on Apr 7, 2025. It is now read-only.

Latest commit

 

History

History

README.md

Chapter 4: Dynamic Programming     🔗 Notes

Examples

4.1 Policy Evaluation Gridworld (p.77)

Convergence of iterative policy evaluation on a small gridworld. The last policy is guaranteed only to be an improvement over the random policy, but in this case it, and all policies after the third iteration, are optimal. Code

4.2 Jack's Car Rental (p.81)

The sequence policies found by policy iteration on Jack's car rental problem, and the final state-value function. The first four diagrams show, for each number of cars at each location at the end of the day, the number of cars to be moved from the first location to the second (negative numbers indicate transfer from the second location to the first). Code

Special attention:

The MDP part of the Jack's Car Rental environment is based on Gertjan Gsverhoeven's implementation, which uses the MIT Liscence: GitHub

Before running this example, be sure to first install the Jack's Car Rental environment by:

cd gym_env
pip install .

4.3 Gambler's Problem (p.84)

A gambler has the opportunity to make bets on the outcomes of a sequence of coin flips. If the coin comes up heads, he wins as many dollars as he has staked on that flip; if it is tails, he loses his stake. In this example, the probability of the coin comming up head is $p_h = 0.4$. The first plot shows the change in the value function over successive sweeps of value iteration, and the second plot shows one of the optimal policy found. Code

Exercises

4.7 Re-solve Jack's Car Rental problem (p.82)

One of Jack's employee's happy to shuttle the a car from the location to the second for free. Moreover, if there's more than 10 cars are kept overnight at a location (after any moving of cars), an additional cost of $4 must be incurred to use a second parking lot. Code
To create the Re-solve Jack's Car Rental environment, the resolve=True needed to be included as an argument when initialing the environment:

env = gymnasium.make('JacksCarRental-v0', resolve=True)

4.9 Resolve Gambler's Problem (p.84)

Implement value iteration for the gambler's problem and solve it for $p_h = 0.25$ and $p_h = 0.55$. Code

$p_h = 0.25:$

$p_h = 0.55:$