Chapter 4: Dynamic Programming
Notes
Convergence of iterative policy evaluation on a small gridworld. The last policy is guaranteed only to be an improvement over the random policy, but in this case it, and all policies after the third iteration, are optimal. Code
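The iterative policy evaluation shown in the figure can be sketched in a few lines of NumPy. This is a minimal in-place sweep for the 4x4 gridworld of Example 4.1 (undiscounted, reward -1 per step, equiprobable random policy), written here for illustration; it is not the linked code itself:

```python
import numpy as np

def policy_evaluation(theta=1e-4, gamma=1.0):
    """In-place iterative policy evaluation for the 4x4 gridworld.

    States 0 and 15 (top-left and bottom-right corners) are terminal;
    every transition earns reward -1 under the equiprobable random policy.
    """
    V = np.zeros(16)
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
    while True:
        delta = 0.0
        for s in range(1, 15):                     # skip the terminal states
            r, c = divmod(s, 4)
            v_new = 0.0
            for dr, dc in actions:                 # each action has prob. 1/4
                nr, nc = r + dr, c + dc
                if not (0 <= nr < 4 and 0 <= nc < 4):
                    nr, nc = r, c                  # off-grid moves leave the state unchanged
                v_new += 0.25 * (-1 + gamma * V[nr * 4 + nc])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new                           # in-place (Gauss-Seidel style) update
        if delta < theta:
            return V
```

With a tight tolerance this converges to the values in the figure, e.g. -14 for the states adjacent to a terminal corner and -22 for the farthest states.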
The sequence of policies found by policy iteration on Jack's car rental problem, and the final state-value function. The first four diagrams show, for each number of cars at each location at the end of the day, the number of cars to be moved from the first location to the second (negative numbers indicate transfer from the second location to the first).
Code
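For reference, the policy iteration loop that produces such a sequence of policies can be sketched for a generic tabular MDP. The array layout here (`P` as an S x A x S transition tensor, `R` as S x A expected rewards) is an illustrative choice, not the linked implementation's API:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, theta=1e-8):
    """Tabular policy iteration (a sketch, not the linked code).

    P: array (S, A, S) of transition probabilities P[s, a, s'].
    R: array (S, A) of expected immediate rewards.
    Returns the final greedy policy and its value function.
    """
    S, A, _ = P.shape
    pi = np.zeros(S, dtype=int)                     # start from an arbitrary policy
    idx = np.arange(S)
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to convergence.
        V = np.zeros(S)
        while True:
            V_new = R[idx, pi] + gamma * (P[idx, pi] @ V)
            converged = np.max(np.abs(V_new - V)) < theta
            V = V_new
            if converged:
                break
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum('sat,t->sa', P, V)
        pi_new = np.argmax(Q, axis=1)
        if np.array_equal(pi_new, pi):              # policy stable: done
            return pi, V
        pi = pi_new
```

Each improvement step is guaranteed to produce a policy at least as good as the previous one, which is why the sequence in the figure terminates at an optimal policy.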
The MDP part of the Jack's Car Rental environment is based on Gertjan Verhoeven's implementation, which is released under the MIT License: GitHub
Before running this example, be sure to first install the Jack's Car Rental environment:
cd gym_env
pip install .

A gambler has the opportunity to make bets on the outcomes of a sequence of coin flips. If the coin comes up heads, he wins as many dollars as he has staked on that flip; if it comes up tails, he loses his stake. In this example, the probability of the coin coming up heads is
One of Jack's employees is happy to shuttle a car from the first location to the second for free. Moreover, if more than 10 cars are kept overnight at a location (after any moving of cars), an additional cost of $4 must be incurred to use a second parking lot.
Code
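The modified costs can be captured in a small helper. This is a sketch with illustrative parameter names (`move_cost`, `lot_cost`, `free_limit` are not from the text), assuming the original problem's $2 charge per car moved:

```python
def moving_and_parking_cost(n_moved, cars_at_1, cars_at_2,
                            move_cost=2, lot_cost=4, free_limit=10):
    """Overnight cost under the modified problem (illustrative sketch).

    n_moved > 0 means cars moved from the first location to the second;
    n_moved < 0 means cars moved in the opposite direction.
    """
    if n_moved > 0:
        # The employee shuttles one car from the first location to the
        # second for free, so only the remaining cars cost money.
        cost = move_cost * (n_moved - 1)
    else:
        cost = move_cost * abs(n_moved)     # moves second -> first are all paid
    # After moving, each location holding more than 10 cars pays for a
    # second parking lot.
    if cars_at_1 - n_moved > free_limit:
        cost += lot_cost
    if cars_at_2 + n_moved > free_limit:
        cost += lot_cost
    return cost
```

For example, moving three cars from the first location to the second costs $4 (two paid moves), and leaving eleven cars overnight at a location adds the $4 parking charge even when no cars are moved.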
To create the re-solve variant of the Jack's Car Rental environment, pass resolve=True as an argument when initializing the environment:
env = gymnasium.make('JacksCarRental-v0', resolve=True)

Implement value iteration for the gambler's problem and solve it for
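A minimal value iteration sketch for the gambler's problem follows; since the heads probability is not specified above, it is left as a parameter `p_h`, and the capital goal of $100 is taken from the standard formulation:

```python
import numpy as np

def gamblers_value_iteration(p_h, goal=100, theta=1e-9):
    """Value iteration for the gambler's problem (a sketch).

    State s is the gambler's capital (1..goal-1); the only reward is +1
    for reaching `goal`, and the problem is undiscounted.
    """
    V = np.zeros(goal + 1)
    V[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):
            # Stakes are limited by the capital held and the distance to the goal.
            returns = [p_h * V[s + a] + (1 - p_h) * V[s - a]
                       for a in range(1, min(s, goal - s) + 1)]
            best = max(returns)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy; rounding breaks near-ties toward the smallest stake.
    policy = np.zeros(goal + 1, dtype=int)
    for s in range(1, goal):
        policy[s] = max(range(1, min(s, goal - s) + 1),
                        key=lambda a: round(p_h * V[s + a]
                                            + (1 - p_h) * V[s - a], 6))
    return V, policy
```

For any p_h below 0.5 the resulting value function matches bold play; e.g. with p_h = 0.25 the value of a capital of $50 is exactly 0.25, achieved by staking everything.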



















