Cart-pole is a classical example in non-linear dynamics and control theory. The system consists of a cart that is free to move along the ground, with a pendulum attached to it (as illustrated below).
This repository is a Python playground for the cart-pole system. The majority of the project is implemented from scratch, and many of the equations used are derived in `dynamics.ipynb`.
The primary focus of the repo is to prototype and compare controllers that balance the pole upright, a non-trivial problem because the upright configuration is an unstable equilibrium (see below). Implemented approaches include:
- Reinforcement learning (RL)
  - Q-learning
  - Deep Q-Network (DQN)
- Classical controllers
  - Linear quadratic regulator (LQR)
  - Energy-based control
  - Hybrid control
| System overview | Passive dynamics (starting from a pole angle of 0.01 rad) |
|---|---|
| ![]() | ![]() |
Reinforcement learning is a branch of machine learning where an agent learns a policy (a decision-making rule) by interacting with an environment, receiving feedback through rewards and improving via trial and error.
I implemented tabular Q-learning from scratch and trained it to balance the pole while keeping the cart near the origin. Unlike the classical controllers, the Q-learning agent uses discretized state and action spaces. It can apply only a fixed force to the left or right, which yields less smooth trajectories than controllers with continuous actions. Below is a Q-learning policy with two actions, trying to handle the initial state [1.2, 0.1, 0.2, 0.1].
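The core of this approach is a Bellman backup on a lookup table indexed by binned states. Here is a minimal sketch of that update, assuming hypothetical bin counts, hyperparameters, and a `discretize` helper; the repo's actual implementation may differ:

```python
import numpy as np

N_BINS = 15          # bins per state dimension (assumed)
N_ACTIONS = 2        # push left / push right
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1  # illustrative hyperparameters

# One Q-value per (binned state, action) pair
q_table = np.zeros((N_BINS,) * 4 + (N_ACTIONS,))

def discretize(state, edges):
    """Map a continuous [x, x_dot, theta, theta_dot] to bin indices."""
    return tuple(np.clip(np.digitize(s, e), 0, N_BINS - 1)
                 for s, e in zip(state, edges))

def q_update(s, a, reward, s_next):
    """One Bellman backup: move Q(s, a) toward the bootstrapped target."""
    target = reward + GAMMA * q_table[s_next].max()
    q_table[s + (a,)] += ALPHA * (target - q_table[s + (a,)])

def act(s):
    """Epsilon-greedy action selection over the discrete action set."""
    if np.random.rand() < EPS:
        return np.random.randint(N_ACTIONS)
    return int(q_table[s].argmax())
```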
The DQN agent uses a neural network to approximate the Q-function over a continuous state space, in contrast to Q-learning, which operates on a discretized state space, while still choosing from a discrete set of actions. DQN scales better to larger action sets and can therefore control the cart more smoothly. Below is a DQN policy trained with four actions, trying to handle the initial state [3, 0, 0.8, 0].
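For illustration, a Q-network of this kind can be a small fully connected net mapping the 4-D state to one Q-value per action. The layer sizes and action count below are assumptions, not the repo's actual architecture:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Q(s, a): continuous state in, one value per action out."""
    def __init__(self, state_dim: int = 4, n_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),  # Q(s, a) for each discrete action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action for a single state [x, x_dot, theta, theta_dot]:
q_net = QNetwork()
state = torch.tensor([3.0, 0.0, 0.8, 0.0])
action = int(q_net(state).argmax())
```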
Classical controllers are derived from an analytic model of the system’s dynamics.
- LQR. A locally optimal state-feedback controller designed on the dynamics linearized about the upright configuration ($\theta = 0$). It performs well when the pole is already near upright, but becomes unstable once $\theta$ is far from 0.
- Energy-based (swing-up). Drives the pendulum's energy toward a target equal to the upright potential energy, while also trying to keep the cart near the origin ($x \approx 0$). This approach can reliably swing up from a broad set of initial conditions, but it does not stabilize the upright equilibrium; it gets the pole close to the top but won't balance it.
- Hybrid (switching). A controller that switches between the energy-based controller and LQR as it sees fit (minimal sketches of the individual laws and the switching rule follow below).
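To make the two analytic laws concrete, here is a minimal sketch assuming a frictionless point-mass cart-pole (cart mass $M$, pole mass $m$, pivot-to-mass length $l$, $\theta$ measured from upright). The repo's own model is derived in `dynamics.ipynb` and may differ in conventions and parameters:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

M, m, l, g = 1.0, 0.1, 0.5, 9.81  # illustrative parameters

# Dynamics linearized about theta = 0, state [x, x_dot, theta, theta_dot]
A = np.array([[0, 1, 0,                     0],
              [0, 0, -m * g / M,            0],
              [0, 0, 0,                     1],
              [0, 0, (M + m) * g / (M * l), 0]])
B = np.array([[0.0], [1 / M], [0.0], [-1 / (M * l)]])

# LQR: solve the continuous-time algebraic Riccati equation for the
# optimal state-feedback gain K, then apply u = -K x near upright.
Q = np.diag([1.0, 1.0, 10.0, 1.0])
R = np.array([[0.1]])
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

def lqr_force(state):
    """Stabilizing state feedback u = -K x, valid near theta = 0."""
    return float(-K @ state)

def energy_control(state, k_e=1.0, k_x=0.1, k_v=0.1):
    """Energy-shaping swing-up: command a cart acceleration that drives
    the pendulum's energy toward the upright potential energy m*g*l,
    with a mild term pulling the cart back toward x = 0."""
    x, x_dot, theta, theta_dot = state
    E = 0.5 * m * l**2 * theta_dot**2 + m * g * l * np.cos(theta)
    swing = k_e * (E - m * g * l) * theta_dot * np.cos(theta)
    return swing - k_x * x - k_v * x_dot
```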
To illustrate behavior, the controllers are compared side by side below under two different initial conditions.
Initial state [−2.0, 1.0, 0.5, 0.6] for $[x, \dot{x}, \theta, \dot{\theta}]$:
- LQR handles this case well, as expected in its region of validity.
- Energy-based maintains large swings while roughly keeping the cart near the origin, but it does not capture and balance the pole upright.
- Hybrid uses the energy policy to manage the swing dynamics, then switches to LQR near upright to settle and balance (a sketch of this switching rule follows this list).
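The switching logic itself can be as simple as thresholds on the pole angle and angular velocity. Continuing the sketch above, with assumed thresholds rather than the repo's actual values:

```python
import numpy as np

def hybrid_control(state, theta_max=0.3, theta_dot_max=2.0):
    """Use LQR in a small neighborhood of upright, energy-based elsewhere.
    Reuses lqr_force and energy_control from the sketch above."""
    x, x_dot, theta, theta_dot = state
    theta = (theta + np.pi) % (2 * np.pi) - np.pi  # wrap angle to [-pi, pi)
    wrapped = np.array([x, x_dot, theta, theta_dot])
    if abs(theta) < theta_max and abs(theta_dot) < theta_dot_max:
        return lqr_force(wrapped)   # balance near upright
    return energy_control(wrapped)  # swing up otherwise
```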
| Passive dynamics | LQR |
|---|---|
| ![]() | ![]() |
| Energy-based | Hybrid |
|---|---|
| ![]() | ![]() |
Initial state [0.1, −0.3, 2.8, −0.5] for $[x, \dot{x}, \theta, \dot{\theta}]$:
- LQR is designed around the upright position and is unstable this far from $\theta = 0$.
- Energy-based behaves as in the previous case: it injects/extracts energy to build a large swing while trying to keep the cart near the origin, but it does not balance upright.
- Hybrid uses energy-based swing-up to approach a neighborhood of $\theta = 0$, then switches to LQR to stabilize once close-to-upright conditions are met.
| Passive dynamics | LQR (fails far from equilibrium) |
|---|---|
| ![]() | ![]() |
| Energy-based | Hybrid |
|---|---|
| ![]() | ![]() |
As can be seen from these two cases, the LQR and energy-based controllers are suited to different operating conditions. Below, the first video demonstrates that the hybrid controller can outperform LQR, even when LQR is able to balance the pole. A second video visualizes the control input/action from the hybrid policy, clearly showing the switch from energy-based to LQR control.
| LQR | Hybrid |
|---|---|
| ![]() | ![]() |
- Install Python 3.8+
- Clone the repository and install dependencies:

  ```sh
  git clone https://github.com/Martin-Ansteensen/cart-pole.git
  cd cart-pole
  pip install .
  ```
- Launch the main driver, used for simulating the response of different controllers, with:

  ```sh
  cart-pole-run
  ```

  Get help by providing `--help`:

  ```sh
  cart-pole-run --help
  ```
- Useful flags:
  - `--initial-state` to supply $x$, $\dot{x}$, $\theta$, $\dot{\theta}$.
  - `--duration` and `--dt` to control simulation length and step size.
  - `--physical` / `--controller` / `--controller-profile` to pick presets defined in `configs.json`.
  - `--list` to print all available presets.
  - `--plots` when you want plots in addition to the animation, for more insight into the simulation.
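For example, a run combining several of these flags (the values here are chosen arbitrarily) could look like:

```sh
cart-pole-run --controller hybrid --initial-state 0 0 3.1 0 --duration 10 --dt 0.02 --plots
```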
To get help with training new Q-learning policies, run:
```sh
cart-pole-train-q train --help
```
The same applies for DQN:
```sh
cart-pole-train-dqn train --help
```
Not all of the training parameters are exposed through the CLI, so if you want to tune the rest you need to change the code.
- Supply a path (with filename, but no extension) with `--save-path`.
- Recreate the showcase animations:

  ```sh
  # Near upright initial condition
  cart-pole-run --controller none --duration 8 --initial-state -2 1 0.5 0.6 --trace-tip --save-path media/upright_none
  cart-pole-run --controller lqr --duration 8 --initial-state -2 1 0.5 0.6 --trace-tip --save-path media/upright_lqr
  cart-pole-run --controller energy --duration 8 --initial-state -2 1 0.5 0.6 --trace-tip --save-path media/upright_energy
  cart-pole-run --controller hybrid --duration 8 --initial-state -2 1 0.5 0.6 --trace-tip --save-path media/upright_hybrid

  # Near downward initial condition
  cart-pole-run --controller none --duration 8 --initial-state 0.1 -0.3 2.8 -0.5 --trace-tip --save-path media/downright_none
  cart-pole-run --controller lqr --duration 8 --initial-state 0.1 -0.3 2.8 -0.5 --trace-tip --save-path media/downright_lqr
  cart-pole-run --controller energy --duration 8 --initial-state 0.1 -0.3 2.8 -0.5 --trace-tip --save-path media/downright_energy
  cart-pole-run --controller hybrid --duration 8 --initial-state 0.1 -0.3 2.8 -0.5 --trace-tip --save-path media/downright_hybrid

  cart-pole-run --controller lqr --duration 5 --initial-state 3 1 1.1 0 --trace-tip --save-path media/lqr_almost_unstable
  cart-pole-run --controller hybrid --duration 5 --initial-state 3 1 1.1 0 --trace-tip --save-path media/hybrid_very_stable

  # Hybrid controller with plots
  cart-pole-run --controller hybrid --duration 4 --initial-state -3 2 1.8 2 --plots --trace-tip --save-path media/hybrid_plots

  # Tabular Q-learning
  cart-pole-run --controller q_learning --duration 6 --initial-state 1.2 0.1 0.2 0.1 --plots --trace-tip --save-path media/highlight_q_learning

  # Deep Q-Network
  cart-pole-run --controller dqn --duration 8 --dt 0.02 --initial-state 3 0 0.8 0 --plots --trace --save-path media/dqn_highlight
  ```
- Various tests to uncover bugs and errors are implemented in `tests/`. Run them with:

  ```sh
  python3 -m unittest
  ```














