SOC

This project tests different reinforcement learning algorithms on stochastic optimal control problems. Specifically, they are used to solve a stochastic linear quadratic regulator (LQR) problem in two different scenarios.

Scenario one:

We solve the discrete-time dynamics

$$ X_{n+1} = AX_n+ B u_n+ \sigma \xi_{n+1} $$

where $X_n$ is the state at time $n$, $u_n$ is the control at time $n$, and $\xi_{n+1} \sim \mathcal{N}(0,1)$ is standard Gaussian noise. We solve this problem over a finite time horizon $N \in \mathbb{N}$ and identify the step index $n$ with the time $t_n = n \cdot \Delta t$. This problem has a closed-form solution, which we use as a benchmark for the results generated by the algorithms. We aim to minimize

$$ J(x) = \mathbb{E} \bigg( \sum_{i=0}^{N-1} \Delta t \, (X_i^T Q X_i + u_i^T R u_i ) + X_N^T D X_N \,\bigg|\, X_0 = x \bigg) $$
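
Since this discrete-time LQR admits a closed-form solution via a backward Riccati recursion, a minimal NumPy sketch of that benchmark might look as follows. The function name and the example matrices, horizon, and step size are hypothetical placeholders, not taken from the repository:

```python
import numpy as np

def riccati_gains(A, B, Q, R, D, N, dt):
    """Backward Riccati recursion for the finite-horizon discrete-time LQR.

    Returns the feedback gains K_0, ..., K_{N-1}; the optimal control is
    u_n = -K_n @ X_n.  The additive noise does not change the gains, it
    only shifts the optimal cost by a constant.
    """
    P = D.copy()
    gains = [None] * N
    for n in reversed(range(N)):
        # K_n = (dt*R + B^T P_{n+1} B)^{-1} B^T P_{n+1} A
        K = np.linalg.solve(dt * R + B.T @ P @ B, B.T @ P @ A)
        gains[n] = K
        # P_n = dt*Q + A^T P_{n+1} (A - B K_n)
        P = dt * Q + A.T @ P @ (A - B @ K)
    return gains


# Example usage with placeholder 2x2 system matrices (hypothetical values):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q = np.eye(2); R = np.eye(1); D = np.eye(2)
    dt, N, sigma = 0.1, 50, 0.1
    gains = riccati_gains(A, B, Q, R, D, N, dt)

    # Monte Carlo estimate of J(x0) under the resulting feedback policy.
    x0 = np.array([2.0, 0.0])
    costs = []
    for _ in range(1000):
        x, cost = x0.copy(), 0.0
        for n in range(N):
            u = -gains[n] @ x
            cost += dt * (x @ Q @ x + u @ R @ u)
            x = A @ x + B @ u + sigma * rng.standard_normal(2)
        costs.append(cost + x @ D @ x)
    print("estimated optimal cost:", np.mean(costs))
```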

Scenario two:

Here we solve the continuous-time stochastic optimal control problem

$$ d X_t = (A X_t + Bu_t)dt + \sigma dB_t $$

with $t \in [0, \tau]$, where $\tau := \min \big\{ N, \inf_{t \geq 0} \{ t : X_t \notin S \} \big\}$ and $S = \{ X \in \mathbb{R}^n \mid \lVert X \rVert \in [1,3] \}$. Discretizing with the Euler-Maruyama scheme gives the time-discrete problem

$$ X_{n+1} = X_n + (AX_n+ B u_n) \Delta t + \sigma \sqrt{\Delta t} \xi_{n+1} $$

We aim to minimize

$$ J(x) = \mathbb{E} \bigg( \sum_{i=0}^{\tau-1} (X_i^T Q X_i + u_i^T R u_i ) + X_\tau^T D X_\tau \,\bigg|\, X_0 = x \bigg) $$
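
A minimal sketch of how one rollout of this stopped, Euler-Maruyama-discretized problem could be simulated and its cost accumulated is shown below. The function and parameter names (`rollout_cost`, `policy`, the example matrices) are illustrative assumptions, not the repository's API:

```python
import numpy as np

def rollout_cost(x0, policy, A, B, Q, R, D, sigma, dt, N, rng):
    """Simulate one Euler-Maruyama trajectory of dX = (AX + Bu)dt + sigma dB,
    stopping as soon as X leaves S = {x : 1 <= ||x|| <= 3} or the horizon N
    is reached, and return the accumulated cost."""
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for n in range(N):
        if not (1.0 <= np.linalg.norm(x) <= 3.0):
            break                                   # exit from S: tau = n
        u = policy(x, n)
        cost += x @ Q @ x + u @ R @ u               # running cost at step n
        xi = rng.standard_normal(x.shape)
        x = x + (A @ x + B @ u) * dt + sigma * np.sqrt(dt) * xi
    return cost + x @ D @ x                         # terminal cost X_tau^T D X_tau


# Example: Monte Carlo estimate of J(x0) for a zero-control policy
# (all matrices below are hypothetical placeholders).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = -np.eye(2); B = np.eye(2); Q = np.eye(2); R = np.eye(2); D = np.eye(2)
    zero_policy = lambda x, n: np.zeros(2)
    estimates = [rollout_cost([2.0, 0.0], zero_policy, A, B, Q, R, D,
                              sigma=0.3, dt=0.01, N=1000, rng=rng)
                 for _ in range(500)]
    print("estimated cost:", np.mean(estimates))
```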
