This project implements an optimization-based controller for the LunarLander environment using reinforcement‐learning–inspired black-box search techniques. The agent policy is represented as a linear mapping from observations to action logits and is optimized using:
- Particle Swarm Optimization (PSO)
- Tabu Search refinement
- Hill Climbing fine-tuning
The implementation uses Gymnasium for environment simulation and NumPy for numerical computation.
The project uses the LunarLander-v3 environment from Gymnasium.
The observation space has 8 dimensions and the action space has 4 discrete actions.
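These dimensions can be confirmed directly from Gymnasium (a minimal check, assuming Box2D support is installed as described in the installation section below):

```python
import gymnasium as gym

# Inspect the LunarLander-v3 spaces; requires gymnasium[box2d].
env = gym.make("LunarLander-v3")
print(env.observation_space.shape)  # (8,)
print(env.action_space.n)           # 4
env.close()
```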
The policy is a linear model:
action = argmax( observation · W + b )
where
- W is an 8 × 4 weight matrix
- b is a 4-dimensional bias vector
- parameter vector size = 32 + 4 = 36
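A minimal sketch of how this policy could be applied from a flat 36-element parameter vector (the name `policy_action` matches the reference table at the end; the exact W/b layout within the vector is an assumption):

```python
import numpy as np

OBS_DIM, N_ACTIONS = 8, 4  # LunarLander-v3 dimensions

def policy_action(params, observation):
    """Greedy linear policy: argmax(observation @ W + b).

    `params` is the flat 36-element vector; splitting it into the first
    32 weights and the last 4 biases is an assumed layout.
    """
    W = params[: OBS_DIM * N_ACTIONS].reshape(OBS_DIM, N_ACTIONS)
    b = params[OBS_DIM * N_ACTIONS:]
    logits = observation @ W + b
    return int(np.argmax(logits))
```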
- single episode evaluation
- multi-episode averaged evaluation (see the sketch after this list)
- Particle Swarm Optimization
- Tabu Search refinement of PSO output
- Hill Climbing final local search
- Parameter saving and loading (.npy)
- Render mode for visual evaluation
- Command-line interface for training and playing
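The two evaluation modes at the top of the list could look roughly like this (a sketch only; the default episode count and the parameter layout are assumptions):

```python
import gymnasium as gym
import numpy as np

OBS_DIM, N_ACTIONS = 8, 4

def evaluate_episode(params, env):
    """Run one episode with the greedy linear policy; return the total reward."""
    W = params[: OBS_DIM * N_ACTIONS].reshape(OBS_DIM, N_ACTIONS)
    b = params[OBS_DIM * N_ACTIONS:]
    observation, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = int(np.argmax(observation @ W + b))
        observation, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward

def evaluate_policy(params, episodes=5):
    """Average the episodic return over several episodes to smooth out noise."""
    env = gym.make("LunarLander-v3")
    returns = [evaluate_episode(params, env) for _ in range(episodes)]
    env.close()
    return float(np.mean(returns))
```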
Install required dependencies:
pip install gymnasium
pip install gymnasium[box2d]
pip install numpy

Note: Box2D support must be installed for LunarLander.
This trains using PSO → Tabu Search → Hill Climbing and saves the best policy.

python main.py --train

Optional arguments:
- `--population_size`
- `--generations`
- `--episodes`
- `--filename`
Example:
python main.py --train --population_size 60 --generations 200 --filename best.npy

To play a saved policy:

python main.py --play --filename best_policy.npy

This loads the saved parameters and renders multiple episodes.
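A rough idea of how the command-line flags above could be wired together with `argparse` (a sketch; only the flag names come from this README, while the defaults and help strings are assumptions):

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Train or play a linear LunarLander policy."
    )
    parser.add_argument("--train", action="store_true",
                        help="run the PSO -> Tabu Search -> Hill Climbing pipeline")
    parser.add_argument("--play", action="store_true",
                        help="render episodes with a saved policy")
    parser.add_argument("--population_size", type=int, default=50)   # assumed default
    parser.add_argument("--generations", type=int, default=100)      # assumed default
    parser.add_argument("--episodes", type=int, default=5)           # assumed default
    parser.add_argument("--filename", type=str, default="best_policy.npy")
    return parser.parse_args()
```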
| File | Description |
|---|---|
| best_policy.npy | Saved optimized weight vector |
| Function | Description |
|---|---|
| `policy_action()` | Linear policy |
| `evaluate_episode()` | Single-episode evaluation |
| `evaluate_policy()` | Multi-episode evaluation |
| `pso_optimize()` | Global optimization |
| `tabu_search()` | Local refinement |
| `hill_climbing()` | Final tuning |
| `train_and_save()` | Full pipeline and save |
| `load_policy()` | Load saved model |
| `play_policy()` | Render policy in environment |
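For illustration, `pso_optimize()` could be organised along the lines of the following condensed particle swarm loop (a sketch, not the project's exact implementation; the hyper-parameters, bounds, and signature are assumptions, and the fitness function would be something like the multi-episode evaluation sketched earlier):

```python
import numpy as np

def pso_optimize(fitness, dim=36, population_size=50, generations=100,
                 w=0.7, c1=1.5, c2=1.5, bound=1.0, rng=None):
    """Maximise `fitness` over a `dim`-dimensional vector with a basic PSO."""
    rng = np.random.default_rng() if rng is None else rng
    pos = rng.uniform(-bound, bound, size=(population_size, dim))
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest_idx = int(np.argmax(pbest_val))
    gbest_pos, gbest_val = pbest_pos[gbest_idx].copy(), float(pbest_val[gbest_idx])

    for _ in range(generations):
        r1 = rng.random((population_size, dim))
        r2 = rng.random((population_size, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, -bound, bound)
        vals = np.array([fitness(p) for p in pos])
        # Track per-particle and global bests.
        improved = vals > pbest_val
        pbest_pos[improved], pbest_val[improved] = pos[improved], vals[improved]
        if pbest_val.max() > gbest_val:
            gbest_idx = int(np.argmax(pbest_val))
            gbest_pos, gbest_val = pbest_pos[gbest_idx].copy(), float(pbest_val[gbest_idx])
    return gbest_pos, gbest_val

# Usage sketch: best_params, best_return = pso_optimize(evaluate_policy)
```

Tabu Search and Hill Climbing would then refine the returned parameter vector with local perturbations before it is saved to disk.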