This project implements an optimization-based controller for the LunarLander environment using reinforcement‐learning–inspired black-box search techniques. The agent policy is represented as a linear mapping from observations to action logits and is optimized using:
- Particle Swarm Optimization (PSO)
- Tabu Search refinement
- Hill Climbing fine-tuning
The implementation uses Gymnasium for environment simulation and NumPy for numerical computation.
The project uses the LunarLander-v3 environment from Gymnasium.
The observation space has 8 dimensions and the action space has 4 discrete actions.
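These dimensions can be confirmed directly from Gymnasium (a minimal check, assuming Box2D support is installed as described in the installation section below):

```python
import gymnasium as gym

# Inspect the LunarLander-v3 spaces; requires gymnasium[box2d].
env = gym.make("LunarLander-v3")
print(env.observation_space.shape)  # (8,)
print(env.action_space.n)           # 4
env.close()
```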
The policy is a linear model:
action = argmax( observation · W + b )
where
- W is an 8 × 4 weight matrix
- b is a 4-dimensional bias vector
- parameter vector size = 32 + 4 = 36
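A minimal sketch of how this policy could be applied from a flat 36-element parameter vector (the name `policy_action` matches the reference table at the end; the exact W/b layout within the vector is an assumption):

```python
import numpy as np

OBS_DIM, N_ACTIONS = 8, 4  # LunarLander-v3 dimensions

def policy_action(params, observation):
    """Greedy linear policy: argmax(observation @ W + b).

    `params` is the flat 36-element vector; splitting it into the first
    32 weights and the last 4 biases is an assumed layout.
    """
    W = params[: OBS_DIM * N_ACTIONS].reshape(OBS_DIM, N_ACTIONS)
    b = params[OBS_DIM * N_ACTIONS:]
    logits = observation @ W + b
    return int(np.argmax(logits))
```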
- single episode evaluation
- multi-episode averaged evaluation (see the sketch after this list)
- Particle Swarm Optimization
- Tabu Search refinement of PSO output
- Hill Climbing final local search
- Parameter saving and loading (.npy)
- Render mode for visual evaluation
- Command-line interface for training and playing
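The two evaluation modes at the top of the list could look roughly like this (a sketch only; the default episode count and the parameter layout are assumptions):

```python
import gymnasium as gym
import numpy as np

OBS_DIM, N_ACTIONS = 8, 4

def evaluate_episode(params, env):
    """Run one episode with the greedy linear policy; return the total reward."""
    W = params[: OBS_DIM * N_ACTIONS].reshape(OBS_DIM, N_ACTIONS)
    b = params[OBS_DIM * N_ACTIONS:]
    observation, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = int(np.argmax(observation @ W + b))
        observation, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward

def evaluate_policy(params, episodes=5):
    """Average the episodic return over several episodes to smooth out noise."""
    env = gym.make("LunarLander-v3")
    returns = [evaluate_episode(params, env) for _ in range(episodes)]
    env.close()
    return float(np.mean(returns))
```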
Install required dependencies:
pip install gymnasium
pip install gymnasium[box2d]
pip install numpy

Note: Box2D support must be installed for LunarLander.
This trains using PSO → Tabu Search → Hill Climbing and saves the best policy.

python main.py --train

Optional arguments:
- `--population_size`
- `--generations`
- `--episodes`
- `--filename`
Example:
python main.py --train --population_size 60 --generations 200 --filename best.npy

To play a saved policy:

python main.py --play --filename best_policy.npy

This loads the saved parameters and renders multiple episodes.
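A rough idea of how the command-line flags above could be wired together with `argparse` (a sketch; only the flag names come from this README, while the defaults and help strings are assumptions):

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Train or play a linear LunarLander policy."
    )
    parser.add_argument("--train", action="store_true",
                        help="run the PSO -> Tabu Search -> Hill Climbing pipeline")
    parser.add_argument("--play", action="store_true",
                        help="render episodes with a saved policy")
    parser.add_argument("--population_size", type=int, default=50)   # assumed default
    parser.add_argument("--generations", type=int, default=100)      # assumed default
    parser.add_argument("--episodes", type=int, default=5)           # assumed default
    parser.add_argument("--filename", type=str, default="best_policy.npy")
    return parser.parse_args()
```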
| File | Description |
|---|---|
| best_policy.npy | Saved optimized weight vector |
| Function | Description |
|---|---|
| `policy_action()` | Linear policy |
| `evaluate_episode()` | Single-episode evaluation |
| `evaluate_policy()` | Multi-episode evaluation |
| `pso_optimize()` | Global optimization |
| `tabu_search()` | Local refinement |
| `hill_climbing()` | Final tuning |
| `train_and_save()` | Full pipeline and save |
| `load_policy()` | Load saved model |
| `play_policy()` | Render policy in environment |
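For illustration, `pso_optimize()` could be organised along the lines of the following condensed particle swarm loop (a sketch, not the project's exact implementation; the hyper-parameters, bounds, and signature are assumptions, and the fitness function would be something like the multi-episode evaluation sketched earlier):

```python
import numpy as np

def pso_optimize(fitness, dim=36, population_size=50, generations=100,
                 w=0.7, c1=1.5, c2=1.5, bound=1.0, rng=None):
    """Maximise `fitness` over a `dim`-dimensional vector with a basic PSO."""
    rng = np.random.default_rng() if rng is None else rng
    pos = rng.uniform(-bound, bound, size=(population_size, dim))
    vel = np.zeros_like(pos)
    pbest_pos = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest_idx = int(np.argmax(pbest_val))
    gbest_pos, gbest_val = pbest_pos[gbest_idx].copy(), float(pbest_val[gbest_idx])

    for _ in range(generations):
        r1 = rng.random((population_size, dim))
        r2 = rng.random((population_size, dim))
        # Standard velocity update: inertia + cognitive + social terms.
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, -bound, bound)
        vals = np.array([fitness(p) for p in pos])
        # Track per-particle and global bests.
        improved = vals > pbest_val
        pbest_pos[improved], pbest_val[improved] = pos[improved], vals[improved]
        if pbest_val.max() > gbest_val:
            gbest_idx = int(np.argmax(pbest_val))
            gbest_pos, gbest_val = pbest_pos[gbest_idx].copy(), float(pbest_val[gbest_idx])
    return gbest_pos, gbest_val

# Usage sketch: best_params, best_return = pso_optimize(evaluate_policy)
```

Tabu Search and Hill Climbing would then refine the returned parameter vector with local perturbations before it is saved to disk.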