This repository contains the code for my thesis.
To run my experiments, the environment and my reinforcement learning code first have to be installed as a package. Clone this repository, navigate into it with your terminal, and execute the following steps.
- Install the repository as a pip package:

```bash
pip install .
```

- Check whether the installation was successful:

```bash
python -c "import ThesisPackage"
```
The basic multi-agent collectors environment can be imported and trained like this:
```python
from ThesisPackage.Environments.collectors.collectors_env_discrete_onehot import Collectors
from ThesisPackage.RL.Centralized_PPO.multi_ppo import PPO_Multi_Agent_Centralized

def make_env(sequence_length):
    # Create a single Collectors environment instance.
    # (The constructor arguments shown here are an assumption; adapt them to your configuration.)
    return Collectors(sequence_length=sequence_length)

if __name__ == "__main__":
    num_envs = 64
    seed = 1
    total_timesteps = 6000000000
    sequence_length = 1

    # Build a batch of environments and train a centralized PPO agent on them.
    envs = [make_env(sequence_length) for i in range(num_envs)]
    agent = PPO_Multi_Agent_Centralized(envs, device="cpu")
    agent.train(total_timesteps, tensorboard_folder="OneHot", exp_name=f"collect_seq_{sequence_length}", anneal_lr=True, learning_rate=0.001, num_checkpoints=60)
    agent.save(f"models/collectors_seq_{sequence_length}")
```

If you want to train your setup as a self-play sender-receiver setup, you can do it like this:
```python
from ThesisPackage.Environments.multi_pong_sender_receiver import PongEnvSenderReceiver
from ThesisPackage.RL.Decentralized_PPO.multi_ppo import PPO_Multi_Agent

def make_env(seed, vocab_size, sequence_length, max_episode_steps):
    # Create a single self-play Pong environment with a sender-receiver language channel.
    env = PongEnvSenderReceiver(width=20, height=20, vocab_size=vocab_size, sequence_length=sequence_length, max_episode_steps=max_episode_steps, self_play=True, receiver="paddle_2", mute_method="zero")
    return env

if __name__ == "__main__":
    i = 4
    num_envs = 64
    seed = 1
    sequence_length = i
    vocab_size = 3
    max_episode_steps = 2048
    total_timesteps = 150000000

    # Build a batch of environments and train a decentralized PPO agent on them.
    envs = [make_env(seed, vocab_size, sequence_length, max_episode_steps) for i in range(num_envs)]
    agent = PPO_Multi_Agent(envs)
    agent.train(total_timesteps, exp_name="multi_pong_sender_receiver")
    agent.save(f"models/multi_pong_test_sender_receiver_{i}")
```

The Pong environment used in this example behaves as follows:
- The ball moves according to its current direction. If it hits the top or bottom wall, its vertical direction reverses, simulating a bounce.
- When the ball hits the left wall, its horizontal direction reverses, making it bounce back into play.
- Initially, all paddles have a reward of 0, indicating that no paddle has earned a reward yet.
- The environment checks if the ball is at the right edge and in line with any paddle. If a paddle successfully hits the ball (the ball's vertical position aligns with the paddle's position), that paddle receives a reward of 1, rewarding the paddle for hitting the ball back.
- The rewards are calculated based on the ball's interaction with the paddles and the walls, focusing on rewarding paddles for successfully hitting the ball back into play (a minimal code sketch of these rules follows below).
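
To make the bounce and reward rules above concrete, here is a minimal sketch of how such a step update could look. This is not the environment's actual implementation; the attribute names (`ball`, `paddles`, `width`, `height`) are assumptions chosen for illustration.

```python
def step_rewards(ball, paddles, width, height):
    """Sketch of the bounce and reward rules described above (assumed names)."""
    rewards = {name: 0.0 for name in paddles}  # initially, every paddle has a reward of 0

    # Bounce off the top and bottom walls: the vertical direction reverses.
    if ball["y"] <= 0 or ball["y"] >= height - 1:
        ball["dy"] *= -1

    # Bounce off the left wall: the horizontal direction reverses.
    if ball["x"] <= 0:
        ball["dx"] *= -1

    # At the right edge, a paddle whose position lines up with the ball hits it back and earns +1.
    if ball["x"] >= width - 1:
        for name, paddle_y in paddles.items():
            if paddle_y == ball["y"]:
                rewards[name] = 1.0
                ball["dx"] *= -1

    return rewards
```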
The language channel uses discrete language tokens. The length of the language channel (the sequence length) and the vocabulary size are passed to the Pong environment as hyperparameters.
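
As an illustration of what these two hyperparameters control, the sketch below builds a discrete message space with `sequence_length` token slots, each drawn from a vocabulary of size `vocab_size`. Whether the environment represents the channel with `gymnasium.spaces.MultiDiscrete` internally is an assumption made here only for illustration.

```python
import gymnasium as gym

vocab_size = 3        # number of distinct tokens available per slot
sequence_length = 4   # number of token slots in each message

# One possible representation of the language channel: a fixed-length sequence of
# discrete tokens, each taking a value in [0, vocab_size).
language_space = gym.spaces.MultiDiscrete([vocab_size] * sequence_length)

message = language_space.sample()  # e.g. array([2, 0, 1, 2])
print(message.shape)               # (sequence_length,)
```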
Cornelius Wolff - cowolff@uos.de

