
Fine tune Model with RLHF to Improve Agent Behavior #204

@kardSIM

Hi @PWhiddy

I like this project a lot, and as someone whose childhood was shaped by Pokémon, I find this work incredibly exciting!
While testing the pretrained model, I noticed that the agent rarely reaches Cerulean City and often gets stuck in confined areas like Pallet Town after a few steps.

I’ve been experimenting with adapting a Reinforcement Learning from Human Feedback (RLHF) approach to fine-tune the model. The goal is to correct some of the model’s behaviors (e.g., getting stuck in loops or confined spaces) by incorporating human feedback into the training process.

Here’s a high-level overview of the approach (rough sketches of each step follow the list):

1. Enable db_path in the baseline to record random segments of gameplay.
2. Use NiceGUI to annotate these segments with human feedback.
3. Train a reward model with train_reward.py on the annotated data.
4. Enable reward_path so the agent relies on the human-feedback-based reward model instead of the default state reward.
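
For step 1, the recording logic is roughly like the sketch below (simplified and self-contained, not the exact code in the branch; SegmentRecorder, SEGMENT_LEN, and the one-.npz-file-per-segment storage are just illustrative choices):

```python
import random
from pathlib import Path

import numpy as np

START_PROB = 0.00005   # chance per step of starting a new recorded segment
SEGMENT_LEN = 64       # illustrative segment length in frames

class SegmentRecorder:
    """Probabilistically records short gameplay segments for later annotation."""

    def __init__(self, db_path):
        self.db_path = Path(db_path)
        self.db_path.mkdir(parents=True, exist_ok=True)
        self.buffer = None  # None means "not currently recording"

    def maybe_record(self, frame):
        # Occasionally start a new segment; otherwise keep appending to the open one.
        if self.buffer is None and random.random() < START_PROB:
            self.buffer = []
        if self.buffer is not None:
            self.buffer.append(np.asarray(frame))
            if len(self.buffer) >= SEGMENT_LEN:
                self._flush()

    def _flush(self):
        # One compressed file per segment; a real database could be used instead.
        out = self.db_path / f"segment_{random.getrandbits(32):08x}.npz"
        np.savez_compressed(out, frames=np.stack(self.buffer))
        self.buffer = None
```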
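
For step 2, the NiceGUI annotation tool can stay very small. A rough sketch, assuming one preview .gif per recorded segment and a JSON file for the labels (the directory layout and file names here are placeholders):

```python
import json
from pathlib import Path

from nicegui import ui

SEGMENT_DIR = Path("segments")   # one preview .gif per recorded segment (placeholder layout)
LABEL_FILE = Path("labels.json")

segments = sorted(SEGMENT_DIR.glob("*.gif"))  # assumes at least one segment exists
labels = {}
state = {"i": 0}

def rate(score: int):
    # Store the rating for the current segment and advance to the next one.
    labels[segments[state["i"]].name] = score
    LABEL_FILE.write_text(json.dumps(labels, indent=2))
    state["i"] = min(state["i"] + 1, len(segments) - 1)
    viewer.set_source(str(segments[state["i"]]))

viewer = ui.image(str(segments[state["i"]]))
with ui.row():
    ui.button("Good", on_click=lambda: rate(1))
    ui.button("Bad", on_click=lambda: rate(0))

ui.run()
```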
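
Step 3 follows the usual RLHF reward-model recipe: a small network scores observations and is trained so that segments the annotator preferred get a higher summed score (a Bradley-Terry style loss when comparing two segments). Below is a simplified, self-contained sketch with an MLP over flattened frames; the real screen observations would probably want a small CNN instead, and this is the general recipe rather than the exact contents of train_reward.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a single observation; a segment's reward is the sum over its frames."""

    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

def preference_loss(model, seg_a, seg_b, pref):
    """Bradley-Terry loss: pref = 1.0 if the annotator preferred segment A, else 0.0.

    seg_a, seg_b: (batch, seg_len, obs_dim) tensors of flattened frames.
    """
    r_a = model(seg_a).sum(dim=1)   # (batch,) summed reward of each segment A
    r_b = model(seg_b).sum(dim=1)
    return F.binary_cross_entropy_with_logits(r_a - r_b, pref)

# Tiny smoke test on random data, just to show the shapes involved.
if __name__ == "__main__":
    obs_dim = 72 * 80  # placeholder for a flattened, downscaled screen
    model = RewardModel(obs_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    seg_a = torch.rand(8, 64, obs_dim)
    seg_b = torch.rand(8, 64, obs_dim)
    pref = torch.randint(0, 2, (8,)).float()
    loss = preference_loss(model, seg_a, seg_b, pref)
    loss.backward()
    opt.step()
    print(float(loss))
```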
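
Step 4 then swaps the environment's state-based reward for the learned model's score. A hypothetical gymnasium-style wrapper just to illustrate the idea (not the exact mechanism behind reward_path in the branch):

```python
import gymnasium as gym
import numpy as np
import torch

class LearnedRewardWrapper(gym.Wrapper):
    """Replaces the env's default state reward with a learned reward model's score."""

    def __init__(self, env, reward_model, device: str = "cpu"):
        super().__init__(env)
        self.device = device
        self.reward_model = reward_model.to(device).eval()

    def step(self, action):
        obs, _default_reward, terminated, truncated, info = self.env.step(action)
        with torch.no_grad():
            flat = torch.as_tensor(
                np.asarray(obs, dtype=np.float32).flatten(), device=self.device
            )
            reward = float(self.reward_model(flat.unsqueeze(0)))
        return obs, reward, terminated, truncated, info
```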

Here are the changes I made: https://github.com/kardSIM/PokemonRedExperiments/tree/rlhf

I tried to run it myself, but due to hardware limitations (I only have 14 CPU cores) I'm not getting anywhere.

I still don’t know which hyperparameters to experiment with. START_PROB = 0.00005 controls how many gameplay segments get recorded. Also, during fine-tuning, the learning rate is reduced to keep the model from forgetting its pretrained behavior.
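
For the learning-rate part, with stable-baselines3 the saved rate can be overridden when loading the checkpoint, since SB3 refreshes the optimizer's learning rate from lr_schedule on every update. Roughly like this, where FINETUNE_LR and the checkpoint path are placeholders and env is the vectorized environment built the same way as in the baseline training script:

```python
from stable_baselines3 import PPO

FINETUNE_LR = 3e-5  # placeholder: some fraction of the original learning rate

model = PPO.load(
    "session/poke_checkpoint.zip",   # placeholder checkpoint path
    env=env,                         # same vectorized env as the baseline run (not shown here)
    custom_objects={
        "learning_rate": FINETUNE_LR,
        "lr_schedule": lambda _: FINETUNE_LR,
    },
)
model.learn(total_timesteps=1_000_000)
```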

I’d love to get your feedback on this approach and to hear whether you’ve tried something similar. If you think it’s promising, I’d be happy to collaborate further and refine the implementation.

Looking forward to hearing from you

Best,
