GitHub - janhuenermann/minrl: Educational implementation of RL for LLMs in Pytorch

minrl

Reproduction of GRPO, intended to be educational and efficient:

On an RTX 3090, we start with a base model (Qwen3-1.7B-Base) and train on the countdown task, reaching 60% accuracy in under 3 hours.
We incorporate improvements from Dr. GRPO and DAPO, namely averaging over tokens rather than over batches and dropping reward scaling.
We use LoRA to fit model training into 24GB of GPU memory.

Setup

In order to set up a training environment, you need to install the dependencies, including flash-attn.

# Clone the repository
git clone git@github.com:janhuenermann/minrl.git && cd minrl

# It is recommended to create a new virtual environment
python -m venv .env
source .env/bin/activate

# Install package and core dependencies
pip install -e .

# Install flash-attn with ninja for faster build times (this may take a while)
pip install ninja
MAX_JOBS=4 pip install -v --no-build-isolation flash-attn

Training

To start training, you can simply run the scripts/train_grpo.py script, which will automatically download the required artifacts (dataset and base model) and start training.

python scripts/train_grpo.py

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
minrl		minrl
scripts		scripts
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

minrl

Setup

Training

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

janhuenermann/minrl

Folders and files

Latest commit

History

Repository files navigation

minrl

Setup

Training

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages