Evaluating the Robustness of rStar

See project report

This repository contains the code and resources for the project "Evaluating the Robustness of rStar: A Novel Framework for Enhanced Reasoning in Small Language Models", conducted as part of the 02456 Deep Learning course at DTU Compute, Fall 2024. The framework applies reinforcement strategies in two stages. First, a target small language model (SLM) augments Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, a second SLM (the discriminator), with capabilities similar to those of the target SLM, verifies each trajectory generated by the target SLM. Trajectories on which both models agree are considered mutually consistent and are therefore more likely to be correct.
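The mutual-consistency idea described above can be illustrated with a minimal sketch. It is not the repository's implementation: the `Trajectory` type, the `mutually_consistent` function, and the toy discriminator are hypothetical names introduced here, and the sketch assumes that "agreement" means the discriminator reaches the same final answer as the target SLM.

```python
from typing import Callable, List, NamedTuple


class Trajectory(NamedTuple):
    """Hypothetical container for one reasoning trajectory."""
    steps: List[str]  # reasoning steps produced by the target SLM
    answer: str       # final answer reached by those steps


def mutually_consistent(
    candidates: List[Trajectory],
    discriminator_answer: Callable[[Trajectory], str],
) -> List[Trajectory]:
    """Keep only trajectories whose final answer the discriminator also reaches."""
    return [
        t for t in candidates
        if discriminator_answer(t).strip() == t.answer.strip()
    ]


# Toy stand-in for the discriminator SLM (assumption): it always answers "18".
def toy_discriminator(trajectory: Trajectory) -> str:
    return "18"


candidates = [
    Trajectory(steps=["16 - 3 - 4 = 9", "9 * 2 = 18"], answer="18"),
    Trajectory(steps=["16 - 3 = 13", "13 * 2 = 26"], answer="26"),
]
agreed = mutually_consistent(candidates, toy_discriminator)
print([t.answer for t in agreed])
```

In a real run, `toy_discriminator` would be replaced by a call to the second SLM; the filtering step itself stays the same.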

Authors:

Jone Steinhoff (s243867)
Lukas Rasocha (s233498)
Panagiota Emmanouilidi (s223531)
Petr B. Nylander (s240466)
Robert Spralja (s243658)

Supervised by:

Prof. Ole Winther, DTU Compute, Technical University of Denmark

Overview

This project focuses on evaluating the robustness of the rStar framework, a reasoning system for small language models (SLMs). The evaluation uses variations of the GSM8K dataset, highlighting the framework’s strengths and limitations in handling diverse input modifications.
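As a rough illustration of what a robustness comparison across dataset variations might look like, the sketch below computes exact-match accuracy per variation and the drop relative to the unmodified questions. The variation names (`renamed_entities`, `changed_numbers`), the helper functions, and all answers are made-up placeholders, not the project's actual variations or results.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[str], references: List[str]) -> float:
    """Exact-match accuracy between predicted and reference answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def accuracy_drop(scores: Dict[str, float], baseline: str = "original") -> Dict[str, float]:
    """Accuracy lost by each variation relative to the baseline questions."""
    base = scores[baseline]
    return {name: base - acc for name, acc in scores.items() if name != baseline}


# Toy (predictions, references) pairs per variation; placeholder values only.
runs: Dict[str, Tuple[List[str], List[str]]] = {
    "original": (["18", "3", "70", "42"], ["18", "3", "70", "42"]),
    "renamed_entities": (["18", "3", "70", "41"], ["18", "3", "70", "42"]),
    "changed_numbers": (["20", "5", "64", "40"], ["20", "6", "64", "44"]),
}
scores = {name: accuracy(preds, refs) for name, (preds, refs) in runs.items()}
drops = accuracy_drop(scores)
print(scores)
print(drops)
```

Each variation carries its own reference answers because a modification such as changing the numbers in a question also changes the correct answer.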

For more information, please refer to our report.

Reproduce

Prerequisites

Python 3.10 or later
CUDA-enabled GPU (e.g., NVIDIA Tesla A100)
Libraries listed in requirements.txt
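Before installing, the first two prerequisites can be checked with a short script. This is a convenience sketch, not part of the repository: `check_prerequisites` is a hypothetical helper, and looking for `nvidia-smi` on the `PATH` is only a rough proxy for a working CUDA setup.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a mapping of prerequisite name to whether it appears satisfied."""
    return {
        # Python 3.10 or later, as listed in the prerequisites.
        "python": sys.version_info >= min_python,
        # Assumption: an NVIDIA driver install puts nvidia-smi on the PATH.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```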

Clone the repository:

git clone https://github.com/lukyrasocha/rStar.git
cd rStar

Install dependencies:

pip install -r requirements.txt

Run and inspect the main.ipynb notebook to see how results can be reproduced.

