git clone --recurse-submodules git@github.com:thu-wyz/inference_scaling.git
This command clones our repository with the sglang repository as a submodule. The sglang submodule should be on the reward-model branch, which we have modified slightly to support our process reward model for efficient tree search. You can also use hf_score.py in the repo to score the steps of each solution. The benchmark datasets are MATH and GSM8K.
To install SGLang and the other dependencies:
cd sglang/python
pip install .
pip install outlines==0.0.44
You can also install SGLang from its official repository, but that version may not support our process reward model and can therefore only be used for sampling.
Our finetuning code for the policy and reward models is based on gpt-accelera; see the finetune directory. We also provide Hugging Face finetuning code for the policy model. The models are available on Hugging Face: Llemma-7b, Llemma-34b, and the Llemma reward model.
You can use tmux to start the servers, or run them in the background by appending & to the commands. Make sure to set the correct paths for your device.
bash ./scripts/run_policy.sh
bash ./scripts/run_reward.sh
bash ./scripts/sgl_baseline.sh
bash ./scripts/hf_scores.sh
Before running REBASE, set the hyperparameters in the YAML file. Then run:
bash ./scripts/rebase.sh
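The YAML file in the repo defines the search hyperparameters. As a rough illustration of the kind of settings involved, here is a hypothetical sketch; every key name below is illustrative, not the repo's actual schema, so consult the shipped YAML file for the real keys:

```yaml
# Hypothetical REBASE configuration sketch.
# All key names here are illustrative placeholders, NOT the actual
# keys in the repo's YAML file.
policy_host: http://localhost:30000   # address of the policy-model server
reward_host: http://localhost:30001   # address of the reward-model server
width: 16            # sampling budget per step (tree width)
max_depth: 40        # maximum number of solution steps to expand
temperature: 1.0     # sampling temperature for the policy model
```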
- GSM8K: https://huggingface.co/datasets/openai/gsm8k
- MATH500: https://github.com/openai/prm800k/tree/main/prm800k/math_splits/test.jsonl
You can choose among several aggregation functions for the per-step scores, such as last, mean, prod, or min. You can also modify the script to select the final answer by best-of-n or weighted majority voting.
bash ./scripts/evaluate.sh
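The aggregation and answer-selection options above can be sketched as follows. This is a minimal illustration of the idea, not the repo's evaluate.sh implementation; the function names and the solution dict format are ours:

```python
import math
from collections import defaultdict

def aggregate(step_scores, method="last"):
    """Collapse the per-step reward scores of one solution into a single score."""
    if method == "last":
        return step_scores[-1]
    if method == "mean":
        return sum(step_scores) / len(step_scores)
    if method == "prod":
        return math.prod(step_scores)
    if method == "min":
        return min(step_scores)
    raise ValueError(f"unknown aggregation method: {method}")

def best_of_n(solutions, method="last"):
    """Return the answer of the single highest-scoring solution."""
    best = max(solutions, key=lambda s: aggregate(s["step_scores"], method))
    return best["answer"]

def weighted_majority_vote(solutions, method="last"):
    """Sum solution scores per distinct answer and return the heaviest answer."""
    weights = defaultdict(float)
    for s in solutions:
        weights[s["answer"]] += aggregate(s["step_scores"], method)
    return max(weights, key=weights.get)

# Example: three sampled solutions, two of which agree on the answer "42".
solutions = [
    {"answer": "42", "step_scores": [0.9, 0.8]},
    {"answer": "41", "step_scores": [0.5, 0.95]},
    {"answer": "42", "step_scores": [0.6, 0.7]},
]
```

With these inputs, best-of-n with the "last" aggregation picks "41" (its final step scores highest), while weighted majority voting picks "42" (its summed score across solutions is larger), which is exactly the kind of disagreement the choice of selection rule controls.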
If you find our work helpful, please consider citing us:
@misc{wu2024inferencescalinglawsempirical,
title={Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models},
author={Yangzhen Wu and Zhiqing Sun and Shanda Li and Sean Welleck and Yiming Yang},
year={2024},
eprint={2408.00724},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2408.00724},
}